I’m a huge fan of automation when the scenario allows for it. Maybe you need to keep track of guest information when they RSVP to your event, or maybe you need to monitor and react to feeds of data. These are two of many possible scenarios where you probably wouldn’t want to do things manually.
There are quite a few tools that are designed to automate your life. Some of the popular tools include IFTTT, Zapier, and Automate. The idea behind these services is that given a trigger, you can run a series of actions in response.
In this tutorial, we’re going to see how to collect Twitter data with Zapier, store it in MongoDB using a Realm webhook function, and then run aggregations on it using the MongoDB query language (MQL).
There are a few requirements that must be met prior to starting this tutorial: a paid Zapier account, a Twitter account, and a properly configured MongoDB Atlas cluster.
There is a Zapier free tier, but because we plan to use webhooks, which are premium in Zapier, a paid account is necessary. To consume data from Twitter in Zapier, a Twitter account is necessary, even if we plan to consume data that isn’t related to our account. This data will be stored in MongoDB, so a cluster with properly configured IP access and user permissions is required.
You can get started with MongoDB Atlas by launching a free M0 cluster, no credit card required.
While it’s not necessary to create a database and collection prior to use, we’ll be using a zapier database and a tweets collection throughout this tutorial.
Since the plan is to store tweets from Twitter within MongoDB and then create queries to make sense of it, we should probably get an understanding of the data prior to trying to work with it.
We’ll be using the “Search Mention” functionality within Zapier for Twitter. Essentially, it allows us to provide a Twitter query and trigger an automation when the data is found. More on that soon.
As a result, we’ll end up with the following raw data:
{
    "created_at": "Tue Feb 02 20:31:58 +0000 2021",
    "id": "1356701917603238000",
    "id_str": "1356701917603237888",
    "full_text": "In case anyone is interested in learning about how to work with streaming data using Node.js, I wrote a tutorial about it on the @MongoDB Developer Hub. https://t.co/Dxt80lD8xj #javascript",
    "truncated": false,
    "display_text_range": [0, 188],
    "metadata": {
        "iso_language_code": "en",
        "result_type": "recent"
    },
    "source": "<a href='https://about.twitter.com/products/tweetdeck' rel='nofollow'>TweetDeck</a>",
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user": {
        "id": "227546834",
        "id_str": "227546834",
        "name": "Nic Raboy",
        "screen_name": "nraboy",
        "location": "Tracy, CA",
        "description": "Advocate of modern web and mobile development technologies. I write tutorials and speak at events to make app development easier to understand. I work @MongoDB.",
        "url": "https://t.co/mRqzaKrmvm",
        "entities": {
            "url": {
                "urls": [
                    {
                        "url": "https://t.co/mRqzaKrmvm",
                        "expanded_url": "https://www.thepolyglotdeveloper.com",
                        "display_url": "thepolyglotdeveloper.com",
                        "indices": [0, 23]
                    }
                ]
            },
            "description": {
                "urls": ""
            }
        },
        "protected": false,
        "followers_count": 4599,
        "friends_count": 551,
        "listed_count": 265,
        "created_at": "Fri Dec 17 03:33:03 +0000 2010",
        "favourites_count": 4550,
        "verified": false
    },
    "lang": "en",
    "url": "https://twitter.com/227546834/status/1356701917603237888",
    "text": "In case anyone is interested in learning about how to work with streaming data using Node.js, I wrote a tutorial about it on the @MongoDB Developer Hub. https://t.co/Dxt80lD8xj #javascript"
}
The data we have access to is probably more than we need. However, it really depends on what you’re interested in. For this example, we’ll be storing the following within MongoDB:
{
    "created_at": "Tue Feb 02 20:31:58 +0000 2021",
    "user": {
        "screen_name": "nraboy",
        "location": "Tracy, CA",
        "followers_count": 4599,
        "friends_count": 551
    },
    "text": "In case anyone is interested in learning about how to work with streaming data using Node.js, I wrote a tutorial about it on the @MongoDB Developer Hub. https://t.co/Dxt80lD8xj #javascript"
}
Without getting too far ahead of ourselves, our analysis will be based on the followers_count and the location of the user. We want to be able to make sense of where our users are and give priority to users that meet a certain followers threshold.
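To make the trimming step concrete, here is a minimal sketch in plain JavaScript of reducing a raw tweet to the subset shown above. In this tutorial the field selection actually happens in Zapier rather than in code, so the pickTweetFields helper and the shortened sample tweet are hypothetical and only there for illustration.

```javascript
// Hypothetical helper: keep only the fields we care about from a raw tweet.
function pickTweetFields(raw) {
  const user = raw.user || {};
  return {
    created_at: raw.created_at,
    user: {
      screen_name: user.screen_name,
      location: user.location,
      followers_count: user.followers_count,
      friends_count: user.friends_count,
    },
    text: raw.text,
  };
}

// Shortened, made-up sample modeled on the raw tweet above.
const sampleRawTweet = {
  created_at: "Tue Feb 02 20:31:58 +0000 2021",
  id_str: "1356701917603237888",
  lang: "en",
  user: {
    screen_name: "nraboy",
    location: "Tracy, CA",
    followers_count: 4599,
    friends_count: 551,
    verified: false,
  },
  text: "Example tweet text",
};

console.log(pickTweetFields(sampleRawTweet));
```

Everything not listed in the helper, such as id_str or verified, is simply dropped.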
Before we start connecting Zapier and MongoDB, we need to develop the middleware that will be responsible for receiving tweet data from Zapier.
Remember, you’ll need to have a properly configured MongoDB Atlas cluster.
We need to create a Realm application. Within the MongoDB Atlas dashboard, click the Realm tab.
For simplicity, we’re going to want to create a new application. Click the Create a New App button and proceed to fill in the information about your application.
From the Realm Dashboard, click the 3rd Party Services tab.
We’re going to want to create an HTTP service. The name doesn’t matter, but it might make sense to name it Twitter based on what we’re planning to do.
Because we plan to work with tweet data, it makes sense to call our webhook function tweet, but the name doesn’t truly matter.
With the exception of the HTTP Method, the defaults are fine for this webhook. We want the method to be POST because we plan to create data with this particular webhook function. Make note of the Webhook URL because it will be used when we connect Zapier.
The next step is to open the Function Editor so we can add some logic behind this function. Add the following JavaScript code:
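exports = function (payload, response) {
    // Parse the JSON body of the incoming request.
    const tweet = EJSON.parse(payload.body.text());
    // Get a handle to the tweets collection in the zapier database.
    const collection = context.services.get("mongodb-atlas").db("zapier").collection("tweets");
    // Store the tweet as a new document.
    return collection.insertOne(tweet);
};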
In the above code, we are taking the request payload, getting a handle to the tweets collection within the zapier database, and then doing an insert operation to store the data in the payload.
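Two things are worth noting about the above code: the request payload is not validated, and there is no error handling, so whatever is sent gets inserted as-is.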
When we call our function, a new document should be created within MongoDB.
By default, the function will not deploy when saving. After saving, make sure to review and deploy the changes through the notification at the top of the browser window.
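Since the webhook function inserts whatever it receives, it may be worth checking the payload first. The following is a minimal sketch of such a check in plain JavaScript, not part of the original function. The findMissingFields helper and the list of required fields are assumptions for illustration; adjust them to whatever fields you decide to send.

```javascript
// Hypothetical list of fields we expect in every payload; adjust to your own data.
const REQUIRED_FIELDS = ["created_at", "location"];

// Returns the names of required fields that are absent, null, or empty.
function findMissingFields(doc, requiredFields = REQUIRED_FIELDS) {
  // Anything that is not a plain object is treated as missing everything.
  if (doc === null || typeof doc !== "object" || Array.isArray(doc)) {
    return [...requiredFields];
  }
  return requiredFields.filter(
    (field) => doc[field] === undefined || doc[field] === null || doc[field] === ""
  );
}

console.log(findMissingFields({ created_at: "Tue Feb 02 20:31:58 +0000 2021", location: "Tracy, CA" })); // []
console.log(findMissingFields({ location: "" })); // [ 'created_at', 'location' ]
```

Inside the webhook function, a check like this could run right after parsing the body, returning early instead of calling insertOne when the list is not empty.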
So, we know the data we’ll be working with and we have a MongoDB Realm webhook function that is ready for receiving data. Now, we need to bring everything together with Zapier.
For clarity, new Twitter matches will be our trigger in Zapier, and the webhook function will be our event.
Within Zapier, choose to create a new “Zap,” which is an automation. The trigger needs to be a Search Mention in Twitter, which means that when a new Tweet is detected using a search query, our events happen.
For this example, we’re going to use the following Twitter search query:
url:developer.mongodb.com -filter:retweets filter:safe lang:en -from:mongodb -from:realm
The above query says that we are looking for tweets that include a URL to developer.mongodb.com. The URL doesn’t need to match exactly as long as the domain matches. The query also says that we aren’t interested in retweets. We only want original tweets, they have to be in English, and they have to be detected as safe for work.
In addition to the mentioned search criteria, we are also excluding tweets that originate from one of the MongoDB accounts.
In theory, the above search query could be used to see what people are saying about the MongoDB Developer Hub.
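To show how the pieces of that query fit together, here is a small sketch in plain JavaScript that assembles the same search string from its parts. The buildSearchQuery function and its option names are hypothetical; Zapier only needs the final string.

```javascript
// Hypothetical helper: assemble a Twitter search query from its parts.
function buildSearchQuery({ domain, lang, excludeAccounts = [] }) {
  const parts = [
    `url:${domain}`, // tweets linking to this domain
    "-filter:retweets", // original tweets only
    "filter:safe", // safe for work
    `lang:${lang}`, // language of the tweet
    // one exclusion per account we do not want to hear from
    ...excludeAccounts.map((account) => `-from:${account}`),
  ];
  return parts.join(" ");
}

const query = buildSearchQuery({
  domain: "developer.mongodb.com",
  lang: "en",
  excludeAccounts: ["mongodb", "realm"],
});

console.log(query);
// url:developer.mongodb.com -filter:retweets filter:safe lang:en -from:mongodb -from:realm
```

Swapping the domain or the excluded accounts gives you a query for a different site or brand without retyping the operators.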
With the trigger in place, we need to identify the next stage of the automation pipeline. The next stage is taking the data from the trigger and sending it to our Realm webhook function.
As the event, make sure to choose Webhooks by Zapier and specify a POST request. You’ll then be prompted to enter your Realm webhook URL. Realm is expecting the payload to be JSON, so it is important to select JSON as the payload type within Zapier.
We have the option to choose which data from the previous automation stage to pass to our webhook. Select the fields you’re interested in and save your automation.
The data I chose to send looks like this:
{
    "created_at": "Tue Feb 02 20:31:58 +0000 2021",
    "username": "nraboy",
    "location": "Tracy, CA",
    "follower_count": "4599",
    "following_count": "551",
    "message": "In case anyone is interested in learning about how to work with streaming data using Node.js, I wrote a tutorial about it on the @MongoDB Developer Hub. https://t.co/Dxt80lD8xj #javascript"
}
The fields do not match the original fields brought in by Twitter. This is because I chose to map them to names that made sense to me.
Once the Zap is deployed, any time a tweet is found that matches our query, it will be saved into our MongoDB cluster.
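If you’d like to exercise the webhook without waiting for a matching tweet, you could send it a sample document yourself. The sketch below, in plain JavaScript, only builds the request that would be sent. The WEBHOOK_URL value is a placeholder, the buildWebhookRequest helper is hypothetical, and the sample message is shortened; none of this is part of the Zapier setup itself.

```javascript
// Placeholder only; use the Webhook URL shown in your own Realm webhook settings.
const WEBHOOK_URL = "https://example.com/your-realm-webhook-url";

// Hypothetical helper: build a JSON POST request for the webhook.
function buildWebhookRequest(tweet) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(tweet),
  };
}

// Sample document in the same shape as the mapped Zapier fields (message shortened).
const sampleTweet = {
  created_at: "Tue Feb 02 20:31:58 +0000 2021",
  username: "nraboy",
  location: "Tracy, CA",
  follower_count: "4599",
  following_count: "551",
  message: "Example tweet text",
};

const request = buildWebhookRequest(sampleTweet);
console.log(request.method, request.body);

// To actually send it (needs network access and a runtime with fetch, such as Node.js 18+):
// fetch(WEBHOOK_URL, request).then((res) => console.log(res.status));
```

Note that the counts are sent as strings here, just like in the data above, which matters later when we query them.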
With tweet data populating in MongoDB, it’s time to start querying it to make sense of it. In this fictional example, we want to know what people are saying about our Developer Hub and how popular these individuals are.
To do this, we’re going to want to make use of an aggregation pipeline within MongoDB.
Take the following, for example:
[
    {
        "$addFields": {
            "follower_count": {
                "$toInt": "$follower_count"
            },
            "following_count": {
                "$toInt": "$following_count"
            }
        }
    }, {
        "$match": {
            "follower_count": {
                "$gt": 1000
            }
        }
    }, {
        "$group": {
            "_id": {
                "location": "$location"
            },
            "location": {
                "$sum": 1
            }
        }
    }
]
There are three stages in the above aggregation pipeline.
We want to understand the follower data for the individual who made the tweet, but that data comes into MongoDB as a string rather than an integer. The first stage of the pipeline takes the follower_count and following_count fields and converts them from string to integer. In reality, we are using $addFields to create new fields, but because they have the same name as existing fields, the existing fields are replaced.
The next stage is where we identify people with more than 1,000 followers as people of interest. While people with fewer followers might be saying great things, in this example, we don’t care.
After we’ve filtered out people by their follower count, we do a group based on their location. It might be valuable for us to know where in the world people are talking about MongoDB. We might want to know where our target audience exists.
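To make the three stages more concrete, here is a rough plain JavaScript sketch that mimics them over an in-memory array. It is not how MongoDB executes the pipeline, and it differs in details: for example, Number.parseInt yields NaN on bad input where $toInt would raise an error, and the following_count conversion is left out because the later stages never use it. The summarizeByLocation function and the sample documents are made up for illustration.

```javascript
// Hypothetical in-memory imitation of the $addFields, $match, and $group stages.
function summarizeByLocation(tweets, minFollowers = 1000) {
  const counts = new Map();
  for (const tweet of tweets) {
    // Stage 1 ($addFields with $toInt): convert the string count to an integer.
    const followerCount = Number.parseInt(tweet.follower_count, 10);
    // Stage 2 ($match): keep only authors above the follower threshold.
    if (!(followerCount > minFollowers)) continue;
    // Stage 3 ($group with $sum: 1): count matching tweets per location.
    counts.set(tweet.location, (counts.get(tweet.location) || 0) + 1);
  }
  // Shape the output like the $group stage: the count is stored under "location".
  return [...counts].map(([location, count]) => ({ _id: { location }, location: count }));
}

// Made-up sample documents in the shape stored by the webhook.
const sampleTweets = [
  { location: "Tracy, CA", follower_count: "4599" },
  { location: "Tracy, CA", follower_count: "1500" },
  { location: "Berlin", follower_count: "250" },
  { location: "London", follower_count: "1001" },
];

console.log(JSON.stringify(summarizeByLocation(sampleTweets)));
// [{"_id":{"location":"Tracy, CA"},"location":2},{"_id":{"location":"London"},"location":1}]
```

The Berlin tweet is dropped by the threshold, which is exactly what the $match stage does before grouping.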
The aggregation pipeline we chose to use can be executed with any of the MongoDB drivers, through the MongoDB Atlas dashboard, or through the CLI.
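As one example of the driver route, here is a minimal sketch of running the pipeline from Node.js. It assumes a collection object that behaves like the one from the MongoDB Node.js driver, where aggregate() returns a cursor and toArray() resolves to the result documents. The function name countTweetsByLocation is hypothetical, and the connection details are left in comments because they depend on your own cluster.

```javascript
// The same three-stage pipeline as above, written as a JavaScript array.
const pipeline = [
  {
    $addFields: {
      follower_count: { $toInt: "$follower_count" },
      following_count: { $toInt: "$following_count" },
    },
  },
  { $match: { follower_count: { $gt: 1000 } } },
  { $group: { _id: { location: "$location" }, location: { $sum: 1 } } },
];

// Hypothetical helper: run the pipeline against a driver-style collection.
async function countTweetsByLocation(collection) {
  return collection.aggregate(pipeline).toArray();
}

// Usage with the real driver (assumes the mongodb package is installed and
// that `uri` holds your own Atlas connection string):
//
//   const { MongoClient } = require("mongodb");
//   const client = new MongoClient(uri);
//   await client.connect();
//   const collection = client.db("zapier").collection("tweets");
//   console.log(await countTweetsByLocation(collection));
//   await client.close();
```

Passing the collection in as a parameter keeps the helper independent of how the connection is created.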
You just saw how to use Zapier with MongoDB to automate certain tasks and store the results as documents within the NoSQL database. In this example, we chose to store Twitter data that matched certain criteria, later to be analyzed with an aggregation pipeline. The automations and analyses you can build this way are nearly limitless.
If you enjoyed this tutorial and want to get engaged with more content and like-minded developers, check out the MongoDB Community.
This content first appeared on MongoDB.