Researching the Twitter data feed

A new book by UCLA professor Zachary Steinert-Threlkeld called Twitter as Data is available online free for a limited time, and I recommend you download a copy now. While written mainly for academic social scientists and other researchers, it is useful in plenty of other situations.

Zachary has been analyzing Twitter data streams for several years, and basically taught himself enough Python and R to be dangerous. The book assumes a novice programmer, and provides the code samples you need to get started with your own analysis.

Why Twitter? Mainly because it is so transparent. Anyone can figure out who follows whom, and easily drill down to see who these followers are and how often they actually use Twitter themselves. Most Twitter users have open accounts by default, and want people to engage with them in public. Contrast that with Facebook, where the situation is the exact opposite and the data is thus much harder to access.

To make matters easier, Twitter data comes packaged in three different APIs: streaming, search and REST. The streaming API provides data in near-real-time and is the best way to get data on what is currently trending in different parts of the world. The downside is that you could be picking a particularly dull moment in time when nothing much is happening. The streaming API is limited to just one percent of all tweets: you can filter and focus on a particular collection, such as all tweets from one country, but you still only get one percent. That works out to about five million tweets daily.
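To put the one percent figure in perspective, here is a quick back-of-the-envelope sketch in Python. It uses only the two numbers quoted above (a one percent sample and about five million tweets a day); the function names are my own and purely illustrative, and the results are rough estimates, not official Twitter statistics.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_total_per_day(sample_per_day, sample_percent):
    """Scale a sampled daily count up to the implied full daily volume.

    Integer arithmetic keeps the result exact for whole-number percentages.
    """
    return sample_per_day * 100 // sample_percent


def tweets_per_second(tweets_per_day):
    """Average arrival rate for a given daily volume."""
    return tweets_per_day / SECONDS_PER_DAY


# Figures quoted in the post: a 1% sample yields about 5 million tweets a day.
sample_per_day = 5_000_000
total_per_day = implied_total_per_day(sample_per_day, 1)

print(f"Implied total volume: {total_per_day:,} tweets per day")
print(f"Sample arrival rate: about {tweets_per_second(sample_per_day):.0f} tweets per second")
```

In other words, the sample alone delivers dozens of tweets every second, which is why storage and filtering deserve some thought before you start collecting.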

Many researchers run multiple queries so they can collect more data, and several have published interesting data sets that are available to the public. And there is this map that shows patterns of communication across the globe over an entire day.

The REST API has limits on how often you can collect and how far back in time you can go, but isn’t limited to the real-time feed.

Interesting things happen when you go deep into the data. When Zachary first started with his Twitter analysis, he found, for example, a large body of basketball-related tweets from Cameroon, and upon further analysis linked them to a popular basketball player (Joel Embiid) who was from that country and had a lot of hometown fans across the ocean. He also found that lots of tweets from the Philippines in Tagalog were being miscataloged as an unknown language. When countries censor Twitter, that shows up in the real-time feed too. Now that he is an experienced Twitter researcher, he focuses his study on the smaller Twitterati: studying celebrities or those with massive Twitter audiences isn’t really very useful. The smaller collections are more focused, and it is easier to spot trends in them.
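If you want to go looking for this kind of surprise yourself, a simple first step is to tally collected tweets by country and language. The Python sketch below is my own illustration, not code from the book. It assumes you have already saved tweets as one JSON object per line, and that each object follows the classic Twitter tweet layout with a "lang" field ("und" meaning undetermined) and an optional "place" object carrying a "country_code". The three sample tweets are made up for the demonstration.

```python
import json
from collections import Counter


def count_by_country_and_language(lines):
    """Tally tweets by (country code, language code).

    `lines` is any iterable of strings, each holding one tweet as JSON.
    Tweets with no place attached are counted under "unknown", and
    tweets with no language tag are counted under "und".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        place = tweet.get("place") or {}  # "place" is often null
        country = place.get("country_code") or "unknown"
        lang = tweet.get("lang") or "und"
        counts[(country, lang)] += 1
    return counts


# Made-up sample records, standing in for a file of collected tweets.
sample_lines = [
    '{"text": "example one", "lang": "en", "place": {"country_code": "CM"}}',
    '{"text": "example two", "lang": "und", "place": {"country_code": "PH"}}',
    '{"text": "example three", "lang": "en", "place": null}',
]

for (country, lang), n in count_by_country_and_language(sample_lines).most_common():
    print(country, lang, n)
```

A country with an unusually large share of "und" tweets, or a burst of tweets in an unexpected language, is the sort of thing worth a closer look. To run this on real data, pass an open file handle instead of the sample list.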

So take a look at Zachary’s book and see what insights you can gain into your particular markets and customers. It won’t cost you much money and could pay off in terms of valuable information.
