Archive for the ‘Scraping Twitter’ Category
This is a follow up to a post I did a few weeks ago about using Excel to scrape key user data from Twitter. In that post I set up a spreadsheet that was tracking 20 of the most followed people in the SEO industry. After collecting that data on a daily basis over the last few weeks, I am going to use this post to demonstrate some basic analysis that can be done. It is not meant to be an Excel tutorial. For one of those, I recommend you check out Distilled’s guide on Excel for SEO, and of course Richard Baxter has a wealth of Excel tips and tutorials on his blog.
Step 1: Format Your Data
Here’s the spreadsheet that goes along with this post - Twitter Scrape Analysis – Excel
I’ve been collecting Twitter data on these 20 SEOs for about 30 days. That’s about 600 lines of data in my Excel spreadsheet. Pretty modest from a data analysis perspective, but we still need to make sure that things are formatted kindly. There are really only two things we’ll be doing with the data 1) adding a few columns for daily trending, i.e. number of followers gained or lost per day, and 2) creating a pivot table. In order to add the trending we are going to some data sorting, so I suggest that you put your data into a table. (Select all the data and CRTL + L).
Step 2: Basic Analysis Inline with your Data
The spreadsheet of our data is attached, but for quick review, our columns look like this:
There are two columns I added next for trending: one for calculating the daily followers gained or lost and a column to count the number of Tweets broadcast. You could do a similar column for following and listed. Next, I perform a quick and dirty formula to calculate the daily trend. There is probably a more elegant solution, but I found this to work, so I went with it:
- Sort, by handle and then by date (oldest to newest)
- Formula: IF(Table2[[#This Row],[handle]]=A1,Table2[[#This Row],[followers]]-C1,”start”). This function checks to see if the two numbers you are calculating belong to the same handle. Then does basic subtraction. This will give you the difference between the row you are looking at and the day before.
- Clean-up: this is an important step. Copy the entire table and paste values in order to overwrite the formulas with the values. If you don’t, the numbers will change when sorting. Next, sort the follower gain/loss column and delete all the fields that say “start”. We want to make sure we only have number in this column so the pivot tables and graphs translate properly when we are building our visualizations.
Step 3: Visualize
Let’s make the data more meaningful and present that data in a simple way that could bubble up some insights. There are a hundred different tools and visualization methods that you could choose, but I am going to keep it very simple and use a pivot table for this example. The main benefit of a pivot table is to summarize like data points, so in this case we want to see basic trending information by each twitter handle that we are tracking.
In the data tab select the whole table, and under the “insert” ribbon tab click on ‘Pivot’ table. A new Pivot table tab will be generated with the field selector open. In the field selector drag handles into the row labels field and followers. Then, add follower gain/loss and daily tweets to the values field. Make sure to select “value field setting” and then select “average” for each. Excel defaults to SUM, which would provide the sum of all the entries in the data table for the handle.
To make it’s more readable sort by followers by clicking under more sort options next to “Row Labels”. Then select “Descending (Z to A) by:” “Average of Followers”:
Next, add some conditional formatting to each row so some of the outliers pop out. You have to format each row separately or it will use the largest number, in this case dannysullivan’s followers as the baseline for all fields.
The keen observer has probably noticed that using followers gain/loss as a measuring stick to judge one Twitter account from another is not an apples to apples comparison. Followers beget more followers and Danny Sullivan and Matt Cutts have the advantage in this group. What is more telling is to normalize the data and compare the ratio of average follower gained to the sum of total followers. This can be done by selecting a single field of the pivot table under ‘Average of follower gain/loss’ column and then selecting options (in the ribbon)>formulas>calculated fields. Looks something like the below. Simply double-click in ‘follower gain/loss’, divide that by ‘followers’ and hit OK.
Make sure to format the new column into percentage, or it will just round off to zero, and that’s not very useful insight. Your end product should look something like the below and will provide you some insights into which handles are the outliers of your group.
In this case there are a few things that stand out, including the fact that Aaron Wall has been losing followers at the fastest rate. He also has not Tweeted in nearly a month. Is his lack of engagement with his followers causing unfollows? It could be, but correlation is not causation as our friends at SEOmoz remind us. A deeper dive into the data may help us reveal something… Ah-ha! In looking at the raw data we see that aaronwall got dropped by 284 people in an 8 day period (note: he no longer uses this account to Tweet, rather SEObook, but the decrease in followers was dramatically higher over this 8 day period than compared to the other days in this data set). In doing a little digging, a blog posted by AaronWall on SEObook entitled “Educating the Market: Is outing & writing polarizing drivel hate baiting or a service to the community” caused a bit of an uproar. Could the post on SEObook have had a negative impact on the @AaronWall Twitter profile? A similar analysis could be done by a publisher on a journalist who wrote a controversial piece of content. However, to really understand what’s going on we need to explore tone and sentiment. I’ll be mashing this data with open source sentiment analysis tools in future blogs. Stay tuned…