Mining Twitter Data with Excel and other tips for Twitter Analytics

April 23rd, 2012
by Tim Resnik

We work with several publishing clients that manage a multitude of Twitter accounts. We frequently run across the problem of spending too much time compiling a large cross section of key Twitter stats for multiple accounts. For example, we have a client that has over 20 Twitter accounts for unique brands unique brands. Trying to collect and analyze follower and status update counts is a tedious manual process, and not recommended.

There are two methods to accomplish data scraping from Twitter:

1)      Develop a web application using the Twitter API

2)      Use a mashup of tools mixed with Excel or Google Docs.

Building a web application to perform this scraping function has obvious financial and time barriers, but may be worth the effort or licensing fee in the long run.

Using Google Docs and ImportXML can handle this function nicely and has been covered thoroughly (I mean thoroughly) by Distilled in their ImportXML guide . Also, John Doherty makes good use of the API and Google Docs and has created a link prospecting tool.  What I want to cover in this blog is how to collect basic Twitter following/follower stats across several accounts using Excel and Niels Bosma’s badass plugin: SEO Tools for Excel. Since the SEO Tools plugin is doing all the heavy lifting in this example, Niels should really get all the credit.

Step 1: Install SEO Tools for Excel

There are a lot of other sweet features, but for this example we are just going to use the scraping function:

 SEO Tools for Excel

Step 2: Build a list of Twitter handles that you would like to collect via the Twitter API

First you need to figure out who you are interested in grabbing the data for. This could be your competition, your employees, different products within your company, punk bands, or whatever group of Tweeters you want to analyze. You can use services like  WeFollow and Listorious to generate ideas. Compile the list and put the handles in the first column of your spreadsheet (don’t include a preceding @). With this particular XML feed here are some of the relevant things you can pull:

  • ID: the numerical unique identifier for the account. This is a handy key to have because it will never change while the screen name can be changed by the owner of the account
  • Name: the name of the person who registered the account
  • Location: the geo-location of where the account was created
  • Description: the user-created description for the account that shows up right under the screen name on Twitter.
  • Profile img URL: the location of the profile image for that account
  • Followers count: number of followers that this account has
  • Friends count: number of accounts that this account is following
  • Created at: the date and time that this account was created
  • Status count: number of tweets since the account was created
  • Listed count: the number of times that this account has been included in another accounts lists
Or, checkout all the gory details

Here is what my column headers look like:

 Excel Column Headers

You can download the example XLS file here. The formulas are all pre-populated so all you need to do is enter the Twitter names (30 max due to API rate limiting) that you want the information for in the “Twitter Handle” column. If you want to build it out yourself, below are the details.

Step 3: Construct the API Request URL (sheet 2)

We must create the proper URL to be able to pull this information from Twitter. In this case we are using the Twitter REST API resource: GET users/show. This is one of a handful of API calls that does not require user authentication.

There are two required components and one optional to the URL.

Required:

  • Request URL: https://api.twitter.com/1/users/show.xml?screen_name=

+

  • Screen name: the Twitter handle you would like information for

So, it will end up looking like this:

https://api.twitter.com/1/users/show.xml?screen_name=mattcutts

Optional: include entities. This API requests returns the latest tweet for the user info requested. If you would like to include information about the tweet such as user mentions, hashtags or associated URLs, then also include the following to the end of the URL: &include_entities=true. The final URL would look like this:

https://api.twitter.com/1/users/show.xml?screen_name=mattcutts&include_entities=true.

We now have to construct the URL. In sheet 2 of the spreadsheet you can see that I am simply concatenating two fields: the Twitter handle from sheet 1 and the URL from the current sheet.

Step 4: Write Your Formulas and Populate the Array

Nothing too fancy here thanks to SEO Tools for Excel. There is a pre-built function that does all the heavy lifting. You can select the XPath (reads XML) or JSON option with the Twitter API. In this example we will use XML.

The XPath function has two inputs: the URL to call the XML file, which we constructed above and is in sheet 2, and the instructions for selecting the right information in the file (the proper node). Here is a very basic write up of how to use XPath to select nodes within an XML file.

 

Once you have your formula written for each column, fill in the rest of your array and wait a bit as it fills in the data. The Twitter REST API is limited to 150 requests per hour and each cell in the array you have just created is an API call. In this example we have 5 columns and 25 Twitter handles so filling out this array once will be just 25 short of the hourly cap.

Step 5: Some Basic Analysis

Now that we have an array of data for a group of Twitter users we can do some very basic analysis. In follow up blogs, I will collect the data over time and do a little deeper look at trending analysis.

In sheet 3 you will find a few graphs based on this data. It is important to note that the data in sheet 1 is formatted as text because Excel is keying off of the formula. This makes analysis difficult on that tab itself. I copied and pasted the values to sheet 3(Copy>Paste Special>Values) and then converted the text to numbers.

I wanted to compare the number of followers juxtaposed with the average number of tweets per day. Twitter returns the date as (Day of week, Month, Day, Time Stamp and Year). In order to format the date so Excel can understand it you must pull the right information out of the ‘Date Created’ :

Format Dates in Excel

Excel still can’t understand it because it is reading the formula instead of what is being visually shown in the cell. In order to get it to work, use Copy>Paste Special>Values to overwrite the formula, go to Format Cells>Date> format as 04/23/2012.Create a new column called Avg. Tweets per day. You now need to figure out how many days have passed from the time the account was created and divide into that the total number of tweets.

Average Tweets per Day

Next, you can select the data in the Followers and Avg. Tweets per Day columns to generate your graph. I won’t go over the details on chart formatting here, but here is the end result:

followers to tweets graph

To be continued…

 

Obama Tries for Geek Cred with ASCII Art

March 9th, 2012
by Tim Resnik

It’s not exactly like he came up with the idea, but at least he didn’t stop it.

If you view the source of the  barackobama.com you will see the following piece of ASCII art commented at the top of the HTML doc.

Obama ASCII art

 

The technological difference between Obama’s and Mitt Romney’s website is stark. Some may even say symbolic, but that’s not for this type of blog. Barry’s web team is using HTML5  and has enough time (and money) on hand to make ASCII art. If you jump over to Mitt Romney’s site and see what is going on, you’ll notice that it is coded up in a more antiquated version of HTML( XHTML 1.0) – not quite  cutting edge. It also uses the free, open-source content management system Drupal (great product by the way), which if nothing else is financially inline with fiscal conservative values — perhaps Newt should have taken a page out of this book.

 

On a separate but somewhat-related note: In the 2008 election we some the Barack Obama campaign leverage social media very effectively to build a grass-roots campaign that raised a record amount of money. I predict in this election we will see the Obama campaign innovate using geo-social/mobile applications. After all, he did join Foursquare a few months back.

How to implement Rel=”publisher” and Musings on the Authorship Markup Landscape

February 8th, 2012
by Tim Resnik

Shortly after the Google+ beta launch in July of 2011, Google began promoting authorship markup to webmasters, publishers, and bloggers. The markup enables Google to semantically build connections between disparate pieces of content and the individuals who wrote them (who have a Google+ profile). You might be saying, well that’s all and good for Google, but what do I get out of it? Google’s answer to that question today would be: you *may* receive authorship information along with your listings in the search results, such as a headshot, rich snippets from your Google+ profile, and even your own author SERP (such as Bianca’s below).  Their likely answer tomorrow: it will be used as a key cog in determining “Author Rank” that will greatly influence rankings and the SERP landscape. (A nice piece by John Doherty discussing Author Rank.)

Example of SERP when Authorship Markup is Implemented Properly

Now to the three authorship tags:

      1. Rel=”author”: a link, usually from the byline, from a piece of content created by an author to an author’s profile page.
      2. Rel=”me”: a link from the author’s profile page to the author’s Google+ profile. A reciprocal link from the author’s Google+ profile, under “Contributor to”
      3. Rel=”publisher”: a link in the head of the webpage to an organization’s Google+ Page.

I’m going to focus on the implementation of the publisher tag in this blog. To learn more about the other two check out AJ Kohn’s very thorough write-up on implementation steps, or check out these other resources: Google’s official guidelines (recently made a lot easier by allowing an email address verification from G+ to be used in place of rel=”me”), WordPress implementation, Matt Cutts YouTube video explaining authorship markup.

We know the value of the “author” and “me” markup, but what is the value of the rel=”publisher” tag? Again, the answer today may be a little different than the answer tomorrow. Today, it makes your site *eligible* for Google Direct Connect  which is a navigational search using the “+”<organization name> that sends the searcher directly to your Google+ Page. For example, if you do a search for +Pepsi instead of seeing a search result you will be directly navigated to Pepsi’s Google+ Page. At this point eligibility is determined algorithmically by Google on relevance and popularity. If you don’t think you qualify, you probably shouldn’t implement it at this point.  I have recently seen several branded SERPs that include Google+ page information right below the site links. I am not sure if this is a direct result of the rel=”publisher” verification or some other algorithm. NYTimes.com has it, yet CNN does not. Neither of which have rel=”publisher” implemented:

Google+ Page Showing up in Publisher SERP

 Implementing rel=”publisher” is not exactly a tough coding job, but there are a few quirks and incongruities.

The first step is to determine if you need the rel=”publisher” tag. If you have a high traffic content-rich website AND a Google+ Page for your business (not to be confused with a personal page on Google+), then rel=”publisher” is the markup that you want to use to let Google know that your site LOLcorp.com owns the Google+ page LOL Corp.

Next, add the rel=”publisher” tag to the <head> of your homepage. Google has a tool where you can generate the code and a Google+ badge for your site. This is the step where the waters get a little murky for me, and perhaps ESPN, but I’ll get to that in a minute.

At AudienceWise we work with clients that have many sub-brands, sometimes on a single domain. The instructions from Google are to put the code in the <head> of the document of your “main page.” However, they have not been clear about using multiple rel=”publisher” tags on a single domain. I have scoured the Google forums, as well as reached out directly to a few folks, but to no avail. No one seems to know for sure.

Undeterred, I looked around to try to find an analogous situation and came across the ESPN implementation. As far as I can tell, ESPN has two verified Google+ pages: NBA on ESPN and ESPN. I first checked ESPN for the rel=”publisher” tag and did not find it. I was then a little surprised to find it on the NBA page, but noticed that it was in the body and not the head. ESPN even left Google’s commented out instructions:

It’s not surprising that ESPN NBA, a site that should be eligible for Direct Connect, is not triggering direct navigation to their Google+ Page.

Once you have figured out the right place to put the tag, you can optionally put the G+ badge anywhere in your document. Next, make the connection from your Google+ page to your webpage. Simply select ‘edit’ and navigate to the ‘about’ tab and add your website. Make sure to use the canonical version of the url, or it won’t work. For example, www.pepsi.com is the canonical location of the website, not pepsi.com or subdomain.pepsi.com.

You should be ready to test at this point. Jump over to the Google Rich Snippet testing tool  and see if Google likes you or not. If you have already implemented your rel=”author” and rel=”me” tags, and they exist on the same page as your rel=”publisher”, tag you will get the warning below. However, Google has confirmed that this is just a bug and you can indeed have both the tags on the same page. In fact, Mashable receives this error in the testing tool (but Direct Connect works) so obviously this is not a problem. 

 At a glance, authorship markup seems a bit insignificant in the grand scheme of Google changes in the last year: Search+, the freshness algo , “secure search”, continued Pandalties, a massive privacy policy overhaul and Google+ Pages for businesses. However, a time will come where these tags (and other microformats) will become increasingly important in rankings and SERP display so it will likely pay off to be ahead of the curve and get it done now. Hopefully Google will provide clearer implementation guidelines, testing tools and equal inclusion for the publishers in the middle class.

Interview with Matthew Brown by PPC Associates

November 28th, 2011
by Tim Resnik

PPC Associates posted an interview with AudienceWise co-founder Matthew Brown about the current state of SEO and inbound marketing. 

Questions include:

1)       Please tell me your background and what you do for a living.

2)       Please complete the phrase “PPC to SEO as _____ is to ______.” (And please explain your answer.)

3)       If businesses are raking in money via paid search, why should they care about SEO?

4)      Many objections to SEO revolve around the indefinite, unpredictable nature of the results (which contrasts to the highly precise ROI from PPC). How would you answer that?

5)      How can an SEO client determine whether a prospective SEO provider is knowledgeable and capable of achieving excellent results for them?

6)      What is a typical SEO engagement for you?

 

 

 

 

How to Setup Google+ Pages [Guide]

November 10th, 2011
by Tim Resnik

After a few days of playing with the new Google + Pages, I am cautiously optimistic that it will provide some unique opportunities over Facebook and Twitter, specifically with Hangouts. However, I am a bit disappointed at other limitations, such as Google’s policy not to allow contests and promotions.

Here are the basic steps to getting started with Google+ Pages:

Step 1: Login to Google+ and Go here. If you want to see their promo splash page, go here instead.

Step 2: Pick a category and add your info. If you have a physical location and a Google Place page you should select “local business or place” because it will integrate some of your Places information to your Page. Google allows you to set up both a local business and a brand, but don’t do this unless you want to dilute presence on Plus and manage multiple accounts.

Google+ Pages - Create a Page

Step 3: Customize your page’s public profile. Not much you can do here, but add a short tagline and a profile photo. Don’t expect to do skyscraper image here. Unlike Facebook, the image size is fixed at 200X200 pixels. One interesting feature here is a fairly sophisticated photo editing option during the upload process.

Google+ Photo Editing

Step 4: Get the word out. Google encourages you to do four things to build your business’s Google+ presence: start posting, build your circles – in other words start adding a bunch of people to your circles, link directly to your profile and connect your website to your Google+ Page. This last piece is probably the most interesting because of the use of the rel=”publisher” link, which will “confirm ownership of a site” for Google. Google requests that you put this link which points to your + page from your website in the section of your site. 

Now it’s time to start posting, interacting and building a following. Here a few important things to note:

When you go to your Google+ home page you will notice a drop down to the right of your profile image and directly below your name. This is where you can toggle between your personal profile and your pages. Be careful. Once you switch to a page your all of your posts and comments will be from that page. Moral of the story: before posting pictures from your recent trip to Burning Man, make sure you are on the right profile/page.

Unlike your personal Google+ profile, your Page cannot add individuals to Circles if your Page is not in their Circles. Pages can however add other Pages to their Circles without reciprocation.

Pages do not appear to have the capability to be managed by multiple accounts. It took Facebook a few years to add this functionality, and I am not entirely surprised that it is not in the first release of Google+ Pages.

Google+ Business Pages: the hesitation from Googleplex (a theory)

October 21st, 2011
by Tim Resnik

Google Plus has now been in public Beta for over 3 months and business pages are nowhere to be found. In the first few weeks of launch, many businesses created profile pages only to see them taken down by Google. Google then came out and publicly said that pages for businesses would be launched later in the year and that there would be a test group for brands. The only official peep that has come from the Googleplex since has been the announcement that businesses could be represented by a living, breathing person. What’s the holdup? Why is Google dragging its feet? Is it to protect the user experience, or are they protecting something else?

There has been speculation that business profiles are currently being tested privately. As evidence, you see company logos coming up when you edit your personal employment history. The small piece of evidence that I have to the contrary is that one of our clients comes up, and I know for a fact that they have nothing to do with the test group for business pages. Although they did apply so it could have something to do with that.

So, the question is: why is Google being so hesitant on allowing businesses to create pages? Is it really because they are building a business experience that far exceeds the consumer profiles in terms of usefulness as Christian Oestlien claims in the video above, or is it something more?

Something that I have talked about in the past is that one of the advantages of Facebook ads over Adwords is that if you convert clicks to Page “likes”, you have created a permanent connection between you and that target, giving the advertiser the ability to connect with the target over and over. Adwords does not share this permanency (unless you point it to your Facebook page) and makes a decent percentage of its Adwords revenue from searchers who have clicked on a brand’s ad more than once. This relationship with clicks, follows, and likes leads me to believe that Google is being protective of their Adwords revenue by not allowing businesses to participate in Google+ until they devise and fully test a model that not only protects this revenue, but increases it.

The Google Plus Killer Feature – Search (or at least it could be)

September 20th, 2011
by Tim Resnik

How do you find people, businesses or topics on social networks? The logical thing to do is use the built in search tools for the  site you’re on. If only it were so easy. Seemingly of late I find myself going to Google and doing a “site:” operator to search Twitter, Linkedin, and probably the worst sinner of all, Facebook. On the other hand, not surprisingly, Google has used its bread and butter skills of indexation and display to make the Google+ search experience far more robust than its social competitors.

Let’s look at a really simple example. A lot of people like Coca-Cola. In fact they are one of the most recognized and valuable brands in the world . Based on that fact, we are going to make the assumption that if someone is searching for the common alias “coke” in a search box, they are generally looking for the parent brand Coca-Cola (unless it happens to be someone that really enjoyed the 80s; people usually infer that if they don’t refine their search they are going to be returned the brand, not the illicit substance). If you perform a basic search for “coke” in Facebook, you get listings that are categorized by Pages, Groups, Apps and People. The top Page listing is an exact title match of a page that has nearly a million ‘likes’, but the Coca-Cola page is nowhere to be found.

Facebook Search

Should Coca-Cola, which doesn’t have an exact match in the Page title, come up before Coke Studio? Google sure thinks so, and I am willing to bet that the 98+% of the people searching for “coke” on Facebook are looking for the official company page. The conclusion is not surprising: Google’s search algorithm appears far more sophisticated than Facebook’s. After all, it is what the empire is built on.

 

Facebook Google Search

 

The adoption of Google+ has been significant, accruing over 25 million users in the last few months (in invite only mode, which was lifted today), however, the usage and sharing has appeared to drop off a cliff (I have no explicit evidence of this, but rather stating a common sentiment within folks in online marketing circles). For Google+ to be a formidable competitor to Facebook, they need to leverage what they are really good at: discovery.

The screen shot below shows the exact same search that I did in Facebook. The test is far from scientific and is not even comparing apples to apples since G+ doesn’t allow business profiles. However, two things are clear: First, the results more closely match my intent, and I assume that once G+ allows business pages that the Coca-Cola business page would be in place of the trucker hat chick. Second, the results page itself has superior organization and provides blended results by default and filters for people, posts and Sparks separately.

Gooogle Plus Search

As Google+ grows and continues to innovate, Facebook will surely be forced to “innovate” here (and vice versa, of course). See: Facebook announcing asynchronous relationships. For G+ to finally throw its full weight into the social media arena, it must leverage what its empire is built on as the killer competitive advantage.

New Google Attributes for Pagination and “View All”

September 15th, 2011
by Tim Resnik

Ever since Google released the rel=”canonical” attribute for publishers to deal with duplicate content, webmasters have been misusing it or implementing it poorly. One tactic that the attribute has been used for in the past is for content with pagination. The reason for a webmaster to do this is simple: it lets Google know which page in a series is most important and the canonical version of the page will accumulate the link equity of the sum of the parts (minus a little something). Google will only include the canonical version of the page in the index and that page will be more likely to rank higher.  It’s great in theory and we’ve seen it work well.

The only problem is that Uncle Google frowns upon this use of the attribute because it is really only intended to help Google keep duplicate content out of the index and allow publishers to consolidate indexing properties, such as link equity. The misuse of rel=“canonical” as a pagination attribute tells Google that page 2 content is the same as the page 1 content when in fact it is not.

After many requests from SEOs and webmasters, Google has finally released attributes that can be used specifically for pagination. Here is a breakdown. For the details, check out the Google Webmaster Central blog: Pagination with rel= “next” and rel=“prev”.

There are two major elements to the rules around the new attributes, and publishers need to determine which one they fall in:

First, for content or product categories that are paginated that DO NOT have a “view all” option, rel=“next” and rel=“prev” can be used to “to indicate the relationship between component URLs in a paginated series”.

For pages that have a “view all” option, Google says “We aim to detect the view-all version of your content and, if available, its associated component pages. There’s nothing more you need to do!“ However, if you are not satisfied with letting Google “figure it out”, you can use the rel=”canonical” attribute. So, the question is, why is it acceptable use of rel= “canonical” on view all pages and not pagination? Because the paginated content is a subset of the “view all” content and is therefore duplicative.

 

Google + Shares Showing up in Search Results, are mis-labeled

July 13th, 2011
by Tim Resnik

I was doing a quick search for keyword tools on Google today and noticed that +1′s from people in my Google + network were showing up under the search results. I also found that Google has mis-labeled Google + as Google +1 in the results. Branding hiccup or bug?

Google Places to Portlanders: Please Create Garbage Reviews!

June 13th, 2011
by Tim Resnik

Google continues to push Places and Offers very aggressively in the Portland area. When Google first released HotPot in Portland in December, Matt McGee  wrote about how they broke their own guidelines by offering over $13,000 in free dining to those who posted the most reviews.

Well, it appears they are at again. This time it’s a promotion centered around the Google Offers for Ground Kontrol and the only place it is being promoted is on a Facebook Events page. Interesting.   This promo incentivizes folks to go review mad. 50 reviews gets you an invite to a special event at Ground Kontrol, 100 gets you the invite plus a drink and a unfathomable 200 reviews gets you 2 free drinks, some grub, a t-shirt and a $15 gift certificate. If you have enough time to write 200 reviews for some SWAG and a couple of drinks, you either need to look for a job or move out of your parents’ house.

 

Google’s current policy:

Conflict of interest
Reviews are only valuable when they are honest and unbiased. Even if well-intentioned, a conflict of interest can undermine the trust in a review. For instance, do not offer or accept money or product to write positive reviews about a business, or to write negative reviews about a competitor. Please also do not post reviews on behalf of others or misrepresent your identity or affiliation with the place you are reviewing.

At the time of Matt McGee’s article the guidelines had the following sentence: “In addition, we do not accept reviews written for money or other incentives.” That sentence is now removed.

I understand that Google is trying to prime the pump by injecting the system with some reviews, but I question whether a synthetic strategy versus an organic one is wise. Google built a search empire on the foundation of quality and relevancy and shouldn’t forget its roots as it forges ahead in social.

P.S.  add a comment to this blog and be automatically entered to win a one-of-a-kind AudienceWise trucker hat (comment bots not eligible)

Next Page »