In preparation for a presentation on SEO Tools at SES Chicago this year, I’ve been rounding up some sites I find most useful for SEO. Google Analytics tops the list. Since the meat of my speech will be explaining how to use Yahoo! Pipes to glue SEO information together, I need my tools to have APIs. Google Analytics doesn’t yet.
After a little bit of searching, I found someone else’s novel approach to putting an API on Analytics. It involves scheduling an XML report to be sent to a Google Groups page and then accessing that report through Yahoo! Pipes. I don’t know if Google saw this guy’s article and intentionally changed how Groups handles attachments or, more likely, it was just changed in a recent release for other reasons, but part of his recipe no longer works. Scheduling the report and setting up the group are still prerequisites and I’ll let him explain:
Setting up the Google Group
Since Google Analytics doesn’t provide an API, or allow you to link directly to any exported reports, we’ll use a Google Group to host the files which we’ll schedule Google Analytics to email to us. When you set up your Google Group, choose the Announcement-only option. Once created, under the Group settings menu item, select Access and make sure that Anybody can view group content, Do not list this group and People have to be invited are all selected. This is so that no one else can post to the group, which would cause issues when trying to retrieve the Analytics message. Keeping the group unlisted makes it less likely for someone to stumble across your Analytics reports when searching Google Groups. Although it would be preferable to make the group private, this would prevent public access to the feeds for the group, which we’ll need later.
While we could email our reports directly to the Google Groups email address, each message would then contain an “opt-out” link because it’s not the email address we’ve got registered with Google Analytics. Given that our messages will be publicly available, we’ll be using Gmail to forward the messages from the same Gmail address we use for Google Accounts so that if anyone manages to find the Google Group, they can’t stop our scheduled report. Simply create a new filter, looking for any email with Analytics in the subject that has attachments and have Gmail forward the email to your Google Group. (You can choose to “skip the inbox” so you don’t have automated reports cluttering up your inbox too.)
Setting up Google Analytics
In Google Analytics, under the Content section, view the Top Content report and change Show rows from 10 to 50. (You can’t configure how many results to include in your report any other way; it just remembers the last setting you selected.) Now click the Email link button near the top of the page, beneath the page title. Select the Schedule tab, change the report format to XML, set the date range/schedule to Monthly (unless you have a really active blog, then you might want to keep it on Weekly) and click the Schedule button at the bottom. Just to test everything, select the Send Now tab, choose XML as the format and click the Send button.
If everything worked correctly, after a few seconds your Google Group should have a Top Content XML report in it! :o)
I’ve created a Yahoo! Pipe that takes the base address of your group, finds the latest attachment, and returns the XML. If you want, you could actually just stop reading here, clone my pipe and start using it immediately. If you’d like a little more explanation, however, I’ll oblige.
You can check out the full pipe, but I’ll go over each part individually. Quick nomenclature note: I’ll be calling each of the little boxes a “process.” Unix habits die hard.
1) Group Page (user input):
This allows any user of the pipe to input a custom page. This process just passes that user input into the pipe.

2) Feed Auto-Discovery:
If the page includes a data feed, either RSS or Atom, this process will return it along with its address.

3) Filter:
Pretty self-explanatory: the process filters any data coming into it. In this case, it returns only the RSS feed. I doubt it really matters, but I had to pick one.
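For readers who would rather see the idea outside of Pipes, here is a minimal Python sketch of what steps 2 and 3 do conceptually: scan a page for feed `<link>` tags, then keep only the RSS one. It is not how Yahoo! Pipes implements auto-discovery; the class and function names and the sample markup (including the example.com URLs) are assumptions made up for illustration.

```python
from html.parser import HTMLParser


class FeedLinkParser(HTMLParser):
    """Collects <link rel="alternate"> feed references from an HTML page."""

    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []  # list of (mime_type, href) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in self.FEED_TYPES and attr.get("href"):
            self.feeds.append((mime, attr["href"]))


def discover_feeds(html_text):
    """Step 2: return every RSS or Atom feed advertised by the page."""
    parser = FeedLinkParser()
    parser.feed(html_text)
    return parser.feeds


def rss_only(feeds):
    """Step 3: keep only the RSS feed addresses."""
    return [href for mime, href in feeds if mime == "application/rss+xml"]


# Hypothetical group page markup, for illustration only.
SAMPLE_PAGE = """
<html><head>
<link rel="alternate" type="application/rss+xml" href="http://example.com/group/feed/rss.xml">
<link rel="alternate" type="application/atom+xml" href="http://example.com/group/feed/atom.xml">
<link rel="stylesheet" type="text/css" href="style.css">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(rss_only(discover_feeds(SAMPLE_PAGE)))
```
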

4) Fetch Feed Loop:
The data after the filter only includes one item, but I still have to loop over it to get to the internal data. In this case, I want to get the RSS feed address and return the actual feed.

5) Truncate:
Takes a list of items and returns however many you want. In this case, I just want the first, and latest, item in the feed. This translates into the latest post in the group.
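As a rough equivalent of steps 4 and 5, the sketch below parses an RSS document and keeps only the first item. It assumes, as the pipe does, that the feed lists the newest post first. The feed content and the `latest_item_links` helper are hypothetical; a real group feed would be fetched over HTTP first.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS feed for a group, newest post first.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example group</title>
    <item>
      <title>Analytics report (newest)</title>
      <link>http://example.com/group/msg/2</link>
    </item>
    <item>
      <title>Analytics report (older)</title>
      <link>http://example.com/group/msg/1</link>
    </item>
  </channel>
</rss>
"""


def latest_item_links(rss_text, count=1):
    """Return the links of the first `count` items in the feed."""
    root = ET.fromstring(rss_text)
    items = root.findall("./channel/item")
    # Slicing plays the role of the Truncate process.
    return [item.findtext("link") for item in items[:count]]


if __name__ == "__main__":
    print(latest_item_links(SAMPLE_RSS))
```
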

6) Links Loop:
“Links” is a process I created. It returns all href links on an HTML page. Specifically, it’s returning all links in the first post of your group.

7) Filter:
This second filter will take the list of links and only return those containing the characters “/attach/” and “.xml”. That should give us the link to the XML attachment in the post.
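Steps 6 and 7 boil down to "collect every href, then keep the ones that look like an XML attachment." Here is a small Python sketch of that idea. It is not the actual “Links” pipe; the helper names and the sample post markup are invented for illustration, and the real Google Groups URLs may look different.

```python
from html.parser import HTMLParser


class HrefParser(HTMLParser):
    """Collects the href attribute of every tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def extract_links(html_text):
    """Step 6: return all href links found in the page, in document order."""
    parser = HrefParser()
    parser.feed(html_text)
    return parser.hrefs


def xml_attachment_links(links):
    """Step 7: keep links containing both "/attach/" and ".xml"."""
    return [link for link in links if "/attach/" in link and ".xml" in link]


# Hypothetical markup for the latest post in the group.
SAMPLE_POST = """
<a href="/group/example/about">About</a>
<a href="/group/example/attach/abc123/report.xml?part=2">report.xml</a>
<a href="/group/example/attach/abc123/logo.png?part=3">logo.png</a>
"""

if __name__ == "__main__":
    print(xml_attachment_links(extract_links(SAMPLE_POST)))
```

Taking only the first element of the filtered list would correspond to the precautionary truncate in the next step.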

8) Truncate:
The truncate here is purely a precaution. It’s unlikely the filter will return any more than one link, but if you’ve accidentally sent two reports, this will grab the first one only.

9) Links Loop:
Oh, you thought that opening the XML attachment’s link would give you an XML attachment? Oh dear heart, I’ll forgive you for your naivete. Instead, Google sends you to another page with the “real” link. This process grabs that link for you.

10) Regex:
Regex is short for “regular expression.” Regular expressions are a very powerful way to find and replace text. So powerful, in fact, that a full explanation is beyond this post. Entire books have been written on this topic, the best by Jeffrey Friedl. This regex is very simple: it just deletes any part of the link that looks like “amp;”. The reason is that Google returns the attachment link as encoded HTML. In HTML, “&” is encoded as “&amp;”. I need to switch it back to create the correct attachment URL.
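The same substitution is easy to try outside of Pipes. The Python sketch below deletes every “amp;” from a link, which turns “&amp;” back into “&”. The URL is a made-up example, and `strip_amp` is a name chosen here, not something from the pipe. Python’s standard `html.unescape` is shown alongside as a more general way to decode HTML entities.

```python
import html
import re

# Hypothetical HTML-encoded attachment link.
ENCODED = "http://example.com/attach/abc123/report.xml?part=2&amp;view=1"


def strip_amp(url):
    """Delete every literal "amp;" so that "&amp;" becomes "&"."""
    return re.sub(r"amp;", "", url)


if __name__ == "__main__":
    print(strip_amp(ENCODED))
    # html.unescape decodes all HTML entities, not just &amp;
    print(html.unescape(ENCODED))
```

One caveat: the regex removes “amp;” wherever it appears, so a URL that legitimately contains those characters would be altered. For the attachment links described here that is unlikely to matter.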

Et Voila! A fully formed URL for retrieving your XML attachment from Google Groups. Once you’ve set up Analytics to email the report to your Google Groups page, just plop this pipe into the beginning of your own pipe and you’ll be able to access the correct XML.