Google Analytics for Developers: Way Beyond the Pageview

Google Analytics is more than just page views. As developers, we can use it to measure how users actually use our apps and use that data to figure out how to make them better. It's free, and best of all, you can do all of this with less than 3 lines of code.

The presentation is about 45 minutes long, and the Q&A session ran another 15 minutes.


  • User flow analysis
  • Google Analytics vs Mixpanel
  • Discovering user drop off
  • Analyzing user engagement with event tracking


This presentation was made at PhillyRB on Jan 15, 2013.

Related Links:

  • Gattica : My Ruby wrapper for Google Analytics API
    • Uses Google Analytics API v2.
    • Basic Authentication. (No OAuth support)
    • Simple but kind of old now.
    • I’ve moved over to using my “simple” API wrapper for my needs.
  • Legato : Ruby wrapper for Google Analytics API by Tony Pitale
    • Currently maintained.
    • Uses Google Analytics API v3.
    • OAuth2 support
    • Define a class for creating reports
  • Garb : Ruby wrapper for Google Analytics API by Viget Labs
    • Not maintained?
    • Uses Google Analytics API v3.
    • OAuth2 support?
  • Google Analytics API documentation

My gist fits my use case. It's small and reduces my dependencies. But if you're looking for a recommendation for your own projects, I'd personally go with Legato.


Scraping HTML5 Sites Using Capybara + PhantomJS

When I have to get data from the web and add structure to it, the task generally falls into three categories: structured API data, data from a "static" website, and data from a "dynamic" website.

I define a "dynamic" website as a page that requires executing JavaScript to get to the data. In other words, whatever I need to scrape off the page was added to the DOM after the page loaded.

The Challenge of Scraping HTML5 sites

For example, to get the circle count from Google+ you have to load the page in a browser. The browser then sends AJAX requests to fetch the data and render the count on the page. If you open Chrome's inspector window and enable "Log XMLHttpRequests", you can see every request it makes.

All that really means is that you can’t get those counts without automating a browser. That’s where Capybara and PhantomJS come into play.

Using PhantomJS as a scraper

PhantomJS is a headless browser. That means it's a full browser that you can programmatically control, but it doesn't show anything on the screen. Its original purpose was to help programmers automate testing websites. For scraping purposes this is perfect, but you have to run a PhantomJS instance for every site you want to scrape.

Scaling PhantomJS with Capybara

To scale up PhantomJS across multiple threads I used Capybara. It's also an automated testing tool, and it provides easy-to-use functions for starting and killing browser threads, navigating pages, and parsing HTML.
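Capybara talks to PhantomJS through a driver, and the poltergeist gem is the usual bridge. A minimal setup, assuming both gems and the phantomjs binary are installed, might look something like this (the specific options shown are illustrative, not the author's configuration):

```ruby
require 'capybara'
require 'capybara/poltergeist'

# Register a PhantomJS-backed driver. The options here are illustrative:
# silence JavaScript errors from the page and allow a generous timeout.
Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, js_errors: false, timeout: 60)
end
Capybara.default_driver = :poltergeist
```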

With multiple instances of PhantomJS in mind, I wrote a simple wrapper API that starts up a thread with PhantomJS running, interacts with a website, grabs the information I need, shuts down the thread, and returns the information. Each of those instances is managed by a job queue, which makes it painless to run lots of jobs in parallel.
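The job-queue pattern described above can be sketched with Ruby's built-in `Queue`. The function name and the stubbed scrape step are my own placeholders; in the real version each worker would drive its own PhantomJS session instead:

```ruby
require 'thread'

# Fan a list of URLs out to a pool of worker threads. Each worker pulls
# jobs from a shared queue until it sees a poison pill, then exits.
def scrape_in_parallel(urls, workers = 4)
  jobs    = Queue.new
  results = Queue.new
  urls.each { |u| jobs << u }
  workers.times { jobs << :done }   # one poison pill per worker

  threads = Array.new(workers) do
    Thread.new do
      while (url = jobs.pop) != :done
        # In the real version: start PhantomJS, visit url, parse, quit.
        results << [url, "scraped:#{url}"]
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }.to_h
end
```

The poison-pill trick (one `:done` per worker) is what lets every thread shut down cleanly once the queue drains.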

The Source Code

The pattern I used is similar to the HTTParty gem. The idea is to create a class that encapsulates a specific job, in this case scraping Google+ and returning a hash with the results.

First, I use a mix-in module that creates a basic DSL for building wrapper APIs. It provides the two things I need across every class I write: starting and stopping a thread, and getting the HTML.

Then, finally, I write an encapsulating class for Google+. Create an instance of the class with the Google+ ID passed in, wait a few seconds for the page to pull in the relevant data, then parse the HTML with Nokogiri. From there, we can find the XPath and read the circle counts.

(Note: This is not my production code.)

Now that we have the code, getting circles from Google+ is as simple as calling one line of code:

Performance Considerations When Scaling

I only use this method of scraping when it's necessary. I will always prefer an API or regular scraping when those options are available; the performance is much better. If you're not careful, scaling this can become a big memory hog. But in some cases, PhantomJS is the only way to get the job done.

It’s a big hammer. Use it sparingly.