Track REST APIs with the Google Analytics Measurement Protocol

Google Analytics got a whole lot more interesting when the Measurement Protocol was introduced. We already knew GA was the industry standard for web analytics but with the Measurement Protocol it has become the analytics platform of anything and everything that can be made digital. With some clever instrumentation, we can now use it to track products through the supply chain or track users interactions in a store. All you need is a way to collect digital data and send HTTP requests to Google Analytics and you can track anything.

I had to try it out for myself. While I could have fitted #rhinopug with a tracking device or instrumented my coffee machine with an Arduino, I took the easier (but equally cool) route to getting data: a Web API. As my proof of concept, I chose to track the SwellPath team’s group chat application called GroupMe.

Google Analytics Measurement Protocol

GA Dashboard Courtesy of Mike Arnesen

Tracking a chat app turned out to be a pretty cool way to walk that physical/digital line. While we are humans working in the same office, its interesting to compare contextual information from what we can see and hear to the very objective measure of communication; desktop and mobile messaging. This concept is similar to other measures of digital communication like Twitter firehose or brand mentions from news API’s. Those are probably much more relevant to, and could actually affect a website’s performance but, let’s be honest, this one’s a lot more fun.

Mapping Data to Google Analytics

Digital messaging is actually pretty appropriate for the Google Analytics reporting interface. The main reason is this: timestamps. We rely heavily on timestamps to analyze everything in Google Analytics which are all time-based hits. We ask Google Analytics how different landing pages perform as seasons change and what time of user’s are most likely to convert (in order to bid intelligently on ads). Likewise, there is also a natural rhythm to work-based communication. Of course, (or hopefully) its pretty quiet on the weekends and generally pretty active as people start each workday.

The other reason that human communication maps to the Google Analytics reporting interface is that message creation is a lot like content consumption. When we really think about what a “hit” schema looks like, it has a few entities what go together something like this:

[actor] did [event] on [location] at [timestamp]

This “hit” schema works equally well for describing message creation as it does content consuming.

With every hit, the [actor] a.k.a. User is assigned some attributes like Device or New/Returning and the [event] a.k.a. Event, Pageview or otherwise, will have attributes like URL and  Page Title for Pageviews or Action and Label in the case of Events. The [location] is an interesting one. For web, its the page that the user is browsing but it’s also the physical location of the user a Lat,Lon pair with appropriate geographic information. The [location] attributes are generally handled by Google Analytics automatically but speaking from experience, the real art of a good collection strategy is mapping the right information to the right attribute of each entity.

To make sense of the idea of mapping information to attributes let’s get back on track and talk about GroupMe. It boils down to this: you have data and you want it to appear in Google Analytics in a way that you can logically sort/filter/analyze it. This is where the mapping comes in.

GroupMe’s API gives you data about a group’s messages like this:

{
  "count": 123,
  "messages": [
    {
      "id": "1234567890",
      "source_guid": "GUID",
      "created_at": 1302623328,
      "user_id": "1234567890",
      "group_id": "1234567890",
      "name": "John",
      "avatar_url": "http://i.groupme.com/123456789",
      "text": "Hello world ☃☃",
      "system": true,
      "favorited_by": [
        "101",
        "66",
        "1234567890"
      ],
      "attachments": [
        {
          "type": "image",
          "url": "http://i.groupme.com/123456789"
        },
        {
          "type": "image",
          "url": "http://i.groupme.com/123456789"
        },
        {
          "type": "location",
          "lat": "40.738206",
          "lng": "-73.993285",
          "name": "GroupMe HQ"
        },
        {
          "type": "split",
          "token": "SPLIT_TOKEN"
        },
        {
          "type": "emoji",
          "placeholder": "☃",
          "charmap": [
            [
              1,
              42
            ],
            [
              2,
              34
            ]
          ]
        }
      ]
    }
  ]
}

If this doesn’t make sense to you, go read up on JSON. But essentially what you get when you ask the GroupMe API for the most recent messages, it returns a list of messages with, among other things, the sender’s name and user ID, the message, the number of likes, and the location. So we have information about each of the “hit” entities. The user, event, place and time are all described. The only thing missing that is critical to web analytics metrics is something similar to Page. For that reason I decided to use Google Analytics Events to describe each GroupMe message. Each hit maps GroupMe data to Google Analytics as follows:

Google Analytics Parameter GroupMe Data / JSON Keys
User ID GroupMe User ID / user_id
Client ID GroupMe Source GUID / source_guid
Custom Dimension (User) GroupMe Username / name
Event Category “GroupMe Chat”
Event Action “Post”
Event Label Truncated Text of Message / text
Event Value Count of Likes / count(favorited_by)
Queue Time Difference between Now and Timestamp /current time – created_at

Then each GroupMe message is sent to Google Analytics on an HTTP request with data mapped to GA parameters as shown above. Collect data for a few days and then it looks like this:

Measurement Protocol Specific Values: Queue Time and Client ID

If you come with a Web analytics frame of mind, there may be two things that are unfamiliar to you: Client ID and Queue Time. These are both a pain to get right but functionally awesome.

The Client ID is something you don’t have to think about for web data collection; it’s automatically collected from a cookie that Google Analytics sets for you. It is very important though. It is the key for differentiating two devices that, by their collectible attributes “look” the same but are not. The CID must follow very specific rules to be valid and lucky for me, GroupMe offers a GUID for each message that fits the specifications.

Queue Time is awesome. This is the single most important factor in getting the time value of at Measurement Protocol “hit” right. It is the delta (a cool way to say difference) between the time that the event occurred and the time that the hit was collected. If you send the hit to Google after the hit took place, Google’s servers calculate the time delta and record the hit at the time that it actually took place.

This was especially important for the method I used to get data from GroupMe and send it to Google Analytics. Because I was only getting the messages from the GroupMe API once an hour. Without the Queue Time, the hit timing would be very low fidelity, with spikes each hour when the data was collected and sent. By calculating the Queue Time when each message was sent, I got accurate timing and didn’t have to worry about burning through API limits or wasting lots of HTTP calls. (Think about it, without Queue Time, your data is only as accurate as the frequency that your hits are sent which was a cron job in this case.)

Google Analytics Measurement Protocol API

Don’t call it a hack. Ok, call it a hack.

Lessons Learned / How I’d Do it Next Time

This ended up working out pretty well thanks to a fair amount of luck and plenty of read the docs, code, debug, repeat. I got lucky when I realized I hadn’t accounted for things like the mandatory Client ID parameter and … the fact that my server doesn’t run Python cron jobs. As a result I ended up writing my first PHP script and here I am sharing 100-some lines of amateur code. But hey, this proof of concept works!

If I were to do this again, I would answer a few questions before I started:

Get to know the API

  • Will the API I want to track give me all the data I need?
  • Are events timestamped or do I have a way to approximate that?
  • How difficult is authentication and how long does it last for?
  • Am I going to operate safely within the API rate limits?
  • What about Terms and Conditions of the API data?

Map the Data to Google Analytics

  • How will I avoid making recording the same hit twice?
  • What type of Google Analytics Hit will I use?
  • How should I map the API’s data to a Google Analytics hit?

Automate!

  • Can I write some code to automate this?

How the Code Works

The code I wrote to automate this is listed below but if you are unfamiliar with PHP or code in general the instructions that are given to the computer are essentially this:

Call the GroupMe API to see if there are any new messages since last time
  If no: stop.
  If yes: continue
Call API to get/make a map of User ID’s to User Names to send with hits
For each message that was returned:
  map it to GA parameters
  send it as an event to GA
  For each like of each message:
    map it to GA parameters
    send it as an event to GA
Write the the most recent message ID to a .txt file (to keep track of what has been sent)

Wait for about an hour and repeat with the next cron job

It was a fun project and luckily a successful proof of concept for tracking non-website data in Google Analytics. If you’re thinking about doing a Measurement Protocol project, leave a comment or tweet me at @realtrevorfaux (don’t worry, I’m not tracking it). If you’re interested in other cool ways to track offline transactions, check out Google Analytics Enhanced Ecommerce, I really look forward to what is to come of the Measurement Protocol with things like IoT. Connect, collect, and analyze all the things!

The PHP code that I used is below. Give me a break, this is the first PHP programming (and maybe last) I’ve ever done.

<?php

// script configuration stuff
$token = "abc123"; // from dev page
$group_id = "1234567";  // from dev page
$memory_file = "last_id.txt";

$UAID = "UA-XXXXXX-XX"; // Google Analytics UA Code
$member_names_map = makeNameMap();

// saved last message id to file
$since_id = file_get_contents("last_id.txt");

// endpoint to get lastest messages
$url = 'https://api.groupme.com/v3/groups/'. $group_id .'/messages?token=' .$token. "&since_id=". $since_id;

// call the groupme api
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$response = curl_exec($ch);
$http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// check response code and do the rest if no change.
if ($http_status === 304){
  echo "API RETURNED: ". $http_status ." n";
} else {
  $message_count = $response->count;
  echo "API RETURNED ". $message_count ."MESSAGESn";
  handleMessages($response);
}


function handleMessages ($response_obj){

  $json = json_decode($response_obj);
  $messages = $json->response->messages;
  $timestamp = time();

  foreach ($messages as $message) {

    global $UAID;
    $queue_time = $timestamp - $message->created_at;

    $post_hit_params = array (
      'v'=>1,
      'tid'=>$UAID,
      'uid'=>$message->user_id,
      'cid'=>$message->source_guid,
      't'=>'event',
      'ec'=>"GroupMe Chat",
      'ea'=>"Post",
      'el'=>$message->text,
      'ev'=>count($message->favorited_by),
      'qt'=> $queue_time,
      'cd1'=> $message->name,
      'cd2'=> $message->user_id
    );

    sendGAHit($post_hit_params);

    $favorited_by = $message->favorited_by;

    foreach ($favorited_by as $id) {

      $name = $member_names_map->$id;

      $like_hit_params = array (
        'v'=>1,
        'tid'=>$UAID,
        'uid'=>$id,
        'cid'=>$message->source_guid,
        't'=>'event',
        'ec'=>"GroupMe Chat",
        'ea'=>"Like",
        'el'=>$message->text,
        'ev'=>1,
        'qt'=> $queue_time,
        'cd1'=> $name,
        'cd2'=> $id
      );

      sendGAHit($like_hit_params);
    }
  }

  // get last message/id from this call's messges
  $last_message = current($messages);
  $last_message_id = $last_message->id;
  writeMemoryFile($last_message_id);
}

function sendGAHit ($params){

  $query_string = http_build_query($params);
  $url = "www.google-analytics.com/collect?". $query_string;

  // send hit to GA
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $response = curl_exec($ch);
  $http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
  curl_close($ch);

  echo "n";
  var_dump($params);
  sleep(1);
}

function writeMemoryFile ($last_message_id){
  global $memory_file;
  // write last ID to file for next time
  $memory = fopen($memory_file, "w");
  fwrite($memory, $last_message_id);
  fclose($memory);

  echo "LAST ID WRITTEN TO FILE: ". $last_message_id ."n";
}

function makeNameMap(){
  global $token;
  global $group_id;
  $url = 'https://api.groupme.com/v3/groups/'. $group_id .'?token='.$token;
  // call the groupme api
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $response = curl_exec($ch);
  $http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
  curl_close($ch);

  $json = json_decode($response);
  $members = $json->response->members;
  $member_names_map = new stdClass;

  foreach ($members as $member) {

    $user_id = $member->user_id;
    $nickname = $member->nickname;
    $members_names_map->$user_id = $nickname;

  }

  return $members_names_map;
}

?>

 

Leave a Reply

Your email address will not be published. Required fields are marked *