Pulling Reddit Metrics into Google Sheets

Reddit is a legend of the Web 2.0 transitionary time period. In that era, it was common to find rough edges around popular sites and features and Easter Eggs that could be described as “features that don’t exactly have any purpose but some people might find it cool.” (Consider Google’s “I’m feeling lucky” feature… This post is about one of those cool features.

Getting JSON from Reddit the Easy Way

My favorite thing about Reddit, by far, is that if you add “.json” to the end of any URL, the website will return a JSON representation of all the data of that URL. I learned this early on when I was learning how to code and I’ve always appreciated how easy it was to get JSON to play around with or metrics to analyze. 

For example, here’s a look at the Google Apps Script Subreddit in my browser.: (https://www.reddit.com/r/GoogleAppsScript/)

Here is what happens if you add “.json” to the end of the URL:

Reddit JSON with .json

You can try it for yourself by going here: https://www.reddit.com/r/GoogleAppsScript/.json

This means, you could if you wanted to, recreate any page on reddit.com using the JSON and your own creative UI. And in our case, our “creative UI” is Google Sheets with a little help from Google Apps Script.

Google Sheets Custom Function 

The Google Sheets custom function is pretty simple to use. It just takes a Reddit post URL as the first parameter and a set of optional arguments that correspond to that post’s properties. 

=getRedditData("https://www.reddit.com/r/GoogleAppsScript/comments/godc9m/can_google_apps_script_be_used_to_create_an/","title","ups","downs")

Since Reddit does not require any special authentication or custom parameters to get the JSON data, this whole process is quite simple. We need a function that does three things:

  1. Make a GET request to Reddit to retrieve the post’s JSON data
  2. Parse the JSON data and extract the interesting information
  3. Write the interesting information to Google Sheets

To make this function a bit more useful, the function will take a few parameters: the URL of the page we want to get the JSON from, and a list of keys from the JSON object that should be written to Google Sheets.

Google Apps Script Code Example

Since it’s not too complex, let’s get right to the code!

function getRedditData(redditPostURL) {

// Append “.json” to the end of the post URL and makes a GET request to the JSON-ified endpoint
 var post = UrlFetchApp.fetch(redditPostURL + '.json');

// Parse the response
 var content = JSON.parse(post.getContentText());
 var op = content[0].data.children[0].data;

// Get the values associated with the function arguments
 var values = [];
 for (var i = 1; i < arguments.length; i++){
 values.push(op[arguments[i]])
 }

// Write a single row
 return [values]
 };

With this function, It’s helpful to start from the end. With Google Sheets functions, results have to be returned as a single value or as a range. This function returns a range, which in Javascript is represented as an array of arrays. The outer array represents the rows of the range and the inner array represents the cell values of each row. 

In order for this function to return a single row of values, the function has to return an array with a single element which is an array. In other words, that’s why the return value is, [values] where values is an array.

Reddit Post JSON Representation

A big part of the function’s logic is concerned with parsing the post’s JSON into a Javascript object named content. It’s important to notice that the JSON that represents the Reddit post is an array of objects and the first object represents the original post. To get interesting data about the post, you need to look at the data object… in the first element in the children array… inside the data object.

That first child’s data object has the following keys:

approved_at_utc, subreddit, selftext, user_reports, saved, mod_reason_title, gilded, clicked, title, link_flair_richtext, subreddit_name_prefixed, hidden, pwls, link_flair_css_class, downs, parent_whitelist_status, hide_score, name, quarantine, link_flair_text_color, upvote_ratio, author_flair_background_color, subreddit_type, ups, total_awards_received, media_embed, author_flair_template_id, is_original_content, author_fullname, secure_media, is_reddit_media_domain, is_meta, category, secure_media_embed, link_flair_text, can_mod_post, num_duplicates, approved_by, author_premium, thumbnail, edited, author_flair_css_class, author_flair_richtext, gildings, content_categories, is_self, mod_note, created, link_flair_type, wls, removed_by_category, banned_by, author_flair_type, domain, allow_live_comments, selftext_html, likes, suggested_sort, banned_at_utc, view_count, archived, score, no_follow, is_crosspostable, pinned, over_18, all_awardings, awarders, media_only, link_flair_template_id, can_gild, spoiler, locked, author_flair_text, treatment_tags, visited, removed_by, num_reports, distinguished, subreddit_id, mod_reason_by, removal_reason, link_flair_background_color, id, is_robot_indexable, report_reasons, author, discussion_type, num_comments, send_replies, media, contest_mode, author_patreon_flair, author_flair_text_color, permalink, whitelist_status, stickied, url, subreddit_subscribers, created_utc, num_crossposts, mod_reports, is_video.

I’ll let you determine which pieces of data you find most interesting… https://jsonformatter.org/ is a useful tool to explore the data inside the object and determine if it’s interesting.

Writing data to Google Sheets

Once the JSON is parsed and stored in the “op” (original post)  object, then the function user can decide what data to write to the row by adding the key names as arguments to the function. 

This works because of a cool Javascript feature called the arguments pseudo-array. It’s a “hidden” variable inside every function. It’s like an array that contains a list of elements (one for each of the function’s arguments) but it’s often called a  “pseudo-array” because it does not have all the normal array methods like .map().

Google Apps Script Custom Function

The function returns an array of values—one for each of the key names listed above—wrapped in an array to produce a result like this:

Metric Accuracy and a note about “Fuzzing”

The reason I made this function was to record the number of upvotes and comments that different Reddit posts receive. This is useful for research on topics and audiences for digital marketing. As you can imagine, I wasn’t the first person to attempt this. At one point, hackers used these metrics to game Reddit’s systems and spam its users. As a result, Reddit made it impossible to get exact counts of upvotes and downvotes. This is a process called “fuzzing”

Fuzzing is the process of adding in a random margin of error to the metrics that you see on Reddit posts. So while you might see that a post has 10 upvotes, I might see the same post has 12, or 9. 

Fuzzing is interesting because you might find that you run the same function with the same arguments and end up with different results. That might seem unusual but at least you know that you can expect it. The JSON from the post my return different metrics at different times.

Go get your Reddit Metrics

Now that you can go wild with all this Reddit data! As long as you don’t violate their User Agreement). You can also adapt this function as much as you need to capture the data in just the way that you want it. Have fun!

Leave a Comment

Your email address will not be published. Required fields are marked *