Getting Started with SQL for Marketing (with Facebook Ads Example)

As a digital marketer, I use SQL every single day. And looking back on my career so far, it would be fair (though a bit reductive) to say that I could define my career by two distinct periods: before I learned SQL and after I learned SQL. The two periods are distinct for three main reasons: 

  1. After learning SQL, I gain insight from data faster
  2. After learning SQL, I can base decisions on more data
  3. As a result, I’ve been making better marketing decisions—and I have seen the traffic, conversion rates, and ROI to prove it (thanks to SQL)

If you’re at a crossroads in your career and you find yourself asking, “What coding language should I learn?”, here is my case for SQL.

What is SQL (for Digital Marketing)?

When you see SQL, you might think it means “Sales Qualified Lead,” but more commonly SQL stands for “Structured Query Language.” It is a programming language that allows you to retrieve (or update, alter, or delete) data from relational databases. (Relational is just a fancy word for a database that stores data in tables.)
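To make that concrete, here is a minimal sketch of each of those verbs against a hypothetical contacts table (the table and column names are made up for illustration):

```sql
-- Hypothetical "contacts" table; names are illustrative only
SELECT email FROM contacts WHERE status = 'customer';   -- retrieve
UPDATE contacts SET status = 'customer' WHERE id = 42;  -- update
ALTER TABLE contacts ADD COLUMN country TEXT;           -- alter
DELETE FROM contacts WHERE status = 'unsubscribed';     -- delete
```

In practice, you will spend the vast majority of your time as a marketer on the first verb, SELECT.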

It’s kind of like ordering from McDonald’s. SQL is a language – a specific set of instructions – that you use to specify the results you want, the way you want them, in the quantity you want. Basically, SQL allows you to have your data your way.

How is SQL Used in Business?

SQL has two main uses: applications and analysis. Applications (apps) from CandyCrush to Instagram store content and data about users in databases and then use it to create an experience (like keeping track of how many comments you have on an Instagram post). On the other hand, you can use SQL for analysis in the same way you can sort, filter, and pivot data in Excel (except with a lot more data).

SQL is different from most programming languages like Javascript, Python, and PHP because it has only one use: working with data in relational databases. So you can’t use SQL to build a website or a chatbot, but you can use programming languages like Javascript, Python, and PHP to send SQL commands to databases and do something interesting with the results. WordPress is a good example of this. WordPress is written in PHP, and the PHP code sends SQL commands to a MySQL database and formats the data into blog articles and article lists.

What’s the difference between SQL and Excel?

Remember when you learned your first Excel formula? Pivot tables? VLOOKUP? You probably thought you could take on the world! SQL is like that times 100. SQL and Excel are similar because they both allow you to analyze, manipulate, calculate, and join data in tables.

The biggest difference between Excel and SQL is that you can analyze exponentially more data exponentially faster with SQL, but you can’t update the data quite as easily. Also, SQL commands define how you want your data table to look when the data is retrieved, so you are working with entire tables rather than individual cells. The benefit is that you don’t have to worry about making mistakes when copying formulas (and the analysis errors that come with that). On the whole, I’d say SQL is much better than Excel, most of the time.
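To illustrate the parallel, here is a rough SQL equivalent of a filtered, sorted pivot table (the sessions table and its columns are hypothetical):

```sql
-- Count visits by channel, US only, biggest first
SELECT channel, count(*) AS visits   -- like a pivot table's "Values"
FROM sessions
WHERE country = 'US'                 -- like an Excel filter
GROUP BY channel                     -- like a pivot table's "Rows"
ORDER BY visits DESC;                -- like sorting a column
```

Every row of the result is computed from the whole table at once, so there is no formula to copy down and no cell to mistype.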

SQL Example in Marketing

This example shows an ROI analysis: SQL code that calculates how many customers you’ve acquired per country since the beginning of 2020, along with the Facebook Ads spend in each country.

SELECT AS country,
       max(new_customers.customer_count) AS total_customers, -- one count per country; max() avoids double counting across daily ad rows
       sum(facebook_ads.spend) AS ad_spend
FROM (
	SELECT ip_country AS country, count(email) AS customer_count
	FROM customers
	WHERE createdate > '2020-01-01'
	GROUP BY ip_country
) new_customers
JOIN facebook_ads ON = -- the facebook_ads column names are assumed
WHERE > '2020-01-01'
ORDER BY ad_spend DESC;

The example does the following:

  1. Aggregates the customers table into a table of countries and counts of customers acquired since January 1st, 2020.
  2. Joins that table with another table that contains Facebook Ads data by day.
  3. Filters in only Facebook Ads spend data since January 1st, 2020.
  4. Aggregates this all into a single table with three columns: country, count of new customers from that country, and the ad spend for that country.

The good news is, this is about as complex as SQL gets. Pretty much everything else in SQL is just a variation of this.

Is SQL Worth Learning?

In a word, yes. There is a practical reason and a conceptual one. The conceptual one is that learning SQL, like learning data structures or other programming languages, will expand how you think about data. It will help you organize data for analysis more efficiently and help you structure your thinking about how to answer questions with data. So even without a database, SQL can help you work with data.

The practical reason for learning SQL is that it allows you to gain insight faster, from more data, and come to better conclusions. That is true if you are analyzing keywords for PPC or SEO, analyzing how leads flow through your sales funnel, analyzing how to improve your email open rates, or analyzing traffic ROI.

Here are just a few good reasons:

  1. You’ll spend less time trying to export and import data into spreadsheets
  2. You’ll be able to replicate your analysis easily from week to week or month to month
  3. You’ll be able to analyze more than 10k rows of data at once
  4. You can use BI tools to build dashboards with your data (and always keep them fresh)
  5. You’ll be able to merge bigger datasets together faster than with VLOOKUPs
  6. You won’t have to ask for help from IT people, DBAs or engineers to get data out of a database or data warehouse for your analysis

How long does it take to learn SQL?

With dedication, you can develop a strong foundation in SQL in five weeks. I recommend Duke’s SQL class on Coursera to go from zero to usable SQL skills in less than two months. With that class and a couple of books about PostgreSQL, I was on par with most analysts at Postmates (except I had the context about the data!). A few months later, I learned enough to record this SQL demo with SEM data.

There are even good Android/iPhone apps that will help you learn the syntax through repetition. The class I recommend below (Data Manipulation at Scale: Systems and Algorithms) for Python also touches on SQL, so it’s a double whammy, and PythonAnywhere also features hosted MySQL, so that’s a double-double whammy!

If you are looking for a short but substantive overview of SQL, this video from Free Code Camp is pretty good. I’m not suggesting you’re going to know how to write SQL in four hours, but at least you will get the gist.

All that being said, like many programming languages, learning SQL is a continual practice because, after learning the language, you can expand into managing a database rather than just analyzing the data in one. You can also pair your SQL skills with other programming skills to make all sorts of interesting applications! The good news is that, for the most part, it’s like riding a bike: once you learn it, you don’t really forget it—but it will take you a bit of time to re-learn a wheelie.

How to Scale Yourself in a Growth Role, Part 1: Your Time

A personal note: These are my learnings from consulting, being part of a growth team at Postmates, then leading growth at Panoply, so hopefully it’s insightful. This post has been idling in a Google Doc for weeks because I couldn’t dedicate the time to complete it. So instead of never completing a single long post, I decided to break it into several. This way it’s easier to publish and hopefully easier to read. If you want to catch the next two posts on “Your Work” and “Your Team,” sign up here.

Growth is simple. It is just a measure of size over time.

You can grow anything— visitors, users, subscribers, customers, revenue, profit—it doesn’t matter. If you can measure it, you can put it on a timeline, and you can grow it.

Accelerate Growth

I’m clearly not a gif expert.


The fun starts when you start trying to accelerate the rate of growth. Luckily, since growth is just the increase in size over time, you can simplify this problem down to reducing the amount of time it takes to get to a given size. And because the growth role in a startup is all about… growth, you can never move too fast. The question is, how?

This is where scaling yourself comes in. By “scale,” I mean specifically increasing capacity (the ability to do more in the time you have) and competence (the ability to just do more). This is how I would break down the ways you can scale yourself to accelerate growth (part 1 of 3).

Some of the later recommendations assume you have cash to spend or personal time that you are willing to devote to personal development. But to begin with, the list is prioritized in a way that starts with things that everyone can do and continues to things that are resource-dependent.

Do less.

If you’ve ever found yourself at 3 pm with 40 browser tabs open and not a single thing crossed off your to-do list, this one is for you. You know it’s much easier to join a watercooler brainstorming session or get lost in a Google search rabbit hole than to align stakeholders, reply to mile-long partner email threads, and dig into the tedious details of a campaign launch. This one is all about self-discipline.

By “do less,” I mean “do less of the things that take time but don’t drive growth.” Since growth is a function of time, time is your most precious commodity—especially if you’re running on a lean budget. Wasted time stunts growth.

This may not seem like a revelation, but self-discipline comes down to honesty. You have to be honest with yourself when you ask, “Is this activity driving growth faster than all the other activities I could be doing?”

If you struggle with this type of introspection, ask yourself, what would the “Jerry Rice / Greta Thunberg / Steve Jobs of growth” do? The answer is probably something way more badass than, “continue scrolling through Instagram.”

But while self-discipline is great to keep you driving fast, prioritization is what keeps you driving toward success.

Prioritize. Don’t confuse activity for productivity.

In the previous section, I alluded to what I’ll refer to as the opportunity cost of time. The question, “Is this activity driving growth faster than all the other activities I could be doing?” considers that every activity takes some time but each activity accelerates growth at a different rate.

For example, if you are scrolling through Instagram, you are increasing your rate of growth by roughly 0. But if you spend the same time launching a landing page A/B test, you might see a measurable increase in your rate of growth. If you spend your time onboarding an expert A/B testing freelancer, you’re likely to have a compounding effect!

How do you know what to prioritize?

First, you will need to be very familiar with your “one metric to rule them all.” As much as I’d like to dive into the discussion about metric selection and validation, instead I’m going to recommend reading “Lean Analytics.”

There is always “the one metric” at the bottom of the funnel and most often it relates to revenue. Mature growth programs will have a good sense of what the funnel looks like above this final metric. And growth programs that are moving at full speed are able to evaluate levers in terms of how they impact growth. 

If both of these ideas seem foreign to you, you better start building a funnel and quantifying the effect of various levers against your funnel. In other words, consider how things like keyword expansion, link building, content creation, landing page testing, etc. affect conversion rate through your funnel.

Once you know how powerful the different levers in your growth program are, you can start to determine how much time it will take to move a lever (and corresponding metric) a given amount. Then you can ask, “Is this activity driving growth faster than all the other activities I could be doing?”

The opportunity cost of time

I love dealing with the metaphysics of this discussion. As I laid out above, every period of time can be evaluated by its potential to accelerate growth. In addition, every activity requires an amount of time to accomplish. And finally, the effect of every activity can vary in how soon it arrives and how long it lasts. Time, this single dimension, has a three-dimensional impact on growth!

While it can be valuable to consider the dimensions of time, the last thing you want is analysis paralysis. That is why I recommend a universally helpful and satisfyingly simple tool: Dwight D. Eisenhower’s prioritization framework.

Dwight D. Eisenhower’s prioritization framework measures everything that could be done on two axes: urgency and importance. Let’s break it down in the context of growth.

Urgent unfortunately often means broken. This could be things like robots.txt files, shopping carts, links in email templates, or sign up forms. Every minute that these things stay broken, growth is probably decelerating. Urgency can also relate to time-sensitive campaigns that relate to external events or product/feature releases that must be prioritized.

Important things are generally measured against their impact on the “metric to rule them all.” You can generally gauge importance by asking “if I build/launch/do this thing, how will it affect growth in the next week/month/year?” (Consider the tradeoff in the week/month/year timeframes. Don’t always skip SEO in favor of quick response email or push campaigns.)

Next time you have to decide between updating a tired email template and writing a spec for a high-potential SEO feature, ask yourself which will drive the most growth today and into the future. If your dev team has time to fill, perhaps pause the emails and give them a spec. If your “Welcome” email is broken, fix it quickly and then deliver the spec.

A word of caution

As someone who clearly has a tendency to overanalyze things, I have one small word of caution: don’t let yourself fall into the trap of analysis paralysis. The 80% rule often applies to prioritization too. If you spend half your day trying to figure out what to do first, just call it a tie and get both things done.

It’s your time

I hope this discussion brought up some new ideas for you. If you disagree or notice something that is not included, please leave a comment and build on the discussion.

If your company is in the early stages of growth and you want to discuss how to sharpen your focus and accelerate growth, contact me on GrowthMentor. I love sharing what I’ve learned from my experience. As you’ll find in part 3, mentorship is often the biggest growth accelerant.

If you want to stay tuned for parts 2 and 3 sign up to receive the posts in your inbox. (no spam, just posts)

Drift Chat Event Tracking with Google Tag Manager – Less Code, More Analytics

The market for website chatbots is growing steadily, and with it, there have been a lot of entrants since Intercom led the way. From Hubspot to Whisbi, each has its own take on it. What I found especially impressive is how Drift has opened up the tech side of their platform, which makes it a really fun marketing technology. Building chatbots is a mix of analytical thinking, user experience design, and creativity. On top of that, it is easy to iterate and optimize quickly thanks to the amount of qualitative and quantitative feedback that the chats offer. While I’ve seen the Drift team steadily working on bot flow analytics, I’ll be honest: I wouldn’t mind a lot more access to the data that is collected through the bot interactions.

Drift’s Javascript API provides a pretty comprehensive list of chat events to “listen” for (meaning you can trigger Javascript code when a chat event, like “chat started,” happens). With Google Tag Manager, you can send these events to an analytics tool like Google Analytics. You can even send along a bit of event metadata, like the Drift inbox ID and conversation ID, so it’s easy to find the Drift conversation at the URL:{{inbox ID}}/conversations/{{conversation ID}}

Unfortunately, the API does not provide the actual text of the conversation (for a good reason), but using Google Analytics, you can still collect sidebar and message events and analyze them in aggregate. This can be really helpful in understanding which URLs users are interacting with bots on. From there you can spend your time building and optimizing bots with maximum effect.

Google Tag Manager Data Layer Events

The code at the end of the post registers a callback function for all the following events:

  • drift loaded
  • sidebar opened
  • sidebar closed
  • welcome message opened
  • welcome message closed
  • away message opened
  • away message closed
  • campaign message opened
  • campaign message closed
  • CTA clicked
  • campaign user started a chat or submitted an email
  • slider message closed
  • user started a new chat
  • user replied to a conversation
  • user submitted email address
  • schedule meeting card pushed to a conversation
  • user booked a meeting

Each event will push an object to the dataLayer that looks something like this:

{
  "driftEventType": "conversation",
  "driftEventName": "message:sent",
  "driftEventDescription": "user replied to a conversation",
  "event": "drift",
  "driftEventData": {
    "data": {
      "sidebarOpen": true,
      "widgetVisible": true,
      "isOnline": true
    },
    "conversationId": XXXXXXXXX,
    "inboxId": XXXXXX
  },
  "gtm.uniqueEventId": XX
}

That makes it easy to set up GTM tags that fire on all Drift events based on a trigger with the firing rule: data layer event EQUALS drift. You can be more specific about your firing rules by using driftEventType, driftEventName, or driftEventDescription.

The data layer event values can map directly to Google Analytics event values (driftEventName is good for Event Action and driftEventDescription is good for Event Label). Or, instead of using driftEventDescription as the Google Analytics event label, you can use the inbox ID or conversation ID. You could even set event-scoped custom dimensions and capture all of it!
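For illustration, here is one way that mapping could look as plain Javascript. The toGaEvent helper and the “Drift” category prefix are my own suggestions, not part of Drift’s or GTM’s API; inside GTM you would read these values from Data Layer Variables instead:

```javascript
// Sketch: map a Drift dataLayer event to Google Analytics event fields.
// The field assignments below are a suggested convention, not a requirement.
function toGaEvent(driftEvent) {
  return {
    eventCategory: 'Drift ' + driftEvent.driftEventType, // e.g. "Drift conversation"
    eventAction: driftEvent.driftEventName,              // e.g. "message:sent"
    eventLabel: driftEvent.driftEventDescription         // e.g. "user replied to a conversation"
  };
}

var gaEvent = toGaEvent({
  driftEventType: 'conversation',
  driftEventName: 'message:sent',
  driftEventDescription: 'user replied to a conversation'
});
```

The same convention works for the inbox/conversation ID variant: just swap the eventLabel assignment.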

Enough talk! Get to the code.

To set up the Drift event listener, place the following Javascript code in a GTM Custom HTML tag between <script> tags. Make sure the HTML tag fires after Drift is initialized, so if you are using Google Tag Manager to instantiate Drift, set up your tags to fire in sequence.

var driftEvents = [
  {driftEventType: 'sidebar', driftEventName: 'sidebarOpen', driftEventDescription: 'sidebar opened'},
  {driftEventType: 'sidebar', driftEventName: 'sidebarClose', driftEventDescription: 'sidebar closed'},
  {driftEventType: 'welcome message', driftEventName: 'welcomeMessage:open', driftEventDescription: 'welcome message opened'},
  {driftEventType: 'welcome message', driftEventName: 'welcomeMessage:close', driftEventDescription: 'welcome message closed'},
  {driftEventType: 'away message', driftEventName: 'awayMessage:open', driftEventDescription: 'away message opened'},
  {driftEventType: 'away message', driftEventName: 'awayMessage:close', driftEventDescription: 'away message closed'},
  {driftEventType: 'campaign', driftEventName: 'campaign:open', driftEventDescription: 'campaign message opened'},
  {driftEventType: 'campaign', driftEventName: 'campaign:dismiss', driftEventDescription: 'campaign message closed'},
  {driftEventType: 'campaign', driftEventName: 'campaign:click', driftEventDescription: 'CTA clicked'},
  {driftEventType: 'campaign', driftEventName: 'campaign:submit', driftEventDescription: 'campaign user started a chat or submitted an email'},
  {driftEventType: 'slider message', driftEventName: 'sliderMessage:close', driftEventDescription: 'slider message closed'},
  {driftEventType: 'conversation', driftEventName: 'startConversation', driftEventDescription: 'user started a new chat'},
  {driftEventType: 'conversation', driftEventName: 'message:sent', driftEventDescription: 'user replied to a conversation'},
  {driftEventType: 'conversation', driftEventName: 'message', driftEventDescription: 'user received a message from a team member'},
  {driftEventType: 'conversation', driftEventName: 'emailCapture', driftEventDescription: 'user submitted email address'},
  {driftEventType: 'conversation', driftEventName: 'scheduling:requestMeeting', driftEventDescription: 'schedule meeting card pushed to a conversation'},
  {driftEventType: 'conversation', driftEventName: 'scheduling:meetingBooked', driftEventDescription: 'user booked a meeting'}
];

drift.on('ready', function (api, payload) {
  // Push a "drift loaded" event, then register a callback for each Drift event
  dataLayer.push({event: 'drift', driftEventName: 'driftReady', driftEventDescription: 'drift loaded', driftEventData: payload});
  driftEvents.forEach(function (driftEvent) {
    drift.on(driftEvent.driftEventName, function (data) {
      driftEvent.event = 'drift';
      driftEvent.driftEventData = data;
      dataLayer.push(driftEvent); // send the event to the GTM data layer
    });
  });
});

This code might look weird at first glance, but notice that instead of writing a code block to register a callback for each event, the code loops through all the events to register the callbacks. Paste it into your browser’s Javascript console (on a page that has Drift loaded) to see it work. When you interact with your Drift bot, you’ll see the events being sent to the data layer. You can also view the code on this GitHub Gist.

If this seems a bit intimidating today, I’d recommend checking out my open source guide: Digital Marketing Technical Fundamentals on GitHub for guidance on learning the tech side of digital marketing. Good luck!

Metrics and Levers of an SEO Measurement Framework

You are probably here because you have more SEO data than you know what to do with and precious little time to make sense of it. A framework is exactly what you need.

An SEO measurement framework draws the relationships between all the metrics that you collect and the levers that you can pull to improve them. Frameworks reduce complexity, provide insight, and best of all, frameworks enable focus.

SEO happens in a dynamic system that spans technical infrastructure, human relationships, and user experience. But SEO outcomes are measured in acquisition, activation, and revenue. I believe (and you should too) in measuring SEO impact against outcomes in an effort to evaluate CAC. We should also trust a system of metrics to avoid risks, surface optimizations, and identify opportunities. Let’s see what this system looks like.

SEO metrics and levers

Each SEO metric (on the left) can be influenced by one of several levers (across the top). Metrics can be viewed from the top down as a diagnostic tool or bottom-up as a progressive checklist – like a “hierarchy of needs” for SEO.

The outcome metrics at the top will matter to every business site, and the lower, more technical metrics will matter more to larger sites and more sophisticated SEO programs.

The purpose of this SEO framework is three-fold:

  • Tie together metrics and levers to measure the impact of each SEO lever
  • Provide a diagnostic for issues or optimization—each metric is dependent upon the levers that influence it and the metrics below it
  • Identify the depth of metrics that is appropriate for tracking depending on the scope of an SEO program

Before we get too deep into the details of how to use the framework, let’s first consider its ingredients: the data itself.

A Story: The Lifecycle of SEO Data

Search, the intersection between man and machine, creates a lot of data. To put it all in context, let’s think about it as the story of a lowly web page.

Creating a webpage generates data (it is data!). Publishing the page generates metadata (publish time, author, language, file size, and link relationships). And when you ask Google to fetch the page and submit it to the index, Google collects all this data and generates even more (a crawl date, response code, response latency, load time, and render time). If it decides to index the page… yep, more data (time of indexing).

Now that the page is in the index, it has the possibility of being returned in search results. Wow, does that generate a lot of data, which Google, of course, collects. But for the intent of SEO, we care about the data we get back from Google Search Console (the keyword that retrieved the page, the device, geography, search type, the time of impression, where it ranked, and whether it got clicked). Over time, this data really adds up!

Humans are the last part of the story. Search results, like lunch menus at a diner, give humans an opportunity to browse, select, and consume different options and transact with their owners. More data. (Technical metrics like page speed, behavioral metrics like time on site, interactions, and transactions, to name a few.)

If we step back further, we see that these humans are referencing the page from other pages and social media posts. All those references create more traffic, not to mention more data for Google to factor into its algorithms. On and on this goes as Googlebot is hard at work traversing the internet to index a picture of this web so that it can deliver all this data to a ranking algorithm that, combined with humans’ personal, demographic, and interaction data, creates the phenomenon that is search.

Looking at one page, this all kind of makes sense. But when you multiply that by a few thousand, or a few million, things get complicated. Additionally, the data is collected in several different ways and stored in several different places. Let’s use this story from creation to consumption to transaction as the basis of an SEO measurement framework.

Story time is over so let’s get on to the metrics!


SEO Metrics

Each metric, from Crawl to Revenue, is dependent upon the metrics below it. As the story of the lowly web page goes, a page must be created, crawled, indexed, returned in search results, and clicked for it to have any business impact.

SEO Outcomes

Unlike the universality of SEO performance metrics, outcomes are business-specific. At the outset of any strategic planning for SEO, you must define what success looks like. As soon as it is defined, track that conversion in your analytics tools so that you can measure that conversion completion against each performance metric. This will help you determine where to invest time and resources. Performance metrics will tell you how to invest time and resources.

Performance Metrics

Let’s start from the bottom and work our way up. In this way, each metric impacts the one that follows it.

Crawl: the taken for granted pre-requisite

The most taken-for-granted metric in search engine optimization is crawling. That is partly because the vast majority of the internet is either built on a CMS that optimizes for crawling by default, or the site is not big enough to have crawl issues. However, there is a class of home-baked websites that, due to developer oversight (aka “job security”) or otherwise, are not crawl-friendly. The bottom line is: if Google can’t find your pages, because you didn’t use React Router correctly or you forgot to put up a sitemap, SEO is not happening.
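If a missing sitemap is the problem, the fix is small. Here is a minimal sitemap per the standard sitemap protocol (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="">
  <url>
    <loc></loc>
    <lastmod>2020-01-01</lastmod>
  </url>
</urlset>
```

Reference it with a Sitemap: line in robots.txt, or submit it in Google Search Console, so crawlers can find it.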

Index: this is an actual thing

The internet is really big: if Google wanted to index the whole thing, it would probably double its electricity bill.

Search engines have to be picky about what they index. If a page is low quality or likely to have low search demand, Google can choose not to index it. Unless your site is millions upon millions of pages, isn’t internationalized correctly, returns lots of low-quality or duplicate pages, or violates Google’s TOS, it’s likely that all your pages, if crawled, will be indexed.

It is important to consider the flip side of this problem, though: indexing too much. Indexing a bunch of garbage pages can waste Google’s crawl budget and make a site appear to be low quality. This lever can be pulled both ways, and depending on the site, there is a right and a wrong way to pull it.

Impressions: search impressions and searchers’ impressions

Yay, the site is indexed! Ok, now what? Impressions and the keywords that trigger them are a measure of your site’s effectiveness in SEO targeting. Not all keyword impressions are created equal.

Searcher context, like the device or geography, can have a huge impact on outcomes. Likewise, less traffic from better-targeted keywords can have a stronger effect on outcomes than lots of poor-quality traffic. Keyword impressions are a fundamental part of your customer understanding and a strategic north star.

Rankings: Oh sweet competition

Search engine result pages, with stack-ranked lists of URLs, are what created and sustain what we call SEO. Rankings are a moving target, but we have reason to trust our theories about how they work. At the core, there is content and links, but there are hundreds, if not thousands, of other factors. Measuring keyword-URL rankings is important, but seeking to understand why pages rank is almost as important. Measuring process metrics (the quantity and quality of inbound links, the quantity and quality of content, and the components of speed) is what differentiates a deliberate SEO program from a spray-and-pray approach.

Clicks: The first “win” (but not the last word)

Traffic is the easiest metric to report and the most misleading. To be useful, traffic metrics should be segmented by keyword or page groupings to understand searcher intent and compared to outcome metrics to uncover whether that intent was met appropriately. Traffic is a good indicator of performance, but it is nothing without measures of quality like click-through rate and conversion rate.


SEO Levers

SEO levers are at the top of the matrix. These five buckets describe the areas of optimization and growth at an SEO program’s disposal. The dots in the grid represent the relationship between a lever and the metric that it can directly influence.

SEO levers are ordered from left to right, from foundational technical factors to tactical business factors. With few exceptions, you must get the foundation right before thinking about building upon it. Let’s take a look at each of these levers.

Links: The bowl and the cherry

Links are both basic and critical, like the bowl and the cherry on top of an ice cream sundae. Sites need sitemaps and internal links so that pages can be crawled and indexed, and at the same time, external links are a major factor in Google’s ranking algorithm.

When to use this lever: Day one of publishing a site so that the pages have a chance to be crawled, indexed, and returned in search results. Or later on, when you have critical or high-conversion pages that have high impressions but don’t rank very well—ranking improvement will directly impact SEO outcomes.

Response/Render/Speed: AKA “technical SEO”

Search engine crawlers are not going to waste their time on pages that return an error or take an eternity to load. Pages that hide a lot of their content behind Javascript rendering or user interaction events rank poorly in search compared to their server-side rendered counterparts.

When to use this lever: Always monitor page responses for bots. 4XX and 5XX response codes will cause pages to fall out of the index. If you find that Google’s reported “Time spent downloading a page (in milliseconds)” is in the seconds, or that pages feel slow when they are not cached, use this lever.

Meta/Schema/Title: Basic.

The page title, meta description, and schema markup can have the biggest impact on SEO with the least amount of work. Optimizing a page title for a target term can significantly influence ranking and click-through rate. From quick one-off metadata optimizations to section-wide A/B tests to sitewide implementation of markup, these optimizations can always yield benefits.

When to use this lever: Always unless you’re certain you have something better to do.

Content: Keep humans happy. Keep bots happy.

Page content is the only lever that affects literally every SEO metric. Page content determines whether a page is worthy of indexing and of repeated bot crawls to check for fresh updates. Along with page title and schema, content determines what keywords a page will surface for in search. Content also affects SEO outcomes, since the topic of the page could target keywords with more or less search volume or conversion intent. And of course, the content is ultimately there to influence a user.

This is the shallow end of SEO in some respects, but when things get competitive, this is a critical lever. More content usually means more traffic. Better and fresher content usually means more on top of that.

When to use this lever: Need to start getting traffic? Create content. Need better traffic? Create content that targets the right keywords. Need to maximize conversions and capture every possible edge against tough competitors? Test the crap out of your content.

Experience:  You grew the fruit, now harvest the juice

Experience, like crawl optimization, is often dismissed, taken for granted, or believed to be outside the purview of SEO programs. Those are dangerous opinions. As Google folds more user-experience proxies into its ranking algorithms (for example, how frequently a searcher bounces back to search results), user experience only becomes more important.

Let’s not forget, you’ve done a lot of work to get searchers to your site! It is now your duty, not to mention your goal, to optimize the experience to the point that users are happy, ready, and able to convert. No part of SEO happens in a vacuum and just as content impacts experience, experience could impact ranking. Recognize this as part of the system.

When to use this lever: Always, but especially if organic traffic is hard to come by or you are heavy on traffic and light on conversions. And never let poor experience hurt your rankings.

SEO Reporting

A holistic picture of SEO is great, but it doesn’t do anything unless you do something with it. As the saying goes, “what gets measured gets managed.” Reporting on the right metrics sets the focus of an SEO program.

First, two underlying truths about reporting and one smart suggestion:

  • Metrics are only valuable to the degree that they can be impacted. Just because you collected a metric does not mean you need to report it. In many cases, alerts are just fine.
  • Reporting cadence should match the pace at which a metric can change—too fast is noise and too slow can be too late.
  • Investment in data collection should be proportionate to investment in SEO efforts.

Reporting needs and focus will differ from business to business. The important thing is that they capture the goals and the actionable metrics and do not create noise. Collecting a lot of data is great, but most of it is only good when you need it to understand a cause or effect.

I think of reporting in three levels. Each level increases in granularity and narrows the scope of information. At the forefront are the business outcomes that SEO has yielded. Each level after that is more for the operations of an SEO program and less for external stakeholders.

  • Outcomes – “How is SEO doing?”

      • Cadence: Match Weekly/Monthly/Quarterly business reporting
      • Metrics: Organic Traffic, Signups, First Orders, App Downloads
      • Dimensions: Period over Period or Over Time
  • Performance / Outcomes in Context – “Why is SEO working?”

      • Cadence: Real-time / Daily Update Dashboards
      • Metrics: Impressions,  Rank, Clicks, Signups, First Orders, App Downloads
      • Dimensions: Period over Period or Over Time, Geography, Page Template or Keyword Grouping
  • Process / Monitoring – “What is the status of program X?”

    • Cadence: Dashboards / Ongoing monitoring and analysis of current programs or experiments
    • Metrics: Crawl rate, Speed, Impressions, Rank, Clicks, Page Interactions, Inbound Links
    • Dimensions: Response Codes (crawlers), Page Template

Measure the Metrics, Move the Levers

If there is one message to take from all this, it is that you may not need to track and manage every metric, but you must know which metrics are important to achieving the intended outcomes of your SEO program. If you understand which metrics matter, it becomes easy to create focused strategies to achieve the right outcomes.

Pinging Search Engines with Your Sitemap Works – Here’s How

Freshness matters. It matters to searchers, therefore it matters to search engines, therefore it should matter to you. The best tool to ensure that search engines have your site’s freshest content is your sitemap. This post explains why pinging search engines with your sitemaps is important for SEO and how to optimize the process and stay fresh.

Why Sitemaps? Why Ping them?

You have probably already submitted your sitemap to Google, Bing, and other search engines using their webmaster consoles. If so, you’re in a good spot: rest assured that search engines know you have a sitemap. And if your site is relatively static, meaning pages are not updated or created often, this is probably good enough. Depending on the perceived value of your site from a search engine’s perspective, search engines are probably requesting your sitemap(s) close to once a day, or potentially several times a day.

If you have a site that changes frequently then pinging search engines with your sitemaps is important. It is in your, and Google’s, best interest that they have the newest and most up-to-date content in their index, available to serve in search results. Examples of this would be marketplace sites where new products or services are being listed all the time, currency exchange listing sites where pages are continuously updating, or news publishers where freshness is critical.

To ensure that Google and other search engines know when your site has new content, pinging them with your recently updated sitemap is a must.

How pinging works

It is actually very simple. If you have ever used a web browser you will probably understand.

  1. You send an HTTP GET request to a specified “ping” URL on a search engine’s server with your sitemap’s URL appended to it.
  2. When the search engine receives the request, it, in turn, sends an HTTP GET request to the sitemap URL that you submitted.
  3. If your sitemap returns a 200 “OK” response with a valid sitemap, the search engine will scan the URLs in the sitemap and reconcile them against the URLs it has previously discovered.
  4. The search engine may decide to crawl some, all, or none of the URLs listed in the sitemap.

Sitemap Notification Received

It is important to note that pinging a sitemap does not guarantee that the sitemap URLs will be crawled. Google says, “Please note that we do not add all submitted URLs to our index, and we cannot make any predictions or guarantees about when or if they will appear.” It is pretty safe to say that all the URLs will be recognized, though. That said, crawl rate has a tendency to increase sharply after you notify search engines that your sitemaps are fresh.

Crawl Logs from Googlebot

How often to ping

Google used to recommend pinging no more than once per hour in their sitemaps documentation. As of 2015, they do not suggest any rate limit. Things have changed…

At Postmates, while testing Airflow data pipelines, we pinged our sitemaps 174 times in one day. Looking at our server logs, we saw that every single ping led to a subsequent GET request from Googlebot. This was true for each of our sitemaps—every time.

The subtle nuance, though, is that since our sitemaps did not actually change during that period – and had been cached – the server returned a 304 (Not Modified) response and did not actually send a sitemap. This happens because Googlebot sends its request with an If-Modified-Since header to check the freshness of the sitemap file. Its value is the Last-Modified date from the last time you pinged Google. This is Google essentially saying, “unless the sitemap file has changed since I last looked, I’m not interested.”
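The conditional logic the server applies can be sketched like this (a simplification with hypothetical names; real servers also handle ETags and malformed dates):

```python
from email.utils import parsedate_to_datetime

def sitemap_response_code(last_modified, if_modified_since):
    # last_modified: when the sitemap file actually changed (HTTP-date string).
    # if_modified_since: the If-Modified-Since header Googlebot sent,
    # or None for an unconditional request.
    if if_modified_since is None:
        return 200  # no condition: always send the sitemap
    changed = parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)
    return 200 if changed else 304  # 304 "Not Modified" sends no body
```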

Stay Fresh

The takeaway is that if you really want to keep Google’s index of your site fresh, ping Google with your sitemaps. Do it as often as you have fresh content, and make sure that the content listed in the sitemap file is discernibly different from the last time you pinged Google. In addition, make sure you set the appropriate Last-Modified header with the updated sitemap to ensure that Googlebot will actually receive the new sitemap.

If you are ready to start pinging search engines, here are the URLs to do it:<sitemap_url><sitemap_url>

If you want to tell Google about URLs manually, you can click these links and then append the full URL of your sitemap or sitemap index files.

Adding a Hubspot Data Layer for Personalization and Profit

As we all know, the data layer (dataLayer if you’re a nerd) is a great way to provide data about an application, its users, and its content to analytics and marketing technologies. But one underappreciated thing about the data layer is that it provides context for personalization.

There are a number of ways to employ the dataLayer for personalization. There are tools like Optimizely and Google Optimize, and an open source tool that I think is especially interesting called Groucho. (If you are interested in trying it for WordPress, you can try my plugin 🙂)

Another way to personalize, one that is a lot more accessible to many websites, is chat. I have been working on a project that uses Drift for chat/chatbots and Hubspot as a CMS. This has made it possible to explore the options for personalizing chat dialogs based on the characteristics of a given user, like funnel stage and lead score.

This Hubspot + Drift project is still early days, but one of the early problems I needed to solve was getting the contextual data from Hubspot into the dataLayer so I could start to segment content and personalize for users. There was a surprising lack of information on this topic (perhaps because Hubspot would rather have you use their lackluster web analytics solutions), so I decided I would open up the discussion and show how to initialize a data layer in Hubspot.

Hubspot data layer

This is what we are going to achieve, except with real values.

Where to put the Data Layer code

Let’s dive right in because it is not even that complicated. * Note, this assumes you have the required permissions to modify page templates in Hubspot.

If the data from the data layer is to be sent with a tag as soon as JavaScript runs (“gtm.js”), then you need to write your dataLayer initialization above the Google Tag Manager snippet. The best way to do this is to write the data layer right into the page template. That way you can be sure GTM has access to the dataLayer variables when GTM is initialized.

The other way, which is a bit lighter on code modifications, is to write the code into the “Additional <head> markup” section if you are using a drag and drop template. The downside here is that you will have to push the data to the data layer with an arbitrary event (e.g. {event: 'hsInitialization'}) if you want to fire a tag before gtm.dom, because gtm.js will fire before the dataLayer initialization. You should be thinking, “ok, I’ll modify the template directly,” or create and embed a Hubspot module to generalize the dataLayer initialization.

Ok! I will show some code before you navigate “back.”

Writing the Data Layer to the Page

Your Hubspot dataLayer initialization (above the GTM container) will look something like this:

  var dataLayer = [{
    email: "{{ }}",
    hubspotscore: "{{ contact.hubspotscore }}",
    hs_analytics_num_page_views: "{{ contact.hs_analytics_num_page_views }}",
    hs_analytics_num_visits: "{{ contact.hs_analytics_num_visits }}",
    hs_predictivescoringtier: "{{ contact.hs_predictivescoringtier }}"
  }];

This code snippet puts the current contact’s email, lead score, number of page views, number of visits, and predictive scoring tier into the dataLayer. That seems kinda handy for some personalization rules, right?

This is also a great way to reconcile your Google Analytics tracking against Hubspot data by using Google Analytics custom dimensions for contact variables.

* Notes: Yes, you do need script tags. Yes, you do need to put double quotes around HubL variables. Yes, you can choose which variables to put into the dataLayer. Keep reading…

HubL and Hubspot Variables

Hubspot uses its own templating language called HubL. (Hubspot Language, get it!?) HubL is based on a templating language called Jinja which, conveniently, is also used by Drift. HubL provides access to any Hubspot variables that you have about a contact. For example, the code {{ }} would print the current user’s (aka contact’s) email to the page. I am only going to say it once: Be careful with personally identifiable or sensitive data.

You can initialize your dataLayer with all kinds of data about the Contact, the Content, the page Request, Blog posts, and even your Hubspot account if you wanted to. Check out Hubspot’s docs for a pretty comprehensive list of Hubspot variables.

If you have custom contact variables, you can find their names here:<yourhubspotAppID> Use the “Internal Name”, e.g. {{ contact.internal_name }}

Be smart about the variable names that you use. You can choose variable names that don’t seem obvious if a user were to “view source.” And you can use Jinja to modify the variables (maybe to something a bit less obvious or more anonymous) before printing them to the page.

Things that make you go WTF?

My biggest annoyance was that Hubspot does not seem to expose the Hubspot user ID in HubL. That would be very handy as a user ID in Google Analytics. You could use the Hubspot “hubspotutk” cookie for that purpose, though it isn’t quite as good. Other than that, everything is pretty clean. Just remember to use double quotes and follow the rest of the * Notes from earlier.

I hope this helps you create a better experience for your website users. Go on and personalize! Just don’t be creepy!


Dynamic Gantt Charts in Google Sheets + Project Timeline Template

Updated: August 2018

Google Docs and Gantt charts are a perfect match. Google Spreadsheets offers the ability to share and update spreadsheets in real time, which is a major benefit for any project team, especially those who work in different locations or time zones. On top of that, you can’t beat the free price!

There are many projects that are complex enough to demand a formal task planning and management hub but do not justify a full-featured, premium application. This tutorial will show you how to take your ordinary task list and turn it into a dynamic visual timeline — a Google Spreadsheet Gantt chart.

Google Spreadsheet Gantt Chart

View the Sample Chart with Formatting Examples

View a Comprehensive Template From a Reader

There are other Google Spreadsheet Gantt chart examples that use the Chart feature as the visualization. I like to use the SPARKLINE() function. This keeps the project task visualization in the same place as all the important details about each task such as the RACI assignments or progress updates.

Dynamic Sparklines Work Better Than Charts

Sparklines are essentially just little data visualizations in spreadsheet cells. To learn more about how the sparkline feature works, check out these sparkline examples. To create the visualization, we are going to use “bar” for the value of “charttype.” Then we get a little bit clever with colors to show the start and end dates of each task. The SPARKLINE formula for each task visual looks like this:

=SPARKLINE({INT(taskStart)-INT(projectStart), INT(taskFinish)-INT(taskStart)},{"charttype","bar";"color1","white";"empty","zero"; "max",INT(projectFinish)-INT(projectStart)})

The projectStart and projectFinish values are the start and end dates of the project, and the taskStart and taskFinish values are the start and end dates for the task that is being shown in the timeline visualization.


The reason everything is wrapped in the INT() function is so that the dates can be subtracted from one another to give differences in whole days. The first argument to SPARKLINE is an array literal of two values that are essentially:

{daysFromProjectStartToTaskStart, taskDurationInDays}

The SPARKLINE function then makes two bars: one colored "white" so as to be invisible, and the other colored blue (by default) or any color you choose by setting "color2". The value for "max" is the difference between the start and end of the project in days.
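To make the arithmetic concrete, here is the same calculation in Python with made-up dates (Sheets’ INT(date) is just the date’s serial day number, so subtracting two of them yields whole days):

```python
from datetime import date

# Hypothetical project and task dates, purely for illustration.
project_start, project_finish = date(2018, 8, 1), date(2018, 8, 31)
task_start, task_finish = date(2018, 8, 6), date(2018, 8, 13)

offset = (task_start - project_start).days         # width of the invisible white bar
duration = (task_finish - task_start).days         # width of the colored task bar
chart_max = (project_finish - project_start).days  # the "max" option
```

With these dates, the sparkline draws a 5-day invisible offset, then a 7-day colored bar, on an axis capped at 30 days.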

On the example template, there are a couple other features: a week-by-week ruler and the burndown visualization.

The week-by-week visualization uses a clever little formula that passes an array of numbers incrementing by seven as the first argument to SPARKLINE, displaying alternating colored bars for each week of the project’s duration.


The burndown visualization shows the days that have been burned through the project. This gives you a visual display of how well the project is keeping to its timeline. The first argument to SPARKLINE is a dynamic value, calculated by subtracting the project’s start date from the current date:


Customizing your Timelines

Each SPARKLINE function takes arguments for color1 and color2. These values set the colors of the alternating bars in the bar visualization. For each task, color1 is set to white so as to be invisible. But color2 can be set to anything that may be useful for managing your project. Colors could be specified by task owner or type, or even set dynamically based on whether tasks are ahead of schedule, in progress, late, etc.

Keep this in your Google Docs project folder with all of your other important project documentation for a neat project hub.

Simplifying Your Spreadsheet Formulas

The SPARKLINE functions, especially if you use a lot of conditional coloring, have a tendency to become really long and hard to maintain. To make them easier to update and maintain, I suggest using named ranges. Named ranges allow you to designate a few cells that hold color variables that you can refer to by name.

For example, if you wanted all the future tasks to be blue, then in cell C2 you could input the text “blue.” Then you could name that cell futureColor, and every time you needed to reference the “future color,” use futureColor in the formula. That way you don’t have to think about cell references, and you only have to update one cell to update several sparklines. This also works for project start and end dates and fixed milestones. It does not work for variable dates like task start and end dates; those should be cell references.

What is keyword stemming? Use it … Wisely

Keyword stemming is the process of removing all modifiers of a keyword, including prefixes, suffixes, and pluralization, until only the root of the word remains. For example, “consulting” and “consultant” would be stemmed to the same root, “consult.” This technique is useful for SEO keyword research because it helps to identify and group similar keywords.

Keyword Stemming Example

In the example above, a piece of content that advertises an Agile Development Consultant could target searches for both “agile consultant” and “agile consulting.” It is important to understand how searchers construct their search queries so that you can create content that attracts clicks from all possible long tail search variants.

Keyword Clarity is a 100% free keyword grouping tool that allows you to stem your keywords and group their associated metrics.


Why Consider Keyword Variants

With long tail searches about topics outside of Google’s knowledge graph, exact match keyword usage still seems to matter—if not for ranking, then at least for click-through rate, because exact match keywords are rendered in bold in search result listings.

Sometimes one variant of a target keyword might have a higher search volume but another might have a better click-through rate in search results. You can test different variations of a page title to find the variant that provides the best click-through rate.


The Problems with Keyword Stemming

It is important to note that while keyword stemming is great for grouping similar keywords, it can also be great at grouping words that aren’t similar. For example, “popular” and “population” both have the same root but do not have similar meanings. It is important to consider search context and intent when stemming keywords into groups.


Keyword Stemming Algorithm

Keyword Clarity uses the Porter Stemmer algorithm. The Porter stemmer algorithm is a rule-based algorithm that progressively stems words in a sequence of five steps. The first step removes plural forms of words. Later steps further modify the words if they meet the relevant criteria.

For example, the word “revivals” would get modified to “revival” in the first step. The fifth step would stem it into its final stemmed form, “reviv.”
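To illustrate the idea (not the real Porter algorithm, which applies dozens of context-sensitive rules across its five steps), here is a toy suffix-stripper in Python; the suffix list and minimum-stem length are arbitrary choices of mine:

```python
from collections import defaultdict

# Longest-first so that "als" is tried before "s", etc.
SUFFIXES = ("ations", "ings", "als", "ing", "ant", "ed", "al", "s")

def crude_stem(word):
    # Strip the first matching suffix, but never leave fewer than 4 letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def group_by_stem(keywords):
    # Group keyword variants under their shared root.
    groups = defaultdict(list)
    for keyword in keywords:
        groups[crude_stem(keyword)].append(keyword)
    return dict(groups)
```

Even this crude version maps “consulting” and “consultant” to “consult” and “revivals” to “reviv”; the real Porter stemmer gets there more carefully.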


Keyword Stemming for Fun and Profit

To see how keyword stemming works in action, try out Keyword Clarity. You can automatically import keywords from Google Search Console or paste them in from any source. Happy stemming!


SEO with the Google Search Console API and Python

The thing I enjoy most about SEO is thinking at scale. Postmates is fun because sometimes it’s more appropriate to size opportunities on a logarithmic scale than a linear one.

But there is a challenge that comes along with that: opportunities scale logarithmically, but I don’t really scale… at all. That’s where scripting comes in.

SQL, Bash, Javascript, and Python regularly come in handy to identify opportunities and solve problems. This example demonstrates how scripting can be used in digital marketing to solve the challenges of having a lot of potentially useful data.

Visualize your Google Search Console data for free with Keyword Clarity. Import your keywords with one click and find patterns with interactive visualizations.

Scaling SEO with the Google Search Console API

Most, if not all, big ecommerce and marketplace sites are backed by databases. And the bigger these sites are, the more likely they are to have multiple stakeholders managing and altering data in the database. From website users to customer support to engineers, there are several ways that database records can change. As a result, the site’s content grows, changes, and sometimes disappears.

It’s very important to know when these changes occur and what effect the changes will have on search engine crawling, indexing and results. Log files can come in handy but the Google Search Console is a pretty reliable source of truth for what Google sees and acknowledges on your site.

Getting Started

This guide will help you start working with the Google Search Console API, specifically with the Crawl Errors report, but the script could easily be modified to query Google Search performance data or interact with sitemaps in GSC.

Want to learn about how APIs work? See: What is an API?

To get started, clone the GitHub repository and follow the “Getting Started” steps on the README page. If you are unfamiliar with GitHub, don’t worry. This is an easy project to get you started.

Make sure you have the following:

Now for the fun stuff!

Connecting to the API

This script uses a slightly different method to connect to the API: instead of using the Client ID and Client Secret directly in the code, the Google API auth flow reads these values from the client_secret.json file. This way you don’t have to modify the script at all, as long as the client_secret.json file is in the /config folder.

import pickle

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

try:
    credentials = pickle.load(open("config/credentials.pickle", "rb"))
except (OSError, IOError):
    flow = InstalledAppFlow.from_client_secrets_file('config/client_secret.json', scopes=OAUTH_SCOPE)
    credentials = flow.run_console()
    pickle.dump(credentials, open("config/credentials.pickle", "wb"))

webmasters_service = build('webmasters', 'v3', credentials=credentials)

For convenience, the script saves the credentials to the project folder as a pickle file. Storing the credentials this way means you only have to go through the Web authorization flow the first time you run the script. After that, the script will use the stored and “pickled” credentials.

Querying Google Search Console with Python

The auth flow builds the “webmasters_service” object, which allows you to make authenticated API calls to the Google Search Console API. This is where Google’s documentation kinda sucks… I’m glad you came here.

The script’s webmasters_service object has several methods, each relating to one of the five ways you can query the API. The methods all correspond to verbs (italicized below) that indicate how you would like to interact with or query the API.

The script currently uses the webmasters_service.urlcrawlerrorssamples().list() method to find how many crawled URLs had a given type of error.

gsc_data = webmasters_service.urlcrawlerrorssamples().list(siteUrl=SITE_URL, category=ERROR_CATEGORY, platform='web').execute()

It can then optionally call webmasters_service.urlcrawlerrorssamples().markAsFixed(…) to note that the URL error has been acknowledged, removing it from the webmaster reports.

Google Search Console API Methods

There are five ways to interact with the Google Search Console API. Each is listed below as “webmasters_service” because that is the variable name of the object in the script.


webmasters_service.urlcrawlerrorssamples()

This allows you to get details for a single URL and list details for several URLs. You can also programmatically mark URLs as fixed with the markAsFixed method. *Note that marking something as fixed only changes the data in Google Search Console. It does not tell Googlebot anything or change crawl behavior.

The resources are represented as follows. As you might imagine, this will help you find the source of broken links and get an understanding of how frequently your site is crawled.

 {
   "pageUrl": "some/page-path",
   "urlDetails": {
     "linkedFromUrls": [""],
     "containingSitemaps": [""]
   },
   "last_crawled": "2018-03-13T02:19:02.000Z",
   "first_detected": "2018-03-09T11:15:15.000Z",
   "responseCode": 404
 }


webmasters_service.urlcrawlerrorscounts()

If you query this, you will get back the day-by-day data to recreate the chart in the URL Errors report.

Crawl Errors

webmasters_service.searchanalytics()

This is probably what you are most excited about. This allows you to query your Search Console data with several filters and page through the response data to get way more data than you can get with a CSV export from Google Search Console. Come to think of it, I should have used this for the demo…

The response looks like this, with a “row” object for every record, depending on how you queried your data. In this case, only “device” was used to query the data, so there would be three “rows,” each corresponding to one device.

 {
   "rows": [
     {
       "keys": ["device"],
       "clicks": double,
       "impressions": double,
       "ctr": double,
       "position": double
     }
   ],
   "responseAggregationType": "auto"
 }
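Because each response is capped at a fixed number of rows, paging with the startRow parameter gets you the rest. Here is a sketch; query_fn stands in for the authenticated webmasters_service.searchanalytics().query(siteUrl=..., body=...).execute() call, and the function name is my own:

```python
def fetch_all_rows(query_fn, body, page_size=25000):
    # Advance startRow until the API returns a short (or empty) page.
    rows, start = [], 0
    while True:
        page = query_fn({**body, "rowLimit": page_size, "startRow": start})
        batch = page.get("rows", [])
        rows.extend(batch)
        if len(batch) < page_size:
            return rows
        start += page_size
```

The body dict carries the usual startDate, endDate, and dimensions keys; page_size must not exceed the API’s rowLimit cap.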


webmasters_service.sites()

Get, list, add, and delete sites from your Google Search Console account. This is perhaps really useful if you are a spammer creating hundreds or thousands of sites that you want to be able to monitor in Google Search Console.


webmasters_service.sitemaps()

Get, list, submit, and delete sitemaps in Google Search Console. If you want fine-grained detail on how your segmented sitemaps are being indexed, this is the way to add all of them. The response will look like this:

   {
     "path": "",
     "lastSubmitted": "2018-03-04T12:51:01.049Z",
     "isPending": false,
     "isSitemapsIndex": true,
     "lastDownloaded": "2018-03-20T13:17:28.643Z",
     "warnings": "1",
     "errors": "0",
     "contents": [
       {
         "type": "web",
         "submitted": "62",
         "indexed": "59"
       }
     ]
   }

Modifying the Python Script

You might want to change the Search Console query or do something else with the response data. You can change the code to iterate through any query. The check method is used to “operate” on every response resource; it can do things that are a lot more interesting than printing response codes.

Query all the Things!

I hope this helps you move forward with your API usage, python scripting, and Search Engine Optimization… optimization. Any question? Leave a comment. And don’t forget to tell your friends!


Fetch As Google – From Googlebot’s Perspective

The Google Search Console Fetch and Render Tool is like putting on a pair of Googlebot goggles and looking at your own site through those lenses. If you have a simple website (WordPress, for example) that hasn’t been modified too much, you may not appreciate why this is important. But the more complex your site becomes, with AJAX, JavaScript rendering, or robots.txt rules, the more critical this tool becomes to understanding why your site is, or is not, optimized for crawling—and search in general.

When you ask Google to Fetch and Render a URL, Google makes a few requests: to that URL, to your robots.txt file, and to your favicon. Some of these requests matter more than others in terms of SEO signals. I set up some logging on a simple test site to see if there was anything interesting going on, and hopefully to understand more about what these requests might signal about what Google and Googlebot care about in terms of crawling, rendering, and indexing.

The test pages were simple PHP pages with only a single line of content, “You’re invited!” The PHP was used to collect server variables and HTTP headers and send them to me as an email. This, along with server logs, is how I gathered this information. Now let’s dive in!

Fetch and Render

The two side-by-side views of the page are generated by rendering the page with two different user agents of what Google calls the Web Rendering Service (WRS).

Google Search Console Fetch as Google Tool

The rendering of the page under the heading, “This is how a visitor to your website would have seen the page” comes from the Google user agent:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Web Preview) Chrome/41.0.2272.118 Safari/537.36

As the user agent string suggests, this is essentially a Linux computer running a Chrome browser at version number 41. This is a little odd since that version was released in March of 2015. But this might be a good hint as to what technologies you can expect Googlebot to reliably recognize and support in discovering the content of your page.

Google sets some additional limits to what you should not expect for the WRS to render, namely: IndexedDB and WebSQL, Service Workers and WebGL. For more detail on what browser technologies are supported by Chrome 41, check out

The rendering in the Fetching tab under the heading, “Downloaded HTTP response” and the Rendering tab under the heading, “This is how Googlebot saw the page” both come from the same request. The user agent is:

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko; Google Search Console) Chrome/41.0.2272.118 Safari/537.36

The user agent string is practically the same as the Google Web Preview user agent. The two only differ in name.

The most significant difference between these two requests is that this request sets the Cache-Control header to ‘no-cache’ to ensure that the content at the URL is as fresh as possible. As the RFC states: “This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.” This makes sense; Google wants to have the freshest index possible. Their ideal index would never be even a second old.

This is further demonstrated in how Google makes requests when you request that a URL is indexed.

Requesting Indexing

Requesting indexing is a great tool when you have new, time-sensitive pages on your site, or when your site or pages go down due to server bugs. It is the fastest way to let Googlebot know that everything is up and running. When you "Request Indexing" you are explicitly asking Googlebot to crawl the page. This is the only way to do so: submitting a sitemap is an implicit request for Googlebot to crawl your pages, but it does not guarantee that all of the URLs will be crawled.

Hello Googlebot

When you click "Request Indexing," Googlebot requests your page not once, but twice. Unlike the previous requests, these come from "Googlebot" itself (Mozilla/5.0 (compatible; Googlebot/2.1; +). This duplicate request may provide some insight into what Google thinks and cares about when crawling the internet.

The first request is similar to the requests mentioned above. The Cache-Control header is set to no-cache, ensuring that the requested resource is not stale. In my test case, the Accept-Language header was set to the default language of my Google Search Console account, even though I had not specified a default language for the site. The server is also in the US, so this makes sense.

Googlebot Request #1 HTTP Headers

Name Value
Accept-Encoding gzip,deflate,br
User-Agent Mozilla/5.0 (compatible; Googlebot/2.1; +
From googlebot(at)
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Connection close
Cache-Control no-cache
Accept-Language en-US
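You can reproduce a request with these same cache and language semantics yourself. The sketch below builds (but does not send) such a request using Python's standard library; the User-Agent and From values are truncated in the table above, so they are left out rather than guessed at, and the function name is my own.

```python
import urllib.request

# Headers from Googlebot request #1 that appear in full in the table above.
GOOGLEBOT_REQUEST_1_HEADERS = {
    "Accept-Encoding": "gzip,deflate,br",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "close",
    "Cache-Control": "no-cache",
    "Accept-Language": "en-US",
}

def build_googlebot_style_request(url):
    """Build a request with the same cache and language headers."""
    return urllib.request.Request(url, headers=GOOGLEBOT_REQUEST_1_HEADERS)
```

Fetching your own pages this way is a simple check of how your server responds when a fresh, English-language copy is demanded.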


Seconds later, the second request comes along with two important changes. First, the Cache-Control header is no longer set. This implies that the request will accept a cached resource.

Why does Googlebot care about caching? My belief is that it wants to understand if and how the page is being cached. This matters because caching has a big effect on speed: cached pages do not need to be rendered on the server every time they are requested, which avoids that server-side wait time. Whether a page is cached, and for how long, is also a signal that the content may not be updated frequently. Google can take this as a sign that it does not need to crawl the page as often as one that changes every second in order to have the freshest version. Think of the difference between this blog post and the homepage of Amazon or a stock ticker page.

Googlebot Request #2 HTTP Headers

Name Value
If-Modified-Since Mon, 26 Feb 2018 18:13:14 GMT
Accept-Encoding gzip,deflate,br
User-Agent Mozilla/5.0 (compatible; Googlebot/2.1; +
From googlebot(at)
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Connection close


The second request, as you might have guessed, does not set a Cache-Control header. This way, Google is able to "diff" the cached and non-cached versions of the page to see whether, and how much, they have changed.
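The If-Modified-Since header in the second request is the standard conditional-request mechanism: if the page has not changed since the given timestamp, the server can answer 304 Not Modified with no body. A minimal sketch of that server-side decision, using only the Python standard library (the function name is illustrative):

```python
from email.utils import parsedate_to_datetime

def should_send_full_body(if_modified_since, last_modified):
    """Decide whether to send the full page or a 304 Not Modified.

    Both arguments are HTTP-date strings such as
    'Mon, 26 Feb 2018 18:13:14 GMT'.
    """
    if if_modified_since is None:
        return True  # no conditional header: always send the body
    # Send the body only if the page changed after the client's cached copy.
    return (parsedate_to_datetime(last_modified)
            > parsedate_to_datetime(if_modified_since))
```

A server that answers 304 here saves Googlebot (and itself) the cost of transferring and re-processing an unchanged page.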

The other change between the first and second request is that Googlebot does not set the Accept-Language header, which allows the server to respond in its default language. This is likely used to understand if and how the page and site are internationalized.

Perhaps if I had set rel=”alternate” tags the crawl behavior would have been different. I will leave that experiment up to you.

I’ve spent a lot of time with Googlebot and SEO experiments lately. I am just starting to write about them. Sign up for email updates to learn more about Googlebot and leave a comment if you have any questions. Thanks for reading.