Hosting your Static Site with Amazon S3

If you followed the previous post about getting started with Pelican, you should have a Pelican website up and running on your local machine. This is where a lot of web development tutorials stop and that has always frustrated me. There are a million and one places to learn how to code online but just because you learned how to write a for-loop doesn’t mean you can actually make anything.  This post is meant to help bridge the gap between “starting a web project” and starting a web project on the actual web.

The goal of this post is to get the collection of HTML files you built with pelican content uploaded to Amazon S3 so that everybody can access your site on the web!

This is the Amazon S3 Console. We’ll get to this soon…

Why host on Amazon S3?

In the last post, I discussed why I chose to build a static site with Pelican, and there was a similar set of considerations behind why I chose to host this project on S3. I will address the main two:

Why not Github Pages?

Github Pages is, of course, awesome. There is no easier way to host a static site on the web and best of all it’s free. So why didn’t I choose the path of least resistance? The answer is configurability and server access logs. GitHub Pages definitely favors simplicity over configurability, and as a result, doesn’t give you many options for tweaking your site. That’s great for most sites but not for exploring the technical side of SEO, where logs are really important.

Why not Netlify?

Netlify is also very cool. Like GitHub Pages, Netlify allows you to deploy sites from GitHub. It also strikes a good balance between simplicity and configurability—leaning more to the configurable side than GitHub Pages. It also has a host of very cool features, many of which are available for free. If I were just doing this as an ordinary web project, I probably would have chosen Netlify, but because this is meant to introduce more of the bare metal concepts, AWS wins out.

On top of those questions, there are really good reasons to choose AWS in its own right:

  1. It’s huge. So many companies host their sites and apps on AWS that it’s worth getting familiar with its concepts and terminology.
  2. It’s huge. AWS has so many services that you can take advantage of. We’re only taking advantage of S3, Route 53, and CloudFront but starting with AWS makes it easy to scale your projects if you want to do something crazy.
  3. It’s huge. The CloudFront CDN is among the best out there and it allows us to mess around with HTTP headers and send the server access logs to an S3 bucket so they are as easy to access as our site.

On the flip side, AWS (and Amazon) is huge. So that may be a consideration for choosing other hosting solutions or CDNs. There are lots out there to choose from. I’d say just google them, but if you’re not into big corporations, I’d say, go try DuckDuckGo.

Prepare your static site for the cloud

Luckily, I don’t need to reinvent the wheel much here. @Kent put together an excellent technically-focused walkthrough. The only difference between this tutorial and his is that we are going to use Route 53 instead of Cloudflare.

Up to this point, you should have a /output directory that holds a handful of HTML files after running pelican content. You can add an article to your /content directory to generate your first post. For instructions on adding an article, refer to the Pelican articles docs.

That said, you don’t need any articles yet to keep pushing forward to get your site live.

Setting up relative URLs in your pelicanconf.py file

You may have already modified your pelicanconf.py file in order to apply your Pelican theme. If you haven’t, it’s time to make your first modification (and a very important one).

What is a relative URL? you might ask. That’s a good question and an important one when it comes to hosting your site on the web. Relative URLs are URLs that start with the URL path rather than the protocol (https://) and the hostname (www.yoursite.com). In other words, they look like this: /my-folder/some-page/.

Why is that important? We are going to move all the site’s pages from your computer (http://localhost/) to an S3 bucket with a URL like yoursite.com.s3-website.us-east-2.amazonaws.com/, and in the future to your chosen domain name (something like yoursite.com). If the site refers to internal pages by relative instead of absolute URLs, you won’t have to worry about every internal link breaking every time you change your hostname. (This is also really important when it comes to domain migrations!)
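For example, here is the difference in a link tag (the page path is hypothetical, just to illustrate):

<!-- Absolute URL: breaks when the hostname changes -->
<a href="http://localhost:8000/my-folder/some-page/">My page</a>

<!-- Relative URL: works on localhost, on S3, and on your final domain -->
<a href="/my-folder/some-page/">My page</a>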

 # Uncomment following line if you want document-relative URLs when developing
 # RELATIVE_URLS = True

By default, Pelican will prefix all your URLs with http://localhost:8000 when you’re building your site locally. In order to change this to relative URLs, there is an easy switch in your pelicanconf.py file. All you have to do is find the lines shown above and uncomment the one that says RELATIVE_URLS = True.
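After uncommenting, that part of pelicanconf.py should look like this:

 # Uncomment following line if you want document-relative URLs when developing
 RELATIVE_URLS = True

If you used the pelican-quickstart defaults, your publishconf.py will typically set RELATIVE_URLS = False along with a SITEURL for production builds, so your published site still gets full URLs where it needs them.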

Setting up your Amazon S3 Bucket

S3 stands for “Simple Storage Service.” The concept is quite simple: S3 provides you with “buckets” that hold your content on Amazon’s servers. It’s great: there’s no need to configure or manage a server. Buckets can be public or private but for our purposes, we’ll have a public-facing bucket.

Uploading a site to an S3 bucket is pretty simple, but first, let’s set up the bucket. You will, of course, need an AWS account. If you don’t have an account yet, you’re in luck: you can start one for free for a year!

Creating your first S3 Bucket

Once you have an AWS account, follow these steps to create a bucket.

  • Go to your S3 console: https://console.aws.amazon.com/s3
  • Click the “Create bucket” button
  • Name your bucket after the domain name you plan to use. My project is going to live at techdefs.com, so I named my bucket “techdefs.com”.  

If you haven’t decided on a domain name, now’s a good time to head over to Amazon Route 53 and find one that’s available.

  • Select a Region where you would like your content to be hosted… You could choose one near where your audience is likely to be, or you could choose based on price. I chose US East (Ohio) but to be honest, for this project, it doesn’t really matter.
  • Un-select “Block all public access”. This will allow your website content to be accessed publicly on the web.
  • Click “Create Bucket” at the bottom of your screen to finalize your bucket.

Configure your S3 Bucket to host your site

  • Open the bucket Properties pane
  • Choose “Static Website Hosting”
  • Choose “Use this bucket to host a website”
  • Enter the name of your index document in the Index Document box. For Pelican sites, the index (aka homepage) document is index.html.
  • Click Save to save the website configuration.
  • Copy your Endpoint URL and paste it somewhere for later use
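If you’d rather script this step than click through the console, here’s a minimal sketch using the boto3 Python library (assuming your bucket is named yoursitename.com and your AWS credentials are already configured):

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Equivalent of the "Static Website Hosting" console steps above
s3.put_bucket_website(
    Bucket="yoursitename.com",
    WebsiteConfiguration={"IndexDocument": {"Suffix": "index.html"}},
)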

Configure public access for your S3 Bucket

The last step will be to set the bucket’s security policy to allow public access (so everyone can view it on the web). Bucket policies determine who can access your buckets and what level of permissions they can have. For example, you might only want to grant view access to some people and write access to others. Read more about s3 bucket policies here.

For our purposes, we are going to allow “PublicReadForGetBucketObjects” for the objects (HTML files) in the bucket that hosts the site. See the policy below for more details.

  • Go to your new bucket and open the Permissions tab
  • You should see “Block all public access“ is set to “Off”
  • Click Choose Bucket Policy
  • Paste in the following policy and replace yoursitename.com with your actual site’s name
 {
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "PublicReadForGetBucketObjects",
       "Effect": "Allow",
       "Principal": "*",
       "Action": ["s3:GetObject"],
       "Resource": ["arn:aws:s3:::yoursitename.com/*"]
     }
   ]
 }
  • Click Save

Congrats, you have your first S3 bucket! Now let’s fill it with your website!

Upload your site to your S3 Bucket

There are two ways to do this. One is easy but manual, and the other takes a bit more time to set up but automates the process for the future. Since we have spent most of this post “setting things up,” we are going to do it the easy way first so we can see the site live on the web!

  • Go to the Overview tab
  • Click Upload
  • Open up a Finder window (assuming you’re using a Mac) and navigate into your /output folder
  • Drag and drop all the files in /output into the Upload box (not the /output folder itself)
  • Click Next to proceed 
  • Set “Manage public permissions” to “Grant public read access to this object(s)” and click Next 
  • Leave the Storage Class set to Standard on the “Set Properties” step and click Next
  •  Click Upload on the Review step
  • Your files will be uploaded to S3 shortly
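That’s the manual way. As a preview of the automation we’ll set up later, here’s a rough sketch of the same upload using the boto3 Python library (the bucket name is a placeholder, and this assumes your AWS credentials are already configured):

import mimetypes
import os

import boto3  # AWS SDK for Python

s3 = boto3.client("s3")

# Upload everything in /output to the bucket, preserving folder structure
for root, dirs, files in os.walk("output"):
    for name in files:
        path = os.path.join(root, name)
        key = os.path.relpath(path, "output").replace(os.sep, "/")
        content_type = mimetypes.guess_type(path)[0] or "binary/octet-stream"
        s3.upload_file(
            path,
            "yoursitename.com",
            key,
            ExtraArgs={"ACL": "public-read", "ContentType": content_type},
        )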

If everything has gone well up to this point, your site is ready to view! 

Paste the Endpoint URL that you copied in the “Configure your S3 Bucket to host your site” section above into your browser.

🎉🎉🎉 Your site is (hopefully) live! 🎉🎉🎉

Troubleshooting

Maybe it didn’t work perfectly… That makes for a good opportunity to learn!

If your homepage looks like CSS rules weren’t applied

View the HTML source of your page and check whether the CSS links look right (they might point to localhost). Go back to the “Setting up relative URLs in your pelicanconf.py file” step and check that everything looks good.

If your homepage displays an XML error saying that you don’t have permissions

This is probably because you missed a step setting up public permissions. Recheck the “Configure public access for your S3 Bucket” step and the “Upload your site to your S3 Bucket“ step to ensure that your site has public permissions.

Moving on

You might be satisfied here… I mean you have a website up and hosted. As you add articles, you can upload them to S3 as you’ve done before. But… you don’t have a cool URL and the deployment process is still pretty manual. And we haven’t even done any fun SEO stuff.

In the next posts, we’ll set up a custom domain with Route 53 and set up an automated s3 upload process. Then, we’ll do some interesting SEO stuff.

Starting an SEO Project with Python, Pelican, and AWS

I’ve been thinking about the best way to put a “course” together to demonstrate the overlap between web development and SEO. There are a lot of directions this could go in but I wanted to strike the right balance between technical depth and feasibility for someone who hasn’t done much in the way of web development. 

This is the beginning of that so-called “course,” though it’s really more of a guided SEO project. I hope to teach something about technical SEO and SEO analytics by playing around with website code, hosting infrastructure, and different measurement tools. Hopefully, you’re interested in Python too 🙂

Launching a web development and SEO project

To start the project in the right direction, I had to determine which technologies to use to strike that balance of technical complexity and ease.

I had some considerations:

  1. Should I choose WordPress? Sure, it’s popular, but there are already tons of WordPress tutorials out there, and the last thing I want to do is tell people they should go out and learn PHP and tear at the internals of a 15+ year-old web framework.
  2. Python continues to grow in popularity. And that’s awesome, but I feared that, if this project were dependent on the audience’s ability to deploy a Flask or Django site, it would steer the focus away from SEO toward web development.
  3. What about Jekyll? A static site generator seemed like a good balance between simplicity and technical depth. (They are also really affordable to maintain!) Jekyll seemed like a good option but I opted against it because it’s built on Ruby. And Ruby, like PHP, just isn’t as hot as Python these days.

This meant the focus would be a static site generator based on Python. This made Pelican an easy choice. Pelican has been around long enough and garnered enough support to have a decent ecosystem and plenty of well-written “Hello World” tutorials.

How Static Site Generators Work

Static site generators are a good balance between the power of a full-blown CMS and the simplicity of a pure HTML site. With a static site generator like Pelican, instead of worrying about hosting an application and a database, you only have to manage “flat files” and host the fully-rendered HTML pages on a server or file store like S3. 

Most static site generators work like this:

  1. Choose or develop a theme that determines the style and layout of your pages
  2. Edit your site’s content in markdown files locally on your computer
  3. Run a command to build all the HTML files that make up your static site
  4. Transfer these pages from your computer to the hosting service of your choice
  5. Your website is up and running!

This means this project can be more about demonstrating SEO levers than web development. 

Introducing Pelican: A Static Site Generator, Powered by Python

Pelican is conceptually pretty simple. At a high level, you can think of it like this: 

  1. Your content: The “flat files,” commonly markdown or reStructuredText files, that you write and update to generate content on your site
  2. Pelican configurations: The settings in pelicanconf.py and publishconf.py that Pelican refers to when building your site
  3. Pelican itself: The process that reads your content and configurations and generates the HTML files that make up the complete static site
  4. Pelican themes: Either pre-built or custom-built, themes comprise the page templates (based on Jinja), CSS, and Javascript files that create the look and feel of your site
  5. Pelican plugins: Add-ons that allow you to change how Pelican reads your content, outputs your site, or does just about anything else during the build process

That means if you want to modify your site, you basically have two avenues: modify your theme or add/build plugins.

That’s really nice compared to WordPress, where you would have to think about a MySQL database schema, WordPress’ architecture, and… writing PHP to generate pages. This isn’t the answer for the next Yelp, but it will let you do some interesting things with SEO for .0000001% of the complexity!

Getting Your Web Project Started

With any web project, there is some groundwork to be done before the real construction begins. If you’ve done any kind of development project before, you’ll find most of this looks pretty familiar. 

If you haven’t done a development project before, I recognize that getting started can be the most challenging part. Hopefully, these resources will be sufficient to get you started with enough coffee, grit, and free time.

Setup your Development Environment

If you’re unfamiliar, a development environment is more of a state of preparation than an actual “thing.” A development environment means having all your necessary software, packages, settings, and tools loaded, working correctly, and generally understood.

If you want some guidance with this step, I suggest Peter Kazarinoff’s guide to setting up a development environment for Pelican. In his first post he covers:

  • Installing Python 3
  • Setting up a virtual environment (to keep this project’s packages separate from other Python packages on your computer)
  • Installing Pelican and other required Python packages
  • Creating a GitHub.com account
  • Making a directory for the site and linking it to GitHub

That’s a great guide to follow if this is your first Python development project ever (and even if it’s not).

Getting started with Github the easy way

This project assumes that you’ve never worked with git or GitHub before. For that reason, the demonstrations will use Github Desktop and the Atom code editor because they take a lot of the complexity out of git. So when it comes to the “Making a directory for the site and linking it to GitHub” step above, I’d suggest following this video about creating a repository with Github Desktop.

Getting your site running locally

At this point, you can find numerous tutorials to get your site up and running on your “local machine” (aka your computer). I think Matthew Devaney’s tutorial series is pretty easy to follow. To get your site running locally, follow his tutorials about installing Pelican, choosing a theme, and running it locally. You can also continue his tutorial series for guidance on using a virtual environment and build automation.

You’ve completed this project step once you have done the following:

  • Installed Python and set up a virtual environment
  • Installed Pelican and the required Python packages
  • Created a GitHub account
  • Created a directory for your Pelican project
  • Linked your project site to Github
  • Set up Pelican via pelican-quickstart
  • Successfully ran pelican content to generate your pages locally
  • Opened your bare-bones site in your browser after running pelican --listen

If you’ve made it this far, congrats! You’ve made it past one of the hardest parts of this whole project.

Up next: Play around with Pelican and see if you can generate some posts. Our next step will be to host it live on Amazon S3!

Beginner’s Guide to Content Management Systems and Templating Engines

If you’re new to web development—especially if you are coming from adjacent territory like marketing or product management—you’ve probably begun to understand the basics of variables and for loops, but there’s a big gap between where you are and how things get done in the wild. The goal of this post is to introduce you to the wild world of content management and tame it at the same time. I hope you enjoy!

Redefining the Content Management System Category

The term “Content Management System” describes an actively expanding category of software that can be used to manage the creation and modification of digital content. You may think that definition seems broad but that’s because the category is really broad! In fact, Wikipedia doesn’t even include serving content as a requirement for this definition!

On top of the content creation-and-modification functionality, most CMSs provide a much wider range of offerings. This starts, of course, with actually serving the content as HTML pages (rather than just text or image files) and includes common services like handling ecommerce transactions, managing user credentials, and web analytics.

Here are a few members of the Venn diagram that is the CMS category:

  • WordPress: The most popular CMS started as a blog platform but now is capable of supporting any type of app from ecommerce shops to user-review websites.
  • Shopify: The popular ecommerce platform is essentially a CMS that focuses on managing product content and handling monetary transactions.
  • Dropbox: You might not consider the digital asset management software a CMS, but the app allows you to upload images and serve them publicly on the web.
  • Hubspot: The customer relationship management (CRM) system also offers a blog and landing page CMS with the personalization benefits that only a CRM could provide.
  • Ghost: One of the most popular in the category of “headless CMS,” Ghost serves content via an API, which is perfect for Javascript-based single-page apps (SPAs).
  • Webflow: A codeless CMS, it affords 80% of the content and design capabilities of a code-based CMS like WordPress, without needing to write custom code to use them.
  • Pelican: On the far corner of the CMS world is this static site generator written in Python. Pelican simply translates a collection of text files (Markdown is commonly used) into a website that can be hosted on a service like Amazon S3.

From headless to full-stack, codeless to code-only, feature-rich to barely-a-CMS, you get the idea: the space is huge.

There is a CMS for almost any use case: If you don’t want to use a database, use a static site generator like Pelican or Jekyll. If you want to build a front end but don’t want to worry about the back end, use a headless CMS. If you want a CMS with a ton of community support and plenty of developers who already know it, use WordPress. The list goes on.

No matter your use case, there are some general principles that apply to most CMSs and are good to understand. That’s what we’ll get into next.

Beyond Content: Themes and Templates

If you are working on SEO but aren’t really familiar with web development, you might have heard terms like “theme,” “templating,” and “rendering,” and wondered what they’re all about. Let’s fix that.

CMS Themes

Most CMSs, including WordPress, Drupal, and Shopify, employ a concept of themes (even if they call them something else). Themes are the CSS, Javascript, and HTML template files that package CMS content into a complete website experience.

The combination of these files in a browser creates the “look-and-feel” of a website: CSS files define visual characteristics like background colors, typography, and iconography, while JS files create animations and interactivity. HTML templates determine the layout of the content on a page, whether that’s on a phone, tablet, or computer screen.

Themes offer one massive benefit: separation of concerns. This means that the content is separated from the presentation of the content. And while you might take that for granted, this is what makes it so easy to modify the layout of all the pages on a site without having to change the HTML of each page. (This was a game-changer in the early Web 2.0 days!)

For example, if you want to change a WordPress theme or Pelican theme, all you have to do is add the theme files to your project and change some site settings and voilà, your site has a hot new look!

HTML Templating

At the core of every theme is the way that content is transformed and merged into HTML to create a webpage. This is called templating. Typically template files look like regular HTML files except, instead of containing the page’s content within the HTML tags, there are variable placeholders and other “templating logic” that will be replaced with the page’s content when the page is “rendered” by the CMS code.

How templating engines work

It really helps to see an example. Below is a very minimal HTML page with variables (denoted by the {{variable}} syntax) embedded in the HTML.

<!DOCTYPE html>
<html lang="en">
<head>
    <title>{{ article.title }} | {{site.name}}</title>
    <meta name="description" content="{{ article.description }}">
</head>
<body>
    <h1>{{ article.title }}</h1>
    <div class="byline">by {{article.author}} </div>
    <div class="content">{{ article.content }}</div>
</body>
</html>


After the variables are replaced with the page’s content, the output HTML file would look something like this:

<!DOCTYPE html>
<html lang="en">
<head>
    <title> My First Time Skydiving! | My Life Story</title>
    <meta name="description" content="I went skydiving for the first time in my life on my birthday. Check out the story and all the cool pictures.">
</head>
<body>
    <h1>My First Time Skydiving!</h1>
    <div class="byline">by Trevor Fox </div>
    <div class="content">On my 30th birthday I… </div>
</body>
</html>


Let’s step way back for a moment and discuss what this all means. Consider how much easier it is to scale up a website with a lot of content. Without templating, the best way to create new pages would be to copy a similar existing HTML file and replace the copied page’s content with new content. It sounds like a nightmare! And what if you wanted to change something about the page you copied from? Yeah… you get the idea.

This generic approach to generating HTML files was a game-changer. Now nearly every single site on the web employs the concept of templating and it’s a foundational piece of SEO.

Let’s consider the popular marketplace website, craigslist.org. The site has millions (billions?) of live pages at any moment but it likely has fewer than 100 different templates. There are pages for every geographic locale that list all the categories for that locale, there are several types of category pages that list all the event, service, and for-sale item listings in that locale, and there are pages with all the details for each event, service, or item in that category. The pages add up quickly, but thanks to templates, the site’s content scales beautifully.

This is a great example for SEO too. Consider for a moment that all of Craigslist’s category pages are just templates with content from other pages combined dynamically to generate content for these new pages. It’s like every time they have content for a few listing pages they get a category page for free—and all the organic traffic that comes with it.

Templating Engines

Most CMSs employ a templating engine (or you might hear them called “templating languages”) on top of whatever programming language the CMS was built on. Templating engines make it easy to render HTML files from template files. Here are a few examples and the CMSs and web frameworks that use them.

  • Liquid: Ruby’s most popular templating engine is used by Shopify, Jekyll, Zendesk, and many others
  • Jinja2: Python’s most popular templating engine is used by Pelican and can be used with the web frameworks Flask and Django and others
  • HubL: Hubspot’s proprietary templating engine for web and email is an extension of Jinja offering more Hubspot-specific functionality
  • Handlebars: The Javascript templating engine is used by Ghost, the Enduro.js CMS, and web frameworks such as Ember.js and Meteor.js.
  • PHP?: Ok, PHP is a scripting language but it’s still worth mentioning here because the language was built for constructing web pages. As part of the language’s design, you write logic into .php files that output content into the HTML.

This is only a small sample of templating engines. Most popular programming languages have several popular options but all of these templating engines have a few things in common: they assemble and render HTML files from templates that look like HTML but can contain variables, conditional logic, and other inserted or inherited templates.
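To make that concrete, here’s a tiny sketch using Jinja2 (the engine Pelican uses); the article values are invented for illustration:

from jinja2 import Template  # pip install Jinja2

template = Template("<h1>{{ article.title }}</h1><p>by {{ article.author }}</p>")
html = template.render(article={"title": "My First Time Skydiving!", "author": "Trevor Fox"})
print(html)  # <h1>My First Time Skydiving!</h1><p>by Trevor Fox</p>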

In the future, I hope to break down how website templating works, especially within the context of SEO.

Defining Technical SEO for Non-Technical SEOs

Whether you’re just an SEO rookie or you’ve been playing along for years, you probably know that SEO is all about keywords, content, and links.

To “do SEO,” all you need is a website, some content, and you need to figure out how to get some links so your site can rank. The more you have of each, the more successful you’ll be (as long as you don’t do anything stupid.)

That’s it… Or is it?

While keyword sleuthing, golden content, and lots of trustworthy links will lead to some SEO success, that will only take you so far. There is another fundamental layer of SEO that is less well-known and too often misunderstood: technical SEO.

What is Technical SEO?

Technical SEO is the practice of optimizing a website’s code structure, navigational architecture, and ability to be fetched and rendered by browsers and bots. All this impacts how search engines crawl, index, rank, and display a site’s content. 

That was a mouthful! In other words, technical SEO is the process of designing, building, and maintaining websites so their content will attract as much search traffic as possible.

Unlike keywords, content, and links, great technical optimizations on their own won’t attract traffic. Instead, they act as a multiplier for a website’s SEO. And best of all: most types of technical optimizations have a site-wide effect. Unlike optimizing the content of a single page, the results are multiplied.

Technical SEO is kind of like the bass line in a good song. It brings everything together and makes each part better at the same time.

The goal of this post is to introduce this concept for further posts that will cover the topic in more depth.

The next post will discuss content management systems, both because they require relatively little technical knowledge compared to other aspects and because CMSs are the bridge between content and code—a perfect entryway into the technical side of SEO.

Intro to SQL User-Defined Functions: A Redshift UDF Tutorial

As a data analyst, your credibility is as valuable as your analytical skills. And to maintain your credibility, it’s important to be able to answer questions correctly and consistently. That’s why you must be careful to integrate reproducibility into your SQL analyses. This tutorial is going to show you how you can use Redshift User Defined Functions (UDFs) to do just that.

Reproducibility in SQL Analysis

I’ve learned that there are two broad factors to reproducibility. The first is the data—different data for the same analysis is going to produce different results. A good example would be a court case: if you ask two witnesses the same question, each one will probably tell you something similar but likely slightly different. 

The second factor is the analytical methods. If we use the court case example again, this would be like the prosecution and the defense asking a witness the same question in two different ways. The lawyers would do this with the intent to get two different answers.

This post is more concerned with the second factor of reproducibility, the analytical method. Whenever you have to write complex SQL queries to get an answer, your analytical method (the SQL query) becomes a big variable. SQL is iterative by nature! Think about it: just by adding and removing “WHEN” conditions, you’re liable to drastically change your results.

As you iterate on a numerical calculation or classification in a CASE expression you are likely to change your query results. And what happens when you have to perform the same analysis weeks later? You better hope you use the same iteration of your SQL query the second time as the first! 

And that is exactly where User-Defined Functions become so valuable! 

User-Defined Functions (UDFs) are simply a way of saving one or more calculations or expressions with a name so that you can refer to it as a SQL function for further use.

What are User Defined Functions?

User-Defined Functions can be used just like any other function in SQL like SUBSTRING or ROUND except you get to define what the output of the function is, given the input.

They are a great way to simplify your SQL queries and make them more reproducible at the same time. You can basically take several lines of code that produce one value from your SELECT statement, give it a name, and keep it for future use. Using UDFs, you can ensure that, given the same data, your calculations will always produce the same result.

UDFs are Scalar Functions. What does scalar mean?

As you learn about UDFs, you’ll see references to the word “scalar.” Scalar just means that the function is defined with one or more parameters and returns a single result. It’s just like how the ROUND function takes one parameter (the number) and an optional second parameter (the number of decimal places for rounding) and returns the rounded number. The function is applied to every value in a column, but it only returns one value for each row in that column.
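For example, applying the built-in ROUND function to a literal value:

select round(3.14159, 2);  -- returns 3.14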

A Hello World! SQL UDF Example

If you are familiar with any kind of programming language, this should be pretty simple. The CREATE FUNCTION syntax only requires a function name, a return data type, and the SQL expression to return. That’s it.

A function called hello_world that returns ‘HELLO WORLD!’ every time would look like this:

create function hello_world ( )
  returns varchar
stable
as $$
  select 'HELLO WORLD!'
$$ language sql; 

In this case, the function takes no input, and the return data type is varchar because “HELLO WORLD!” is a text output. You could use your function like this:

select hello_world() as my_first_function;

And you’d get an output that looks like this:

my_first_function
HELLO WORLD!

But that wouldn’t be very interesting. You’ll generally want to modify the input(s) of your functions. Let’s take apart a more interesting UDF example.

How to Write SQL UDF Functions

This example function, called url_category, takes a varchar as an input (a URL) and returns a varchar output (the category of the URL). To do this, the function compares the input (shown as $1 because it is the first parameter) to the conditions of a case expression.
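The original example isn’t reproduced here, but a minimal sketch of that kind of function might look like this (the categories and URL patterns are invented for illustration):

create function url_category (varchar)
  returns varchar
stable
as $$
  select case
    when $1 like '%/blog/%' then 'blog'
    when $1 like '%/products/%' then 'product'
    else 'other'
  end
$$ language sql;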

You could also write this function with two parameters. For example, if you were using Google Analytics data, you could take in two parameters, a hostname and a page_path, to get more granular with your URL categorization, as sketched below.
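Again, a sketch with made-up hostname and path patterns ($1 is the hostname, $2 is the page path):

create function url_category (varchar, varchar)
  returns varchar
stable
as $$
  select case
    when $1 like 'blog.%' then 'blog subdomain'
    when $2 like '/products/%' then 'product'
    else 'other'
  end
$$ language sql;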

SQL UDF Functions with Multiple Arguments

This is Redshift’s example from their docs. It takes two parameters (both specified as float) and returns the greater of the two values.

create function f_sql_greater (float, float)
  returns float
stable
as $$
  select case when $1 > $2 then $1
    else $2
  end
$$ language sql;  

To refer to the different parameters in the function, you just use the dollar sign ($) and the order of the parameter in the function definition. As long as you follow that convention, you could go wild with your input parameters!

Redshift UDF Limitations

UDFs are basically restricted to anything that you can normally do inside a SELECT clause. The only exception would be subqueries—you cannot use subqueries in a UDF. This means you’re limited to constant or literal values, compound expressions, comparison conditions, CASE expressions, and any other scalar function. But that’s quite a lot! 

Common UDF Errors and their Causes

Once you start writing UDFs, you’ll find that it’s pretty easy going, but there are a few especially common “gotchas”:

ERROR:  return type mismatch in function declared to return {data type}

DETAIL:  Actual return type is {data type}.

This just means that you’ve created a function where the output value has a different data type than you said it would. Check that the return data type you specified is the same as what the function actually returns. This can be tricky if your function uses a CASE expression, because a CASE could accidentally return two different data types.
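For instance, a sketch like this, declared to return an int while the body returns text, would trigger something like that error (the function name is made up):

-- Declared to return int, but the body returns a varchar
create function bad_type_example ()
  returns int
stable
as $$
  select 'not a number'
$$ language sql;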

ERROR:  The select expression can not have subqueries.

CONTEXT:  Create SQL function “try_this” body

This means you tried to write a SELECT statement in your function that includes a subquery. You can’t do that.

ERROR:  function function_name({data type}) does not exist

HINT:  No function matches the given name and argument types. You may need to add explicit type casts.

There is one especially odd thing about Redshift UDFs. You can have several functions with the same name as long as they take different arguments or argument types. This can get confusing. The error here means that you’ve called a function with the wrong type of argument. Check the input data type of your function and make sure it’s the same as your input data.

Scaling your SQL Analysis with Confidence!

User-Defined Functions make it really easy to repeat your analytical method across team members and across time. All you have to do is define a function once and let everyone know that they can use it. On top of that, if you want to change the logic of your function you only have to do it in one place and then that logic will be changed for each user in every workbench, notebook, or dashboard!

Take advantage of this clever tool. Your team will thank you, and you will thank you later!

Getting Started with SQL for Marketing (with Facebook Ads Example)

As a digital marketer, I use SQL every single day. And looking back on my career so far, it would be fair (though a bit reductive) to say that I could define my career by two distinct periods: before I learned SQL and after I learned SQL. The two periods are distinct for three main reasons: 

  1. After learning SQL, I am faster at gaining insight from data
  2. After learning SQL, I am able to make decisions based on more data
  3. As a result, I’ve been making better marketing decisions—and I have seen the traffic, conversion rates, and ROI to prove it. (Thanks to SQL)

If you’re at a crossroads in your career and you find yourself asking, “what coding language should I learn,” here is my case for SQL.

What is SQL (for Digital Marketing)

When you see SQL you might think it means “Sales Qualified Lead” but more commonly, SQL stands for “Structured Query Language.” It is a programming language that allows you to retrieve (or update, alter or delete) data from relational databases. (Relational is just a fancy word for a database that stores data in tables.) 

It’s kind of like ordering from McDonald’s. SQL is a language – a specific set of instructions – that you use to specify the results you want, the way you want them, in the quantity you want. Basically, SQL allows you to have your data your way.

How is SQL Used in Business

SQL has two main uses: applications and analysis. Applications (apps) from CandyCrush to Instagram store content and data about users in databases and then use it to create an experience (like keeping track of how many comments you have on an Instagram post). On the other hand, you can use SQL for analysis in the same way you can sort, filter, and pivot data in Excel (except with a lot more data).

SQL is different from most programming languages like Javascript, Python, and PHP because it has just one use: working with data in relational databases. So you can’t use SQL to build a website or a chatbot, but you can use programming languages like Javascript, Python, and PHP to send SQL commands to databases and do something interesting with the results. WordPress is a good example of this. WordPress is written in PHP, and the PHP code sends SQL commands to a MySQL database and formats the data into blog articles and article lists.

What’s the difference between SQL and Excel?

Remember when you learned your first Excel formula? Pivot tables? VLOOKUP? You probably thought you could take on the world! SQL is like that times 100. SQL and Excel are similar because they both allow you to analyze and manipulate data, make calculations, and join data in tables.

The biggest difference between Excel and SQL is that you can analyze exponentially more data exponentially faster with SQL but you can’t update the data in SQL quite as easily. Also, SQL commands define how you want your data table to look when the data is retrieved so you are working with entire tables rather than individual cells. The benefit of this is that you don’t have to worry about making mistakes when copying formulas (and the analysis errors that come with that.) On the whole, I’d say SQL is much better than Excel, most of the time.

SQL Example in Marketing

This example shows an ROI analysis: SQL code that calculates how many customers you’ve acquired per country since the beginning of 2020, along with the Facebook Ads spend in that country.

SELECT country, sum(customer_count) total_customers, sum(spend) ad_spend
FROM (
	SELECT customers.ip_country, count(email) customer_count
	FROM customers
	WHERE customers.createdate > '2020-01-01'
	GROUP BY customers.ip_country) new_customers
JOIN facebook_ads ON facebook_ads.country = new_customers.ip_country
WHERE facebook_ads.date > '2020-01-01'
GROUP BY country
ORDER BY ad_spend desc;

The example does the following:

  1. Aggregates a table of customers into a table of countries and counts of customers acquired since January 1st, 2020
  2. Joins that table with another table that contains Facebook Ads data by day
  3. Filters to only the Facebook Ads spend data since January 1st, 2020
  4. Aggregates this all into a single table that has three columns: country, count of new customers from that country, and the ad spend for that country

The good news is, this is about as complex as SQL gets. Pretty much everything else in SQL is just a variation of this.

Is SQL worth Learning?

In a word, yes. There is a practical reason and a conceptual one. The conceptual one is that learning SQL, like learning data structures or other programming languages, will expand how you think about data. It will help you organize data for analysis more efficiently and help you structure your thinking about how to answer questions with data. So even without a database, SQL can help you work with data.

The practical reason for learning SQL is that it allows you to gain insight faster, from more data, and come to better conclusions. That is true if you are analyzing keywords for PPC or SEO, analyzing how leads flow through your sales funnel, analyzing how to improve your email open rates, or analyzing traffic ROI.

Here are just a few good reasons:

  1. You’ll spend less time trying to export and import data into spreadsheets
  2. You’ll be able to replicate your analysis easily from week to week or month to month
  3. You’ll be able to analyze more than 10k rows of data at once
  4. You can use BI tools to build dashboards with your data (and always keep them fresh)
  5. You’ll be able to merge bigger datasets together faster than VLOOKUPs
  6. You won’t have to ask for help from IT people, DBAs or engineers to get data out of a database or data warehouse for your analysis

How long does it take to learn SQL?

With dedication, you can develop a strong foundation in SQL in five weeks. I recommend Duke’s SQL class on Coursera to go from zero to usable SQL skills in less than two months. With that class and a couple of books about PostgreSQL, I was on par with most analysts at Postmates (except I had the context about the data!). A few months later  I learned enough to record this SQL demo with SEM data.

There are even good Android/iPhone apps that will help you learn the syntax through repetition. The class I recommend for Python (Data Manipulation at Scale: Systems and Algorithms) also touches on SQL, so it’s a double whammy. Python Anywhere also features hosted MySQL, so that’s a double-double whammy!

If you are looking for a short but substantive overview of SQL, this video from Free Code Camp is pretty good. I’m not suggesting you’re going to know how to write SQL in four hours, but at least you will get the gist.

All that being said, like many programming languages, learning SQL is a continual practice because, after learning the language, you can expand into managing a database rather than just analyzing data in one. You can also pair your SQL skills with other programming skills to make all sorts of interesting applications! The good news: for the most part, it’s like riding a bike. Once you learn it, you don’t really forget it—but it will take you a bit of time to re-learn a wheelie.

How to Scale Yourself in a Growth Role, Part 1: Your Time

A personal note: These are my learnings from consulting, being part of a growth team at Postmates, then leading growth at Panoply, so hopefully, it’s insightful. This post has been idling in a Google Doc for weeks because I couldn’t dedicate the time to complete it. So instead of never completing a single long post, I decided to break it into posts. This way it’s easier to publish and hopefully easier to read. If you want to catch the next two posts on “Your Work” and “Your Team” sign up here.


Growth is simple. It is just a measure of size over time.

You can grow anything—visitors, users, subscribers, customers, revenue, profit—it doesn’t matter. If you can measure it, you can put it on a timeline, and you can grow it.

Accelerate Growth

The fun starts when you start trying to accelerate the rate of growth. Luckily, since growth is just the increase in size over time, you can simplify this problem down to reducing the amount of time it takes to get to a given size. And because the growth role in a startup is all about… growth, you can never move too fast. The question is, how?

This is where scaling yourself comes in. By “scale,” I mean specifically increasing capacity (the ability to do more in the time you have) and competence (the ability to just do more). This is how I would break down the ways you can scale yourself to accelerate growth (part 1 of 3).

Some of the later recommendations assume you have cash to spend or personal time that you are willing to devote to personal development. But to begin with, the list is prioritized in a way that starts with things that everyone can do and continues to things that are resource-dependent.

Do less.

If you’ve ever found yourself at 3 pm with 40 browser tabs open and not a single thing crossed off your to-do list, this one is for you. You know it’s much easier to join a watercooler brainstorming session or get lost in a Google search rabbit hole than to align stakeholders, reply to mile-long partner email threads, and dig into the tedious details of a campaign launch. This one is all about self-discipline.

By “do less”, I mean “do less of the things that take time but don’t drive growth.” Since growth is a factor of time, it is your most precious commodity—especially if you’re running on a lean budget. Wasted time stunts growth.

This may not seem like a revelation but when it comes to self-discipline, it comes down to honesty. You have to be honest with yourself when you ask yourself, “Is this activity driving growth faster than all the other activities I could be doing?”

If you struggle with this type of introspection, ask yourself, what would the “Jerry Rice / Greta Thunberg / Steve Jobs of growth” do? The answer is probably something way more badass than, “continue scrolling through Instagram.”

But while self-discipline is great to keep you driving fast, prioritization is what keeps you driving toward success.

Prioritize. Don’t confuse activity for productivity.

In the previous section, I alluded to what I’ll refer to as the opportunity cost of time. The question, “Is this activity driving growth faster than all the other activities I could be doing?” considers that every activity takes some time but each activity accelerates growth at a different rate.

For example, if you are scrolling through Instagram, you are increasing your rate of growth by roughly 0. But if you spend the same time launching a landing page A/B test, you might see a measurable increase in your rate of growth. If you spend your time onboarding an expert A/B testing freelancer, you’re likely to have a compounding effect!

How do you know what to prioritize?

First, you will need to be very familiar with your “one metric to rule them all.” As much as I’d like to dive into the discussion about metric selection and validation, instead I’m going to recommend reading “Lean Analytics.”

There is always “the one metric” at the bottom of the funnel and most often it relates to revenue. Mature growth programs will have a good sense of what the funnel looks like above this final metric. And growth programs that are moving at full speed are able to evaluate levers in terms of how they impact growth. 

If both of these ideas seem foreign to you, you better start building a funnel and quantifying the effect of various levers against your funnel. In other words, consider how things like keyword expansion, link building, content creation, landing page testing, etc. affect conversion rate through your funnel.

Once you know how powerful the different levers in your growth program are, you can start to determine how much time it will take to move a lever (and corresponding metric) a given amount. Then you can ask, “Is this activity driving growth faster than all the other activities I could be doing?”

The opportunity cost of time

I love dealing with the metaphysics of this discussion. As I laid out above, every period of time can be evaluated by its potential to accelerate growth. In addition, every activity requires an amount of time to accomplish. And finally, every activity’s effect can vary in its eventuality and duration. Time, this single dimension, has a three-dimensional impact on growth!

While it can be valuable to consider the dimensions of time, the last thing you want is analysis paralysis. That is why I recommend a universally helpful and satisfyingly simple tool: Dwight D. Eisenhower’s prioritization framework.

Dwight D. Eisenhower’s prioritization framework measures everything that could be done on two axes: urgency and importance. Let’s break it down in the context of growth.

Urgent unfortunately often means broken. This could be things like robots.txt files, shopping carts, links in email templates, or sign up forms. Every minute that these things stay broken, growth is probably decelerating. Urgency can also relate to time-sensitive campaigns that relate to external events or product/feature releases that must be prioritized.

Important things are generally measured against their impact on the “metric to rule them all.” You can generally gauge importance by asking “if I build/launch/do this thing, how will it affect growth in the next week/month/year?” (Consider the tradeoff in the week/month/year timeframes. Don’t always skip SEO in favor of quick response email or push campaigns.)

Next time you have to decide between updating a tired email template and writing a spec for a high-potential SEO feature, ask yourself, which will drive the most growth today and into the future. If your dev team has time to fill, perhaps pause the emails and give them a spec. If your “Welcome“ email is broken, fix it quickly and then deliver the spec.

A word of caution

As someone who clearly has a tendency to overanalyze things, I have one small word of caution: don’t let yourself fall into the trap of analysis paralysis. The 80% rule often applies to prioritization too. If you spend half your day trying to figure out what to do first, just call it a tie and get both things done.

It’s your time

I hope this discussion brought up some new ideas for you. If you disagree or notice something that is not included, please leave a comment and build on the discussion.

If your company is in the early stages of growth and you want to discuss how to sharpen your focus and accelerate growth, contact me on GrowthMentor. I love sharing what I’ve learned from my experience. As you’ll find in part 3, often mentorship is the biggest growth accelerant.

If you want to stay tuned for parts 2 and 3 sign up to receive the posts in your inbox. (no spam, just posts)



Drift Chat Event Tracking with Google Tag Manager – Less Code, More Analytics

The market for website chatbots is growing steadily and with it, there have been a lot of entrants since Intercom led the way. From Hubspot to Whisbi, each has its own take on it. What I found especially impressive is how Drift has opened up the tech side of their platform. This makes it a really fun marketing technology. Building chatbots is a mix of analytical thinking, user experience design, and creativity. On top of that, it is easy to iterate and optimize quickly thanks to the amount of qualitative and quantitative feedback that the chats offer. While I’ve seen the Drift team steadily working on bot flow analytics, I’ll be honest: I wouldn’t mind a lot more access to the data that is collected through the bot interactions.

Drift’s Javascript API provides a pretty comprehensive list of chat events to “listen” for (meaning you can trigger Javascript code when a chat event, like “chat started,” happens). With Google Tag Manager, you can send these events to an analytics tool like Google Analytics. You can even send along a bit of event metadata like the Drift inbox ID and conversation ID, so it’s easy to find the Drift conversation at the URL: https://app.drift.com/inboxes/{{inbox ID}}/conversations/{{conversation ID}}

Unfortunately, the API does not provide the actual text of the conversation (for a good reason), but using Google Analytics, you can still collect sidebar and message events and analyze them in aggregate. This can be really helpful in understanding which URLs users are interacting with bots on. From there you can spend your time building and optimizing bots with maximum effect.

Google Tag Manager Data Layer Events

The code at the end of the post registers a callback function for all the following events:

  • drift loaded
  • sidebar opened
  • sidebar closed
  • welcome message opened
  • welcome message closed
  • away message opened
  • away message closed
  • campaign message opened
  • campaign message closed
  • CTA clicked
  • campaign user started a chat or submitted an email
  • slider message closed
  • user started a new chat
  • user replied to a conversation
  • user submitted email address
  • schedule meeting card pushed to a conversation
  • user booked a meeting

Each event will push an object to the dataLayer that looks something like this:

{
  "driftEventType": "conversation",
  "driftEventName": "message:sent",
  "driftEventDescription": "user replied to a conversation",
  "event": "drift",
  "driftEventData": {
    "data": {
      "sidebarOpen": true,
      "widgetVisible": true,
      "isOnline": true
    },
    "conversationId": XXXXXXXXX,
    "inboxId": XXXXXX
  },
  "gtm.uniqueEventId": XX
}

That makes it easy to set up GTM Tags to fire on all Drift events based on a trigger with the firing rule: data layer event EQUALS drift. You can be more specific about your firing rules by using driftEventType, driftEventName, or driftEventDescription.

The data layer event values can map directly to Google Analytics event values (driftEventName is good for Event Action and driftEventDescription is good for Event Label). Or, instead of using the driftEventDescription as the Google Analytics event label, you can use the inbox ID or conversation ID as well. You could even set an event-scoped custom variable and capture all of it!

Enough talk! Get to the code.

To set up the Drift event listener, you will need to place the following Javascript code in a GTM HTML tag, between <script> tags. Make sure that the HTML tag is fired after Drift is initialized, so if you are using Google Tag Manager to instantiate Drift, set up your tags sequentially.

var driftEvents = [
  {driftEventType: 'sidebar', driftEventName: 'sidebarOpen', driftEventDescription: 'sidebar opened'},
  {driftEventType: 'sidebar', driftEventName: 'sidebarClose', driftEventDescription: 'sidebar closed'},
  {driftEventType: 'welcome message', driftEventName: 'welcomeMessage:open', driftEventDescription: 'welcome message opened'},
  {driftEventType: 'welcome message', driftEventName: 'welcomeMessage:close', driftEventDescription: 'welcome message closed'},
  {driftEventType: 'away message', driftEventName: 'awayMessage:open', driftEventDescription: 'away message opened'},
  {driftEventType: 'away message', driftEventName: 'awayMessage:close', driftEventDescription: 'away message closed'},
  {driftEventType: 'campaign', driftEventName: 'campaign:open', driftEventDescription: 'campaign message opened'},
  {driftEventType: 'campaign', driftEventName: 'campaign:dismiss', driftEventDescription: 'campaign message closed'},
  {driftEventType: 'campaign', driftEventName: 'campaign:click', driftEventDescription: 'CTA clicked'},
  {driftEventType: 'campaign', driftEventName: 'campaign:submit', driftEventDescription: 'campaign user started a chat or submitted an email'},
  {driftEventType: 'slider message', driftEventName: 'sliderMessage:close', driftEventDescription: 'slider message closed'},
  {driftEventType: 'conversation', driftEventName: 'startConversation', driftEventDescription: 'user started a new chat'},
  {driftEventType: 'conversation', driftEventName: 'message:sent', driftEventDescription: 'user replied to a conversation'},
  {driftEventType: 'conversation', driftEventName: 'message', driftEventDescription: 'user received a message from a team member'},
  {driftEventType: 'conversation', driftEventName: 'emailCapture', driftEventDescription: 'user submitted email address'},
  {driftEventType: 'conversation', driftEventName: 'scheduling:requestMeeting', driftEventDescription: 'schedule meeting card pushed to a conversation'},
  {driftEventType: 'conversation', driftEventName: 'scheduling:meetingBooked', driftEventDescription: 'user booked a meeting'}
]

drift.on('ready', function (api, payload) {
  // Drift is loaded: record that, then register a callback for every event above.
  dataLayer.push({event: 'drift', driftEventName: 'driftReady', driftEventDescription: 'drift loaded', driftEventData: payload});
  driftEvents.forEach(function (driftEvent) {
    drift.on(driftEvent.driftEventName, function (data) {
      // Tag the event object with the GTM trigger name and Drift's payload,
      // then push it to the data layer.
      driftEvent.event = 'drift';
      driftEvent.driftEventData = data;
      dataLayer.push(driftEvent);
    });
  });
});

This code might look weird at first glance, but notice that instead of writing a code block to register a callback for each Drift event, the code loops through all the events to register the callbacks. Paste it in your browser’s JavaScript console (on a page that has Drift loaded) to see it work. When you interact with your Drift bot, you’ll see the events being sent to the data layer. You can also view the code on this GitHub Gist.

If this seems a bit intimidating today, I’d recommend checking out my open-source guide, Digital Marketing Technical Fundamentals, on GitHub for guidance on learning the tech side of digital marketing. Good luck!

Metrics and Levers of an SEO Measurement Framework

You are probably here because you have more SEO data than you know what to do with and precious little time to make sense of it. A framework is exactly what you need.

An SEO measurement framework draws the relationships between all the metrics that you collect and the levers that you can pull to improve them. Frameworks reduce complexity, provide insight, and best of all, frameworks enable focus.

SEO happens in a dynamic system that spans technical infrastructure, human relationships, and user experience. But SEO outcomes are measured in acquisition, activation, and revenue. I believe (and you should too) in measuring SEO impact against outcomes in an effort to evaluate customer acquisition cost (CAC). We should also trust a system of metrics to avoid risks, surface optimizations, and identify opportunities. Let’s see what this system looks like.

SEO metrics and levers

Each SEO metric (on the left) can be influenced by one or more of the levers (across the top). Metrics can be viewed top-down as a diagnostic tool or bottom-up as a progressive checklist – like a “hierarchy of needs” for SEO.

The outcome metrics at the top will matter to every business site, while the lower, more technical metrics will matter more to larger sites and more sophisticated SEO programs.

The purpose of this SEO framework is three-fold:

  • Tie together metrics and levers to measure the impact of each SEO lever
  • Provide a diagnostic for issues or optimization: each metric is dependent upon the levers that influence it and the metrics below it
  • Identify the depth of metrics that is appropriate for tracking depending on the scope of an SEO program

Before we get too deep into the details of how to use the framework, let’s first consider its main ingredient: the data itself.

 

A Story: The Lifecycle of SEO Data

Search, the intersection between man and machine, creates a lot of data. To put it all in context, let’s think about it as the story of a lowly web page.

Creating a webpage generates data (it is data!). Publishing the page generates metadata (publish time, author, language, file size, and link relationships). And when you ask Google to fetch the page and submit it to the index, Google collects all this data and generates even more (a crawl date, response code, response latency, load time, and render time). If it decides to index the page… yep, more data (time of indexing).

Now that the page is in the index, it has the possibility of being returned in search results. Wow, does that generate a lot of data, which Google, of course, collects. But for the intent of SEO, we care about the data we get back from Google Search Console (the keyword that retrieved the page, the device, geography, search type, the time of impression, where it ranked, and if it got clicked). Over time, this data really adds up!

Humans are the last part of the story. Search results, like lunch menus at a diner, give humans an opportunity to browse, select, and consume different options and transact with their owners. More data (technical metrics like page speed, plus behavioral metrics like time on site, interactions, and transactions, to name a few).

If we step back further, we see that these humans are referencing the page from other pages and social media posts. All those references create more traffic, not to mention more data for Google to factor into its algorithms. On and on this goes as Googlebot is hard at work traversing the internet to index a picture of the web, so that it can deliver all this data to a ranking algorithm that, combined with humans’ personal, demographic, and interaction data, creates the phenomenon that is search.

Looking at one page, this all kind of makes sense. But when you multiply that by a few thousand, or a few million, things get complicated. Additionally, the data is collected in several different ways and stored in several different places. Let’s use this story from creation to consumption to transaction as the basis of an SEO measurement framework.

Story time is over so let’s get on to the metrics!

SEO metrics and levers

SEO Metrics

Each metric, from Crawl to Revenue, is dependent upon the metrics below it. As the story of the lowly web page goes, a page must be created, crawled, indexed, returned in search results, and clicked for it to have any business impact.

SEO Outcomes

Unlike the universality of SEO performance metrics, outcomes are business-specific. At the outset of any strategic planning for SEO, you must define what success looks like. As soon as it is defined, track that conversion in your analytics tools so that you can measure conversion completions against each performance metric. Outcomes will help you determine where to invest time and resources; performance metrics will tell you how to invest them.

Performance Metrics

Let’s start from the bottom and work our way up. In this way, each metric impacts the one above it.

Crawl: the taken-for-granted prerequisite

The most taken-for-granted metric in search engine optimization is crawling. That is partly because the vast majority of the internet is either built on a CMS that optimizes for crawling by default or lives on a site not big enough to have crawl issues. However, there is a class of home-baked websites that, due to developer oversight (aka “job security”) or otherwise, are not crawl-friendly. The bottom line is: if Google can’t find your pages because you didn’t use React Router correctly or you forgot to put up a sitemap, SEO is not happening.
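For reference, a sitemap doesn’t need to be complicated. Here is a minimal, illustrative sitemap.xml; the URL and date are placeholders:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/my-first-article/</loc>
    <lastmod>2019-01-01</lastmod>
  </url>
</urlset>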

Index: this is an actual thing

The internet is really big. If Google wanted to index the whole thing, it would probably double its electricity bill.

Search engines have to be picky about what they index. If a page is low quality or is likely to have low search demand, Google can choose not to index the content of the page. Unless your site is millions upon millions of pages, isn’t internationalized correctly, returns lots of low-quality or duplicate pages, or violates Google’s TOS, it’s likely that all your pages, if crawled, will be indexed.

It is important to consider the flip side of this problem though: indexing too much. Indexing a bunch of garbage pages can waste your Google crawl budget and make a site appear to be low quality. This lever can be pulled in both directions, and depending on the site, there is a right and a wrong way to pull it.

Impressions: search impressions and searchers’ impressions

Yay, the site is indexed! Ok, now what? Impressions and the keywords that trigger them are a measure of your site’s effectiveness in SEO targeting. Not all keyword impressions are created equal.

Searcher context, like the device or geography, can have a huge impact on outcomes. Likewise, less traffic from better-targeted keywords can have a stronger effect on outcomes than lots of poor-quality traffic. Keyword impressions are a fundamental part of your customer understanding and a strategic north star.
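If you want to work with impressions beyond what the Search Console UI offers, you can pull them programmatically. Here is a minimal sketch using Google’s Node.js API client; it assumes the googleapis package and an already-authorized auth object, and the property URL and dates are placeholders:

// Minimal sketch: pull keyword impressions from Google Search Console.
// Assumes `auth` is an already-authorized OAuth2 or service account client.
var google = require('googleapis').google;
var webmasters = google.webmasters({ version: 'v3', auth: auth });

webmasters.searchanalytics.query({
  siteUrl: 'https://www.example.com/',  // placeholder: your verified property
  requestBody: {
    startDate: '2019-01-01',            // placeholder date range
    endDate: '2019-01-31',
    dimensions: ['query', 'device', 'country'],
    rowLimit: 1000
  }
}).then(function (res) {
  // Each row carries clicks, impressions, CTR, and average position.
  (res.data.rows || []).forEach(function (row) {
    console.log(row.keys, row.impressions, row.clicks, row.position);
  });
});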

Rankings: Oh sweet competition

Search engine result pages, with stack-ranked lists of URLs, are what created and sustain what we call SEO. Rankings are a moving target, but we have reason to trust our theories about how they work. At the core there is content and links, but there are hundreds, if not thousands, of other factors. Measuring keyword-URL rankings is important, but seeking to understand why pages rank is almost as important. Measuring process metrics (the quantity and quality of inbound links, the quantity and quality of content, and the components of speed) is what differentiates a deliberate SEO program from a spray-and-pray approach.

Clicks: The first “win” (but not the last word)

Traffic is the easiest metric to report and the most misleading. To be useful, traffic metrics should be segmented by keyword or page groupings to understand searcher intent, and compared to outcome metrics to uncover whether that intent was met appropriately. Traffic is a good indicator of performance, but it is nothing without measures of quality like click-through rate and conversion rate.

SEO metrics and levers

SEO Levers

SEO levers are at the top of the matrix. These five buckets describe the five areas of optimization and growth at an SEO program’s disposal. The dots in the grid represent the relationship between a lever and the metrics that it can directly influence.

SEO levers are ordered from left to right, from foundational technical factors to tactical business factors. With few exceptions, you must get the foundation right before thinking about building upon it. Let’s take a look at each of these levers.

Links: The bowl and the cherry

Links are both basic and critical: the bowl and the cherry on top of the ice cream sundae. Sites need sitemaps and internal links so that pages can be crawled and indexed, and at the same time, external links are a major factor in Google’s ranking algorithm.

When to use this lever: Day one of publishing a site so that the pages have a chance to be crawled, indexed, and returned in search results. Or later on, when you have critical or high-conversion pages that have high impressions but don’t rank very well—ranking improvement will directly impact SEO outcomes.

Response/Render/Speed: AKA “technical SEO”

Search engine crawlers are not going to waste their time on pages that return an error or that take an eternity to load. Pages that hide a lot of their content behind JavaScript rendering or user interaction events rank poorly in search compared to their server-side rendered counterparts.

When to use this lever: Always monitor page responses for bots. 4XX and 5XX response codes will cause pages to fall out of the index. If you find that Google’s reported “Time spent downloading a page (in milliseconds)” is in the seconds, or that pages feel slow when they are not cached, use this lever.
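As a starting point, monitoring bot responses can be as simple as scanning your server access logs. Here is a minimal sketch, assuming Node.js and an nginx/Apache-style combined log format; the log path is a placeholder:

// Minimal sketch: flag Googlebot requests that returned 4XX/5XX responses.
var fs = require('fs');

var lines = fs.readFileSync('/var/log/nginx/access.log', 'utf8').split('\n');

lines
  .filter(function (line) { return line.includes('Googlebot'); })
  .forEach(function (line) {
    // In combined log format, the status code follows the quoted request line.
    var match = line.match(/" (\d{3}) /);
    if (match && /^[45]/.test(match[1])) {
      console.log(match[1], line);
    }
  });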

Meta/Schema/Title: Basic.

The page title, meta description, and schema markup can have the biggest impact on SEO with the least amount of work. Optimizing a page title for a target term can significantly influence ranking and click-through rate. From quick one-off metadata optimizations to section-wide A/B tests to sitewide implementation of schema.org markup, these optimizations can always yield benefits.
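To make that concrete, schema.org markup usually ships as a JSON-LD block in the page head. Here is a minimal, illustrative example for an article page; every value is a placeholder:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Metrics and Levers of an SEO Measurement Framework",
  "datePublished": "2019-01-01",
  "author": { "@type": "Person", "name": "Your Name" }
}
</script>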

When to use this lever: Always unless you’re certain you have something better to do.

Content: Keep humans happy. Keep bots happy.

Page content is the only lever that affects literally every SEO metric. Page content determines if a page is worthy of indexing and of repeated bot crawls to check for fresh updates. Along with the page title and schema, content determines what keywords a page will surface for in search. Content also affects SEO outcomes, since the topic of the page could target keywords with more or less search volume or conversion intent. And of course, the content is ultimately there to influence a user.

This is the shallow end of SEO in some respects, but when things get competitive, this is a critical lever. More content usually means more traffic. Better and fresher content usually means more on top of that.

When to use this lever: Need to start getting traffic? Create content. Need better traffic? Create content that targets the right keywords. Need to maximize conversions and capture every possible edge against tough competitors? Test the crap out of your content.

Experience: You grew the fruit, now harvest the juice

Experience, like crawl optimization, is often dismissed, taken for granted, or believed to be outside the purview of SEO programs. Those are dangerous opinions. As Google considers more inputs to its ranking algorithms that proxy user experience (for example, how frequently a searcher bounces back to search results), user experience only becomes more important.

Let’s not forget, you’ve done a lot of work to get searchers to your site! It is now your duty, not to mention your goal, to optimize the experience to the point that users are happy, ready, and able to convert. No part of SEO happens in a vacuum; just as content impacts experience, experience can impact ranking. Recognize this as part of the system.

When to use this lever: Always, but especially if organic traffic is hard to come by or you are heavy on traffic and light on conversions. And never let poor experience hurt your rankings.

 

SEO Reporting

A holistic picture of SEO is great, but it doesn’t do anything unless you do something with it. As the saying goes, “what gets measured gets managed.” Reporting on the right metrics sets the focus of an SEO program.

First, two underlying truths about reporting and one smart suggestion:

  • Metrics are only valuable to the degree that they can be impacted. Just because you collected a metric does not mean you need to report it. In many cases, alerts are just fine.
  • Reporting cadence should match the pace at which a metric can change—too fast is noise and too slow can be too late.
  • Investment in data collection should be proportionate to investment in SEO efforts.

Reporting needs and focus will differ from business to business. The important thing is that reports capture the goals and the actionable metrics without creating noise. Collecting a lot of data is great, but most of it is only useful when you need to understand a cause or an effect.

I think of reporting in three levels. Each level increases in granularity and narrows the scope of information. At the forefront are the business outcomes that SEO has yielded. Each level after is more for the operators of an SEO program and less for external stakeholders.

  • Outcomes – “How is SEO doing?”

      • Cadence: Match Weekly/Monthly/Quarterly business reporting
      • Metrics: Organic Traffic, Signups, First Orders, App Downloads
      • Dimensions: Period over Period or Over Time
  • Performance / Outcomes in Context – “Why is SEO working?”

      • Cadence: Real-time / Daily Update Dashboards
      • Metrics: Impressions, Rank, Clicks, Signups, First Orders, App Downloads
      • Dimensions: Period over Period or Over Time, Geography, Page Template or Keyword Grouping
  • Process / Monitoring – “What is the status of program X?”

    • Cadence: Dashboards / Ongoing monitoring and analysis of current programs or experiments
    • Metrics: Crawl rate, Speed, Impressions, Rank, Clicks, Page Interactions, Inbound Links
    • Dimensions: Response Codes (crawlers), Page Template

 

Measure the Metrics, Move the Levers

If there is one message to take from all this, it is that you may not need to track and manage every metric, but you must know which metrics are important to achieving the intended outcomes of your SEO program. If you understand which metrics matter, it becomes easy to create focused strategies to achieve the right outcomes.

Pinging Search Engines with Your Sitemap Works – Here’s How

Freshness matters. It matters to searchers, therefore it matters to search engines, therefore it should matter to you. The best tool to ensure that search engines have your site’s freshest content is your sitemap. This post explains why pinging search engines with your sitemaps is important for SEO and how to optimize the process and stay fresh.

Why Sitemaps? Why Ping Them?

You have probably already submitted your sitemap to Google, Bing, and other search engines using their webmaster consoles. If so, you’re in a good spot; rest assured that search engines know that you have a sitemap. And if your site is relatively static, meaning that pages are not being updated or created often, this is probably good enough. Depending on the perceived value of your site from a search engine’s perspective, search engines are probably requesting your sitemap(s) close to once a day at a minimum, and potentially several times a day.

If you have a site that changes frequently then pinging search engines with your sitemaps is important. It is in your, and Google’s, best interest that they have the newest and most up-to-date content in their index, available to serve in search results. Examples of this would be marketplace sites where new products or services are being listed all the time, currency exchange listing sites where pages are continuously updating, or news publishers where freshness is critical.

To ensure that Google and other search engines know when your site has new content, pinging them with your recently updated sitemap is a must.

How pinging works

It is actually very simple. If you have ever used a web browser, you will probably understand. (A short code sketch after these steps shows the same thing in practice.)

  1. You send an HTTP GET request to a specified “ping” URL on a search engine’s server with your sitemap’s URL appended to it.
  2. When the search engine receives the request, it, in turn, sends an HTTP GET request to the sitemap URL that you submitted.
  3. If your sitemap returns a 200 “OK” response with a valid sitemap, the search engine will scan the URLs in the sitemap and reconcile them against the URLs that it has previously discovered.
  4. The search engine may decide to crawl some, all, or none of the URLs listed in the sitemap.
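Here is a minimal sketch of step 1 in JavaScript, using the ping URLs listed at the end of this post; the sitemap URL is a placeholder:

// Minimal sketch: ping Google and Bing with a sitemap URL (one HTTP GET each).
var SITEMAP_URL = 'https://www.example.com/sitemap.xml'; // placeholder

var pingEndpoints = [
  'https://www.google.com/webmasters/sitemaps/ping?sitemap=',
  'https://www.bing.com/ping?sitemap='
];

pingEndpoints.forEach(function (endpoint) {
  fetch(endpoint + encodeURIComponent(SITEMAP_URL))
    .then(function (response) {
      console.log(endpoint + ' responded with ' + response.status);
    });
});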

 

Sitemap Notification Received

 

It is important to note that pinging a sitemap does not guarantee that the sitemap URLs will be crawled. Google says, “Please note that we do not add all submitted URLs to our index, and we cannot make any predictions or guarantees about when or if they will appear.” It is pretty safe to say that all the URLs will be recognized, though. That being said, crawl rate has a tendency to increase sharply after you notify search engines that your sitemaps are fresh.

Crawl Logs from Googlebot

 

How often to ping

Google used to recommend pinging no more than once per hour in their sitemaps documentation. As of 2015, they do not suggest any rate limit. Things have changed…

At Postmates, while testing Airflow data pipelines, we pinged our sitemaps 174 times in one day. Looking at our server logs, we saw that every single attempt to ping the sitemaps led to a subsequent GET request from Googlebot. This was true for each of our sitemaps, every time.

The subtle nuance, though, is that since our sitemaps did not actually change during that period, and had been cached, the server returned a 304 (Not Modified) response and did not actually send a sitemap. This happens because Googlebot sends its request with an If-Modified-Since header, set to the Last-Modified value it received the last time it fetched the sitemap, so the server can compare the freshness of the sitemap file. This is Google essentially saying, “unless the sitemap file has changed since I looked last time, I’m not interested.”
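To illustrate the exchange from the server’s side, here is a minimal sketch assuming an Express app; the route, file path, and timestamp are placeholders, not anything Google prescribes:

// Minimal sketch: serve a sitemap with conditional GET support.
var express = require('express');
var app = express();

// Placeholder: update this whenever the sitemap file actually changes.
var sitemapLastModified = new Date('2019-01-01T00:00:00Z');

app.get('/sitemap.xml', function (req, res) {
  var ifModifiedSince = req.get('If-Modified-Since');
  if (ifModifiedSince && new Date(ifModifiedSince) >= sitemapLastModified) {
    // Googlebot already has this version; send a 304 with no body.
    return res.status(304).end();
  }
  res.set('Last-Modified', sitemapLastModified.toUTCString());
  res.sendFile('/var/www/sitemap.xml'); // placeholder path
});

app.listen(3000);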

Stay Fresh

The takeaway is that if you really want to keep Google’s index of your site fresh, ping Google with your sitemaps. Do it as often as you have fresh content, and make sure that the content listed in the sitemap file is discernibly different from the last time you pinged Google. In addition, make sure that you serve the updated sitemap with an updated Last-Modified header so that Googlebot will actually receive the new file.

If you are ready to start pinging search engines, here are the URLs to do it:

https://www.google.com/webmasters/sitemaps/ping?sitemap=<sitemap_url>

https://www.bing.com/ping?sitemap=<sitemap_url>

If you want to ping the search engines manually, you can use these URLs in your browser, replacing <sitemap_url> with the full URL of your sitemap or sitemap index file.