There’s something to learn from black hat SEO tactics for PDF files

Go search Google for “free tiktok followers” and you will see some brilliant, albeit dirty, #blackhat #SEO. Most of the top ten results belong to a spammer who has hacked into dozens of sites, created fake profiles, and uploaded .pdf files full of spun content to public directories.

The scam itself is kind of surprising, but why it works is what’s particularly interesting. It demonstrates that in emerging, low-competition spaces, a competitor can win by virtue of just one or two solid ranking factors, despite featuring absolute garbage content.

For example, the City of Neehaw, Wisconsin runs a WordPress site that Google has every reason to trust. It’s a government site with lots of links pointing to it. In this corner of the internet, Google doesn’t care if the content screams spam: if the site is trustworthy, the content ranks.

Can you apply this tactic?

If you’re like me, you’d at least play with the idea…

When you get past your moral objections, you can apply this “insight” by hosting .pdf files in the static directories of public hosting sites. Let’s look at an example.

This might blow your mind: the subdomain that hosts the static assets belonging to sites hosted on Squarespace has a Moz Domain Authority of 80! To put that in context, YouTube is 100, and 80 is about the score of itself!

No surprise: with that monstrous domain authority and thousands upon thousands of hosted .pdf files, the site ranks for a ton of keywords.

And the traffic? About two-thirds of a billion visits per month.

Can a PDF outrank a page?

The short answer is yes. These PDF files rank #1 for dozens of searches and top 10 for millions. And that’s just from one site.

Granted, these are mostly very low-competition keywords, so take that with a grain of salt. The highest-competition search term with a PDF file ranking #1 has a keyword competition of 48. Not super high, but volume doesn’t always equal value!

The thing to note here is how many of these searches are related to shady or illicit activities. Lots of them promise social media followers, different types of “hacks,” and copyright infringing activities.

And finally, I couldn’t find any instances where a Squarespace-hosted static PDF outranked a page on the Squarespace-hosted site itself. I can think of one big reason that Google would prefer a responsive page over a PDF: mobile devices don’t give a damn about PDFs.

Can a PDF file get a featured snippet?

Again, the short answer is yes. In the absence of competition, the quality and relevance of the content (and let’s not forget the domain authority) give the PDF file the edge.

The interesting thing here is that the snippets lack any structural formatting. Because a PDF is much like a big .txt file in Google’s eyes, there is no option to create a table or bulleted list. It’s just text or nothing.

Should you apply this tactic?

Look, I’m not the SEO police. I’m just a guy who loves competition and is fascinated by search.

Should you hack a site just to upload some spam? No.

Is it worth trying to host a PDF on a public hosting domain if it’s your only chance at ranking for a target keyword? Maybe…

I love thinking about #minimumviableSEO and sometimes you have to be creative to rank for terms that your site isn’t ready to rank for.

But keep two things in mind. You won’t have tracking in Google Search Console or Google Analytics because, as you might have guessed, you can’t upload Google Search Console’s HTML verification file to the static hosting domains. (Of course, I tried it!) This means the only tracking you’ll get is from tracked links that point to your site from a PDF.

So consider your costs and benefits. If you’re not just some spammer and you’re really thinking about SEO strategy, you’ve probably got better things to do with your time.

I hope this was interesting and that it scratched your curiosity itch as it did mine!

Google Analytics 4 Pageview Custom Dimensions and Content Groupings

I decided it’s time I learned GA4, so I figured I’d implement it net new and figure it out along the way. A couple of things I rely on heavily on my sites are Content Groupings and Custom Dimensions. Luckily, GA4 handles these a lot better than UA—even if they are still a bit cumbersome.

I learned that there is only one Content Group parameter, called content_group (instead of 5). At first glance, it seems that event-scoped custom dimensions can handle what content groups used to (thanks to the new event-first, rather than session-first, paradigm).

To set custom dimensions, you can just set arbitrary parameters like canonical_name or page_author.

Shown below is an example of the query parameters that are sent with the page_view hit. Notice the “ep.” (event parameter) prefix: I only passed canonical_name as a parameter, and the GA code adds the prefix when the hit is sent. The same goes for content groups.

The query string parameters correspond to the Measurement Protocol.

  • en: page_view
  • ep.criteria_id: 2554
  • ep.canonical_name: New%20Zealand
  • ep.content_group4:
  • ep.content_group3: NZ
  • ep.content_group1: Country
  • ep.content_group2: Active
  • ep.debug_mode: false

Notice the debug_mode parameter. Setting it to true lets you take advantage of the debug view in the Google Analytics UI (here it’s switched back to false). That thing is pretty neat and helpful for spot-checking custom parameters.
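
If you want to see the mechanics, here’s a minimal Python sketch of that flattening step. This is my own illustration of the pattern (the real gtag.js logic is more involved); the function name is made up.

```python
from urllib.parse import urlencode, quote

def to_measurement_protocol(event_name, params):
    """Flatten a GA4 event into Measurement Protocol query parameters,
    adding the "ep." prefix to each event parameter (as gtag.js does)."""
    payload = {"en": event_name}
    for key, value in params.items():
        payload[f"ep.{key}"] = value
    return urlencode(payload, quote_via=quote)

qs = to_measurement_protocol("page_view", {
    "canonical_name": "New Zealand",   # custom dimension parameter
    "content_group1": "Country",       # content group parameter
})
# → "en=page_view&ep.canonical_name=New%20Zealand&ep.content_group1=Country"
```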

One thing that seemed silly was that I couldn’t just arbitrarily name my content groups and assign them in the UI, but hey, it’s still a lot better than UA.

Defining and Naming Your Custom Dimensions

After you send the parameter with the hit, you have to set up the custom dimension in the GA4 UI. You can set them up by going into the left panel and selecting “Custom Definitions.”

The nice thing is that, if you’ve already set a parameter, you can just select the parameter name from the list and all you have to do is provide a Dimension Name and Description.

Pageview Custom Dimensions in Reports

To see the custom dimensions in a real report, you can go into the left panel and select Engagement > Pages and screens. Then you can find the dimension by clicking the big blue plus sign (+) and selecting whichever you like.

If you’re curious about how it works and you want to see it for yourself, you can go to the site and open up your Network console and filter for “?collect” to see the GA hits. The site is here: Please visit and send some data into that GA4 account!

Read This Before You Run Your First SEO A/B Test

This is a quick post to provide context for the SEO A/B test-curious people out there. I was prompted by a thread in Measure Slack and figured long-form would make more sense. I didn’t want to make this another hideously long SEO-ified post but rather get to the point quickly.  Here’s the post and then I’ll dive into my thoughts about SEO A/B testing.

After writing this, I realized I didn’t actually address statistical significance, but as you’ll see, if you’re running experiments that depend on a fine margin of statistical significance, your time is probably better spent elsewhere. Read on to see what I mean.

How is it different from a regular A/B test?

SEO A/B tests differ from normal A/B tests (run with tools like Optimizely or Optimize) in two major ways: implementation and measurement.

Unlike A/B testing tools like Google Optimize that apply the test treatment once the page is rendered in the browser (using Javascript), test treatments in an SEO test should take place on the server. Why? Because Googlebot needs to recognize these changes in order to index the page appropriately. Yes, Googlebot does run some Javascript on certain pages in certain conditions, but there is no guarantee that Googlebot will execute the Javascript code required for it to be exposed to a test treatment. This means that a test page needs to be crawled in its final state, and the only way to guarantee that is if the changes happen server-side. That’s where things get weird.

There are tools out there for running server-side A/B tests, but none are remotely as simple as Google Optimize—they all require server-side changes. SEO A/B testing frameworks are not terribly complex to code. A typical testing framework takes the identifier of a page (for example, the product ID, or the integration slug in the case of a website like Zapier) and applies a variant-assignment algorithm to it. This could be as simple as checking whether a product ID ends in an odd or even number, or as complex as a string hashing function and a modulo operator that returns a 0 or a 1 to assign the A and B variants. In any case, this is, at the end of the day, a substantial product feature. See how Pinterest runs tests.
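
The hashing flavor of that assignment can be sketched in a few lines. This is my own minimal example, not any particular framework’s code; the salt and function name are invented.

```python
import hashlib

def assign_variant(page_id: str, salt: str = "seo-test-1") -> str:
    """Deterministically assign a page to variant A or B by hashing its
    identifier (e.g. a product ID or integration slug) and taking modulo 2."""
    digest = hashlib.md5(f"{salt}:{page_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"

variant = assign_variant("slack-integration")  # hypothetical page slug
```

The salt lets you re-randomize the groups between experiments without touching the page identifiers themselves, and the same function can run in your page templates and in your analytics tagging so both always agree.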

On the measurement side of things, either you’re using a proper server-side A/B testing tool with measurement capabilities or you have to go out of your way to track the results in your own tool. If you go the “roll your own” route, the same A/B assignment logic that determines the page treatment needs to be passed along to your web analytics tool. A simple way to do this is to assign the variable in the dataLayer and use Google Tag Manager to assign a Content Grouping (A or B) to the page in Google Analytics. Content Groupings are a better choice than hit-level custom dimensions because Content Groupings apply to landing pages by design.

Here’s an example of an A/B test that had no effect. Can you tell when it starts? When it ends? Who knows—and that’s how you know the test had no effect!

Interesting note on this experiment: the treatment was FAQ schema across ~8k pages. Google decided to recognize the schema on only 200 of them, making it impossible to detect an effect… and a waste of time to implement at scale if the change wouldn’t have a tangible effect.

Random assignment and Scale

If you’re thinking about running an SEO A/B test, random assignment and scale are two things you must consider from the outset. Just like browser-based A/B tests, you need to be able to trust that your groups are assigned randomly and guarantee that you will see enough traffic to produce valid results. I addressed random assignment a bit above, and that’s not that hard to account for. It’s detecting a test’s effect that creates a challenge.

The added layer of complexity in an SEO test is crawling and indexing. Because SEO tests are meant to effect a lift in ranking or CTR, you have to be positive that Google actually indexes your changes for the changes to take effect. Some pages don’t get crawled frequently and some, when they do, take forever to get re-indexed. This means there will be a lag to see results–the duration of that lag depends on your site’s crawl rate and Google’s opinion about how frequently it wants to reindex your pages. 

This means that scale matters a lot. If you know you will have to live with some degree of imperfection in your test, you have to overcome that with scale. By scale, I mean lots and lots of pages. The more pages you have and the more traffic they get, the better you will be able to detect your two time-series plots diverging as the pages are crawled, re-indexed, and the changes take effect.

You’re probably asking, “how many pages do I need?” Well, I don’t have any science behind this, but I would say 1,000+ as a bare minimum. And those 1k pages had better have lots of traffic: if they don’t, it will be harder to attribute any changes to the test rather than to randomness, and SEO A/B testing is a relatively high lift. A cost/benefit (potential) analysis is imperative before getting started.
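
For a rough sense of why the page counts need to be that large, here’s a back-of-envelope power calculation using the standard two-sample normal-approximation formula. The traffic numbers are made up for illustration, not from any real test.

```python
import math

def pages_per_group(mean_sessions: float, sd_sessions: float, lift: float,
                    z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Pages per variant needed to detect a relative `lift` in mean
    sessions per page, at ~95% confidence and ~80% power."""
    delta = lift * mean_sessions                      # absolute difference to detect
    n = 2 * (z_alpha + z_beta) ** 2 * sd_sessions ** 2 / delta ** 2
    return math.ceil(n)

# e.g. pages averaging 30 sessions/month with a standard deviation of 60,
# hoping to detect a 10% lift:
pages_per_group(30, 60, 0.10)
# → 6272 pages per variant
```

Page-level traffic is usually highly skewed (the standard deviation dwarfs the mean), which is exactly why four-digit page counts per variant are the realistic floor.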

All that said, I’d be pretty confident in a 50/50 split across 10k pages. If you don’t have 10k pages, let alone 1k, you’re probably better off developing your programmatic SEO to reach this scale. VS pages, Integrations pages, Category list pages, and Locations pages are all good ways to get that page count up (and building them will have a bigger effect than all the optimizations after the fact).

Tracking and Goals

I mentioned some thoughts on tracking page variants in Google Analytics in the first section. That part is a technical problem. The goals part is a business problem.

Generally speaking, an SEO A/B test should focus on traffic. Why? Because in most cases, an SEO test will have the biggest impact on traffic and less of an impact on say, the persuasiveness or conversion rate of a set of pages. Sure, you could run title tag tests that might drastically change the keyword targeting or click-through intent but it’s usually safe to say that you will either start getting more or less of the same traffic and you can assume that traffic should convert to leads, revenue, etc at the same rate. 

Another argument for traffic is that changes in organic traffic volumes are affected by fewer variables than revenue. The further the goal is from the lever you’re testing, the more data you have to collect to be sure that the test is actually what is causing the effect.

High impact tests

Finally, if you’ve made it this far, you’re probably wondering about some test ideas that you have. Here is how I think about prioritizing SEO tests.

First, think about treatments that are high-SEO-impact. For me, title tags and meta descriptions are at the top of the list because, even if you aren’t able to affect rankings, they can have significant impacts on click-through rates. Another upside is that you will be able to see the effects of your test on the search pages. An “intitle:<my title tag template string>” search in Google will give you a sense of how many of your pages have been indexed. This is something you can check daily to see how Google is picking up your changes.

Second, consider structured data changes, because those can also highly impact SERP CTR. The downside, I’ve learned, is that FAQ schema changes have the potential to actually hurt CTR if the searcher can answer their question from the search page and not click through. Most other types of structured data that are reflected in the SERP will have a positive effect. For example, try omitting the star rating schema if the product is rated less than 3/5.

H1 tags, copy blocks, and page organization are other options, but be careful with these because they also affect the page’s UX. Copy blocks are likely to have the biggest effect in search because they can broaden a page’s keyword exposure.

At some point you really have to ask yourself: is this test actually better as a browser-based test? Is the change obvious enough that you should just make it rather than test it? Is this whole thing worth it, or are there better things I could be spending resources on? (That last one is a good one!)

Ok, I hope that helped you get a little more of a grasp on SEO A/B testing. It was a bit of a barf post, but hey, I have an 8-month-old baby to watch after these days!

How to Choose a Domain Name for SEO

I’ve been thinking about names a lot lately. My wife and I are running through the process of generating and narrowing down a list of names for our baby on the way. We didn’t do ourselves any favors by not finding out the gender prior to birth. I like to think this is a good way to remove any bias from the experiment. ;p

What’s in a (domain) name?

Picking a domain name is similar to naming a kid. We humans like to label things, and we love to assign a lot of meaning to those labels (whether that meaning is valid or not). If you’ve ever read Freakonomics, you’ll remember its evidence of the correlation between names and things like career prospects. Without diving into the ethical discussion here, you can’t help but recognize the weighty and enduring effect of a name.

When it comes to SEO, naming affects two major levers: relevance and trust. From a ranking algorithm perspective, a domain name is certainly less important than things like content, geolocation, language, links, and domain age (though there are some benefits that I’ll discuss below!). On the other hand, from a search engine results page (SERP) perspective, a domain name can have a lot of influence on click-through rates.

Consider relevance. When we lived in Hong Kong, we lived near a pet shop called “Bob’s Paradise.” While we eventually realized it was a pet store named after the lazy French bulldog that greeted you with a snort when you walked in, the name proved highly irrelevant in search results. Whenever anyone searches for “dog food,” they are likely to skip over that listing and select something that seems more relevant to their query—like, for example, “Pet Line.”

In the local pack on the SERP, it looks like they finally got wise to this and updated their Google My Business listing…

Relevant domain names on the SERP

Now let’s consider trust. We’ll continue with the Hong Kong-related anecdotes. Many Hongkongers are given a traditional Cantonese name at birth, then given the chance to choose their own English name as a kid. In theory, I love this idea. I probably would have named myself Firetruck Fox. But this childhood choice can set up an uphill battle as one tries to establish authority in a workplace. GI Joe and Pussy are two names that come to mind.

Extending this anecdote, you’re probably unlikely to choose “” over in a search for a new dentist. I’m guessing… 

Relevance Signals and Exact Match Domains

There’s a lot of debate over whether exact match domains affect how a domain ranks for the term it matches. I’m not convinced anybody has proven things one way or another, but there are some tangible factors to consider that have less to do with the domain name itself: backlink anchor text, page title templates, and topical relevance.

Branded anchor text = Keyword anchor text

Consider an experimental site that I created: Yeah, it’s an exact match domain for the main topic of the site: Google Apps Scripting.

As you can see, I linked to it with its branding (which also happens to be the exact search terms it’s targeting). By virtue of a cleverly selected domain name, it becomes much easier to get highly targeted anchor text links.

Exact match domain keywords targeting

Keywords in Page Title Templates

Also, consider the home page title, Google Apps Script Tutorials and Examples • Making Google Apps Script Accessible to Everybody, and the post title template: {{post_title}} | Google Apps Script Tutorials and Examples. (Ok, I’ll admit they’re too long, but loosen up!)

This way, every page on your site will have important keywords in the page title. This is a commonly used tactic among affiliate sites. I know this because I recently had to research baby monitors. I found several sites like,, and All these sites have important primary keywords and relevant modifiers as part of every page title. Most of the articles on these sites are just narrowly keyword-targeted listicles about baby monitors, so their page titles end up looking something like this:

{{keyword targeted list}} | Baby Monitor List


10 Best Baby Monitors for Security | Baby Monitor List

Keywords Everywhere in title

It’s a cheap but effective strategy.

Exact Match Names and Topical Correlation is Not Ranking Causation

Finally, it bugs me how often plain old topical relevance is treated as causal rather than correlative when it comes to why exact match domains affect SEO. When you name your domain after the topic you’re going to cover, there is just that plain old topical relevance: Google is going to rank your site for that topic. There’s an obvious correlation between your target topic, your content, and relevant search queries!

It’s just like slapping an “Eggs” label on an egg carton. It’s not the label that signals that it’s a carton full of eggs; it’s the fact that it’s an egg carton. Everybody would understand what it is whether it was labeled or not. You cannot claim that people understand the carton as eggs because you labeled it “Eggs.” It’s a correlation; there’s no way to prove causality here.

There probably was a time when you could trick Google into thinking that searchers were looking for your brand name (think here) rather than a generic search term (best baby monitors). I’d argue that, at this point, Google is sophisticated enough, thanks to the Knowledge Graph, to differentiate between common search terms and brand names that were created to match them.

How to signal trust in a domain name

On the SERP, you only have a fraction of a second to convey that your site is trustworthy. First impressions are everything in this situation, so you don’t want to leave anything to chance!

Obviously, brand is a HUGE factor here. Consider the domains,, or Both of these brands have established a reputation by developing content and products that are trustworthy. (But if we’re just starting out with an SEO project, we’re not that lucky.)

One way to build trust is to choose a domain name that conveys that your site specializes in, or covers, a topic comprehensively. That is the rationale behind two projects that I’m launching in tandem with this series: and These names convey what you’re going to get: definitions for technical terminology, and some insight about static (site generators). More on those to come. =]

Given the choice between a and, I’d assert that most people are going to choose for the query “pelican vs jekyll” because this site seems to focus on that topic specifically, whereas, well, who knows about Jared!? (We’ll find out if this is true soon!)

And how to lose it…

If you’re starting from scratch, there’s more to lose than to gain when picking a domain name. Here are a couple of things to avoid when choosing a domain name.

The first is probably obvious: crazy top-level domains (TLDs). Don’t think too hard on this: choose a .com domain name whenever you can. Other TLDs like .org or .net might be appropriate, but if you can’t find the right .com name, consider re-exploring the .com possibilities before settling on an unusual TLD.

Some TLDs like .io and .co are gaining acceptance in certain spaces but it will be a long time before the majority of internet searchers trust .xyz and .guru domain names.

This is a little like the Freakonomics name example from the beginning of this post. Humans have become comfortable with .com, .org, and .net TLDs. There isn’t necessarily anything rational about people’s bias toward these TLDs, but you might as well lean into it and avoid the uphill battle.

Another obliquely related concept is the TLS protocol, aka HTTPS. This isn’t part of the domain name per se, but in the eyes of the savvy internet user, that “s” on the end of https is a real signal (in fact, a sign!) of safety. Launch your site with SSL from the start.

We’ll dive into HTTPS a bunch more in the next post in this series covering how to give your Cloudfront distribution a proper domain name with Route 53 (and how to make it secure with HTTPS).

For now, I hope this gave you some things to consider in choosing your domain name.  Take the time to do this right the first time. NOBODY likes domain migrations.

Does Cloudfront impact SEO? Let’s set it up for an S3 static site and test it!

This is the fifth post in my series about static site SEO with AWS. In the last post, we uploaded the static site to Amazon S3 so that it’s publicly accessible on the web. In this post, we’ll move beyond static site basics and start to discuss how Cloudfront CDN impacts load speeds and SEO. Yay! We’re finally going to start talking about SEO!

This post will focus on a HUGE factor for SEO: speed. We’ll first take apart a few acronyms, and then we’ll talk about how to make your website fast with the Cloudfront CDN. Luckily it’s all pretty straightforward, but as with all things SEO, the devil’s in the details. Let’s get started!

How does Cloudfront CDN work?

It’s all about speed. Cloudfront is a Content Delivery Network (CDN) service offered by AWS. If you are unfamiliar with what a CDN does or how one works, the concept is pretty simple even if the technology is pretty advanced.

Cloudfront Edge Locations

Simply put: CDNs distribute your content from your web server (or, in our case, an S3 bucket located in Ohio) to multiple other servers around the world called “edge locations.” These edge locations cache (store a copy of) your content so that it’s more readily available in different areas of the world.

This way, when someone in Tokyo, Japan requests your website, the request doesn’t have to travel all the way to the S3 bucket in Ohio. Instead, the CDN intelligently routes the request to an edge location in Tokyo. This shortens the distance and reduces latency, which means your website loads faster all over the world!

How does Cloudfront CDN improve SEO?

Speed matters, but CDNs impact more than that when it comes to SEO. Search engines want to provide their users with relevant web content and an overall great experience. People hate slow websites, so Google has to factor speed into its rankings to ensure a good experience.

There is also another, less obvious reason why search engines would favor faster websites: they have to crawl them. For Google, time is money when it comes to the energy cost of running the servers that crawl the web. If a website is slow, it actually costs them more to crawl than a faster website! That’s why CDNs and caching matter. (We’ll get to caching in the future.)

Search engine bots and crawling

There is also a third SEO benefit that comes from using a CDN. This is a bit of an advanced use case but if your site does a lot of Javascript client-side rendering, you can employ a CDN to deliver server-side rendered (SSR) pages to search engine bots.  This reduces the amount of time (and money) that search engines have to spend crawling your pages. 

Server-side rendering also means that a search engine doesn’t have to render Javascript (or even be able to) just to parse your site’s content. That is a relatively expensive thing for a search engine to do. The benefit is that, since the search engine doesn’t have to spend so much effort to crawl and render your content, you will likely see a higher crawl rate, which means you’ll have fresher content in search engine indexes. That’s great for SEO, especially for really large and dynamic sites. To do this, you’d use a CDN that offers edge functions, like Cloudfront’s Lambda@Edge or Cloudflare Workers.
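
To make that concrete, here’s a minimal sketch of a bot-detecting viewer-request function in the Lambda@Edge style (this pattern is often called dynamic rendering). The SSR hostname and bot list are placeholders I made up; a CloudFront Functions or Cloudflare Workers version would look different.

```python
# Sketch of a Lambda@Edge viewer-request handler that routes known
# crawlers to a server-side-rendered origin. "ssr.example.com" is a
# hypothetical SSR endpoint, not a real service.
BOT_SIGNATURES = ("googlebot", "bingbot", "duckduckbot")

def handler(event, context=None):
    request = event["Records"][0]["cf"]["request"]
    ua_headers = request["headers"].get("user-agent", [])
    user_agent = ua_headers[0]["value"].lower() if ua_headers else ""
    if any(bot in user_agent for bot in BOT_SIGNATURES):
        # Rewrite the origin so crawlers get pre-rendered HTML
        request["headers"]["host"] = [{"key": "Host", "value": "ssr.example.com"}]
        request["origin"] = {
            "custom": {
                "domainName": "ssr.example.com",
                "port": 443,
                "protocol": "https",
                "path": "",
                "sslProtocols": ["TLSv1.2"],
                "readTimeout": 30,
                "keepaliveTimeout": 5,
                "customHeaders": {},
            }
        }
    return request
```

Regular visitors pass through untouched and keep getting the cached, client-rendered pages from the edge.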

If you want to learn more about deploying Cloudfront for SEO check out this presentation.

But for our purposes, we are mostly concerned with flat out speed of content delivery. So let’s take a look at how a CDN improves speed.

Cloudfront CDN Example Speed Test 

In case you are like me and you don’t believe everything you read, here’s a Pingdom website speed test comparison to observe the effects of using the Cloudfront CDN. All tests were run from Tokyo, Japan. The first test requests the new site’s homepage directly from the S3 bucket in Ohio, and the later tests request the site after I’d deployed it on Cloudfront.

Test #1: From Japan to S3 bucket in Ohio

Pingdom test #1

Test #2: From Japan to nearest edge location when the Cloudfront Price Class was set to “Use Only U.S., Canada and Europe”

Pingdom test #2

Test #3: From Japan to nearest edge location when the Cloudfront Price Class was set to “Use All Edge Locations (Best Performance)” (so probably also Japan)

Pingdom test #3

I’m not sure why Pingdom didn’t render this last one…

In each of these tests, the most significant difference was in each request’s “Wait Time.” Pingdom’s wait time is a measure of Time to First Byte (TTFB), which just means: how long does it take for the first byte of the requested resource to reach the browser? That’s a pretty critical metric considering resources like Javascript and CSS depend on the initial HTML document to load.

Load time waterfall chart
Waterfall chart from Test #1

Here is the TTFB for the HTML document in each test:

  • Test #1: From Japan to S3 bucket in Ohio: 210 ms
  • Test #2: From Japan to the closest location in “U.S., Canada and Europe”: 114 ms
  • Test #3: From Japan to the closest global location (Japan): 6 ms!!
Cloudfront edge location speed test

As we can see, TTFB grows with the distance the request has to travel. CDNs FTW!

Hopefully, this is enough to convince you that using a CDN is a great idea. Even if this test doesn’t directly prove an improvement in rankings, you can bet that your website’s audience will appreciate the decreased latency and improved load times.

Now let’s walk through setting up Cloudfront to deliver our static site hosted on S3.

Setting up Cloudfront CDN for a Static Site hosted on S3

Note: This configuration allows public read access on your website’s bucket. For more information, see Permissions Required for Website Access.

In the last post, we got a static site loaded to S3, so this post assumes you completed that. Otherwise, head back to and get your site loaded to S3.

S3 Static website hosting endpoint

NOTE #1: It is really important that you use the static website hosting endpoint shown above for the next steps. That’s the one that looks like <your-bucket-name>.s3-website.<your-bucket’s-region>. This will matter again in the future.

NOTE #2: You should have already set up public read access for the bucket in the last post.

  • Leave the Origin Path blank. Your website should be in the root directory of your bucket.

NOTE #3: Don’t worry about SSL (HTTPS) or Alternate Domain Names  for now. We’ll come back to that in the next post.

  • For the Viewer Protocol Policy, select “Redirect HTTP to HTTPS” because it’s cool. We’ll get back to that later too.
  • Choose a Price Class that fits your needs. For low traffic websites, each option will be pennies but you can choose the best option for your needs depending on the geographies for which you want to optimize load times.
  • Leave all the other settings with their default settings.
  • Choose Create Distribution.
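
If you’d rather script this than click through the console, the same settings can be expressed as a CloudFront DistributionConfig for boto3. This is a sketch under assumptions: the bucket endpoint is a placeholder, and I’ve kept only the settings discussed above plus the required fields.

```python
import time

bucket_endpoint = "your-bucket-name.s3-website.us-east-2.amazonaws.com"  # hypothetical

distribution_config = {
    "CallerReference": str(time.time()),  # any unique string
    "Comment": "Static site distribution",
    "Enabled": True,
    "DefaultRootObject": "index.html",
    "Origins": {
        "Quantity": 1,
        "Items": [{
            "Id": "s3-website-origin",
            "DomainName": bucket_endpoint,  # the website endpoint, not the REST endpoint
            "CustomOriginConfig": {         # website endpoints are plain-HTTP custom origins
                "HTTPPort": 80,
                "HTTPSPort": 443,
                "OriginProtocolPolicy": "http-only",
            },
        }],
    },
    "DefaultCacheBehavior": {
        "TargetOriginId": "s3-website-origin",
        "ViewerProtocolPolicy": "redirect-to-https",  # "Redirect HTTP to HTTPS"
        "ForwardedValues": {"QueryString": False, "Cookies": {"Forward": "none"}},
        "TrustedSigners": {"Enabled": False, "Quantity": 0},
        "MinTTL": 0,
    },
    "PriceClass": "PriceClass_100",  # "Use Only U.S., Canada and Europe"
}

# To actually create the distribution (requires AWS credentials):
# import boto3
# boto3.client("cloudfront").create_distribution(DistributionConfig=distribution_config)
```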

Now just sit back and wait! Cloudfront will propagate your content out to the edge locations that you selected based on the Price Class. 

Your website will soon be available via a Cloudfront URL that looks something like

Speed Testing your Cloudfront Distribution

Want to run the tests mentioned above? 

  1. Go over to 
  2. Enter the URL for your static site at its public S3 endpoint (<your-bucket-name>.s3-website.<your-bucket’s-region>).
  3. Then try it with your new Cloudfront URL (https://<blah-blah-blah>).
  4. Play around with Cloudfront Price Classes and Pingdom locations to see how the CDN’s edge locations impact TTFB and load times.
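
If you want a quick TTFB number without Pingdom, you can measure it yourself. Here’s a rough Python sketch that times how long the response headers take to arrive, which is approximately what Pingdom reports as wait time (the hostname in the comment is a placeholder).

```python
import http.client
import time

def measure_ttfb(host: str, path: str = "/", port: int = 80, https: bool = False) -> float:
    """Return seconds from sending a GET request until the status line
    and headers arrive (an approximation of time to first byte)."""
    conn_cls = http.client.HTTPSConnection if https else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=10)
    start = time.perf_counter()
    conn.request("GET", path)
    response = conn.getresponse()      # returns once the headers are received
    ttfb = time.perf_counter() - start
    response.read()                    # drain the body before closing
    conn.close()
    return ttfb

# e.g. measure_ttfb("<blah-blah-blah>.cloudfront.net", "/", port=443, https=True)
```

Run it a few times against both the S3 website endpoint and the Cloudfront URL and compare the averages.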

Moving forward

I hope you now have the tools to understand why CDNs impact SEO and how to set one up. If you have any questions, please leave them in the comment section below.

In the next post, we will finally move the website to its own custom domain name with HTTPS!

Hosting your Static Site with Amazon S3

If you followed the previous post about getting started with Pelican, you should have a Pelican website up and running on your local machine. This is where a lot of web development tutorials stop and that has always frustrated me. There are a million and one places to learn how to code online but just because you learned how to write a for-loop doesn’t mean you can actually make anything.  This post is meant to help bridge the gap between “starting a web project” and starting a web project on the actual web.

The goal of this post is to get your collection of HTML files that you’ve built using  pelican content uploaded to Amazon S3 so that everybody can access your site on the web!

This is the Amazon S3 Console. We’ll get to this soon…

Why host on Amazon S3?

In the last post, I discussed why I chose to build a static site with Pelican, and there was a similar set of considerations behind choosing to host this project on S3. I will address the main two:

Why not Github Pages?

Github Pages is, of course, awesome. There is no easier way to host a static site on the web, and best of all, it’s free. So why didn’t I choose the path of least resistance? The answer is configurability and server access logs. GitHub Pages definitely favors simplicity over configurability, and as a result, doesn’t give you many options for tweaking your site. That’s great for most but not for exploring the technical side of SEO, where logs are really important.

Why not Netlify?

Netlify is also very cool. Like GitHub Pages, Netlify allows you to deploy sites from GitHub. It also strikes a good balance between simplicity and configurability—leaning more to the configurable side than GitHub Pages. It also has a host of very cool features, many of which are available for free. If I were just doing this as an ordinary web project, I probably would have chosen Netlify, but because this is meant to introduce more of the bare metal concepts, AWS wins out.

On top of those questions, there are really good reasons to choose AWS in its own right:

  1. It’s huge. So many companies host their sites and apps on AWS that it’s worth getting familiar with its concepts and terminology.
  2. It’s huge. AWS has so many services that you can take advantage of. We’re only using S3, Route 53, and CloudFront, but starting with AWS makes it easy to scale your projects if you want to do something crazy.
  3. It’s huge. The Cloudfront CDN is among the best out there, and it allows us to mess around with HTTP headers and send the server access logs to an S3 bucket so they are as easy to access as our site.

On the flip side, AWS (and Amazon) is huge. So that may be a consideration for choosing other hosting solutions or CDNs. There are lots out there to choose from. I’d say just google them, but if you’re not into big corporations, go try DuckDuckGo.

Prepare your static site for the cloud

Luckily, I don’t need to reinvent the wheel much here. @Kent put together an excellent technically-focused walkthrough. The only difference between this tutorial and his is that we are going to use Route 53 instead of Cloudflare.

Up to this point, you should have an /output directory containing a handful of HTML files after running  pelican content. You could put an article in your /content directory to generate your first article. For instructions on adding an article, refer to the Pelican articles docs.

That said, you don’t need any articles yet to keep pushing forward to get your site live.

Setting up relative URLs in your pelicanconf.py file

You may have already modified your pelicanconf.py file in order to apply your Pelican theme. If you haven’t, it’s time to make your first modification (and a very important one).

What is a relative URL? you might ask. That’s a good question and an important one when it comes to hosting your site on the web. Relative URLs are URLs that start with the URL path rather than the protocol (https://) and the hostname (like www.example.com). In other words, they look like this: /my-folder/some-page/.

Why is that important? We are going to move all the site’s pages from your computer (http://localhost/) to an S3 bucket with its own URL, and in the future to your chosen domain name. If the site refers to internal pages by relative, instead of absolute, URLs, you won’t have to worry about every internal link breaking every time you change your hostname. (This is also really important when it comes to domain migrations!)
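To see why relative URLs survive a hostname change, here’s a quick illustration using Python’s standard library (the hostnames are just examples):

```python
from urllib.parse import urljoin

# The same relative internal link resolves correctly against any hostname,
# so identical HTML works locally, on S3, and on your final domain.
link = "/my-folder/some-page/"
print(urljoin("http://localhost:8000/", link))
# http://localhost:8000/my-folder/some-page/
print(urljoin("https://www.example.com/", link))
# https://www.example.com/my-folder/some-page/
```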

 # Uncomment following line if you want document-relative URLs when developing
 #RELATIVE_URLS = True

By default, Pelican will prefix all your URLs with http://localhost:8000 when you’re building your site locally. To change this to relative URLs, there is an easy switch in your pelicanconf.py file. All you have to do is find these lines and uncomment the one that says RELATIVE_URLS = True.

Setting up your Amazon S3 Bucket

S3 stands for “Simple Storage Service.” The concept is quite simple: S3 provides you with “buckets” to host your content on their servers. It’s great: there’s no need to configure or manage a server. Buckets can be public or private, but for our purposes, we’ll have a public-facing bucket.

Uploading a site to an S3 bucket is pretty simple, but first, let’s set up an S3 bucket. You will, of course, need an AWS account. If you don’t have an account yet, you’re in luck: you can use the AWS Free Tier for your first year!

Creating your first S3 Bucket

Once you have an AWS account, follow these steps to create a bucket.

  • Go to your S3 console:
  • Click the “Create bucket” button
  • Name your bucket after the domain name you plan to use. My project is going to live at, so I named my bucket “”.  

If you haven’t decided on a domain name, now’s a good time to head over to Amazon Route 53 and find one that’s available.

  • Select a Region where you would like your content to be hosted… You could choose one near where your audience is likely to be, or you could choose based on price. I chose US East (Ohio) but to be honest, for this project, it doesn’t really matter.
  • Un-select “Block all public access”. This will allow your website content to be accessed publicly on the web.
  • Click “Create Bucket” at the bottom of your screen to finalize your bucket.

Configure your S3 Bucket to host your site

  • Open the bucket Properties pane
  • Choose “Static Website Hosting”
  • Choose “Use this bucket to host a website”
  • Enter the name of your index document in the Index Document box. For Pelican sites, the index (aka homepage) document is index.html.
  • Click Save to save the website configuration.
  • Copy your Endpoint URL and paste it somewhere for later use

Configure public access for your S3 Bucket

The last step will be to set the bucket’s security policy to allow public access (so everyone can view it on the web). Bucket policies determine who can access your buckets and what level of permissions they can have. For example, you might only want to grant view access to some people and write access to others. Read more about s3 bucket policies here.

For our purposes, we are going to allow “PublicReadForGetBucketObjects” for the objects (HTML files)  in the bucket that hosts the site. See step #3 below for more details.

Go to your new bucket and go to the Permissions tab

You should see “Block all public access“ is set to “Off”

  • Click Choose Bucket Policy
  • Paste in the following policy, replacing the Resource bucket name with your actual bucket’s name:

 {
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "PublicReadForGetBucketObjects",
       "Effect": "Allow",
       "Principal": "*",
       "Action": "s3:GetObject",
       "Resource": "arn:aws:s3:::your-bucket-name/*"
     }
   ]
 }

  • Click Save

Congrats, you have your first S3 bucket! Now let’s fill it with your website!

Upload your site to your S3 Bucket

There are two ways to do this. One is easy but manual, and the other takes a bit more time to set up but automates the process for the future. Since we have spent most of this post “setting things up,” we are going to do it the easy way first so we can see the site live on the web!

  • Go to the Overview tab
  • Click Upload
  • Open up a Finder window (assuming you’re using a Mac) and navigate into your /output folder
  • Drag and drop all the files in /output into the Upload box (not the /output folder itself)
  • Click Next to proceed 
  • Set “Manage public permissions” to “Grant public read access to this object(s)” and click Next 
  • Leave the Storage Class set to Standard on the “Set Properties” step and click Next
  •  Click Upload on the Review step
  • Your files will be uploaded to S3 shortly

If everything has gone well up to this point, your site is ready to view! 

Paste the Endpoint URL that you copied in step #6 of the “Configure your S3 Bucket to host your site” section above into your browser.

🎉🎉🎉 Your site is (hopefully) live! 🎉🎉🎉


Maybe it didn’t work perfectly… That makes a good opportunity to learn!

If your homepage looks like CSS rules weren’t applied

View the HTML source of your page and check if the CSS links look right. (They might be pointed at localhost.) Go back to the “Setting up relative URLs in your pelicanconf.py file” step and check that everything looks good.

If your homepage displays an XML error saying that you don’t have permissions

This is probably because you missed a step setting up public permissions. Recheck the “Configure public access for your S3 Bucket” step and the “Upload your site to your S3 Bucket“ step to ensure that your site has public permissions.

Moving on

You might be satisfied here… I mean you have a website up and hosted. As you add articles, you can upload them to S3 as you’ve done before. But… you don’t have a cool URL and the deployment process is still pretty manual. And we haven’t even done any fun SEO stuff.

In the next posts, we’ll set up a custom domain with Route 53 and automate the S3 upload process. Then, we’ll do some interesting SEO stuff.

Starting an SEO Project with Python, Pelican, and AWS

I’ve been thinking about the best way to put a “course” together to demonstrate the overlap between web development and SEO. There are a lot of directions this could go in but I wanted to strike the right balance between technical depth and feasibility for someone who hasn’t done much in the way of web development. 

This is the beginning of that so-called “course,” though it’s really more of a guided SEO project. Along the way, I hope to teach something about technical SEO and SEO analytics by playing around with website code, hosting infrastructure, and different measurement tools. Hopefully, you’re interested in Python too 🙂

Launching a web development and SEO project

To start the project in the right direction, I had to determine which technologies would strike the right balance between technical complexity and ease.

I had some considerations:

  1. Should I choose WordPress? Sure, it’s popular, but there are already tons of WordPress tutorials out there, and the last thing I want to do is tell people they should go out and learn PHP and tear at the internals of a 15+ year-old web framework.
  2. Python continues to grow in popularity. And that’s awesome, but I feared that, if this project were dependent on the audience’s ability to deploy a Flask or Django site, it would steer the focus away from SEO toward web development.
  3. What about Jekyll? A static site generator seemed like a good balance between simplicity and technical depth. (They are also really affordable to maintain!) Jekyll seemed like a good option, but I opted against it because it’s built on Ruby. And Ruby, like PHP, just isn’t as hot as Python these days.

This meant the focus would be a static site generator based on Python. This made Pelican an easy choice. Pelican has been around long enough and garnered enough support to have a decent ecosystem and plenty of well-written “Hello World”  tutorials. 

How Static Site Generators Work

Static site generators are a good balance between the power of a full-blown CMS and the simplicity of a pure HTML site. With a static site generator like Pelican, instead of worrying about hosting an application and a database, you only have to manage “flat files” and host the fully-rendered HTML pages on a server or file store like S3. 

Most static site generators work like this:

  1. Choose or develop a theme that determines the style and layout of your pages
  2. Edit your site’s content in markdown files locally on your computer
  3. Run a command to build all the HTML files that make up your static site
  4. Transfer these pages from your computer to the hosting service of your choice
  5. Your website is up and running!

This means this project can be more about demonstrating SEO levers than web development. 

Introducing Pelican: A Static Site Generator, Powered by Python

Pelican is conceptually pretty simple. At a high level, you can think of it like this: 

  1. Your content: The “flat files,” commonly markdown or reStructuredText files, that you write and update to generate content on your site
  2. Pelican configurations: The settings in pelicanconf.py and publishconf.py that Pelican refers to when building your site
  3. Pelican itself: The process that reads your content and configurations and generates the HTML files that make up the complete static site
  4. Pelican themes: Either pre-built or custom-built, themes comprise the page templates (based on Jinja), CSS, and Javascript files that create the look and feel of your site
  5. Pelican plugins: Add-ons that allow you to change how Pelican reads your content, outputs your site, or does just about anything else during the build process

That means if you want to modify your site, you basically have two avenues: modify your theme or add/build plugins.

That’s really nice compared to WordPress, where you would have to think about a MySQL database schema, WordPress’ architecture, and… writing PHP to generate pages. This isn’t the answer for the next Yelp, but it will let you do some interesting things with SEO for .0000001% of the complexity!

Getting Your Web Project Started

With any web project, there is some groundwork to be done before the real construction begins. If you’ve done any kind of development project before, you’ll find most of this looks pretty familiar. 

If you haven’t done any development project, I recognize that getting started can be the most challenging part. Hopefully, these resources will be sufficient to get you started, given enough coffee, grit, and free time.

Set up your Development Environment

If you’re unfamiliar, a development environment is more of a state of preparation than an actual “thing.” A development environment means having all your necessary software, packages, settings, and tools loaded, working correctly, and generally understood.

If you want some guidance with this step, I suggest Peter Kazarinoff’s guide to setting up a development environment for Pelican. In his first post he covers:

  • Installing Python 3
  • Setting up a virtual environment (to keep this project’s packages separate from other Python packages on your computer)
  • Installing Pelican and other required Python packages
  • Creating a GitHub account
  • Making a directory for the site and linking it to GitHub

That’s a great guide to follow if this is your first Python development project ever (and even if it’s not).

Getting started with Github the easy way

This project assumes that you’ve never worked with git or GitHub before. For that reason, the demonstrations will use Github Desktop and the Atom code editor because they take a lot of the complexity out of git. So when it comes to the “Making a directory for the site and linking it to GitHub” step above, I’d suggest following this video about creating a repository with Github Desktop.

Getting your site running locally

At this point, you can find numerous tutorials to get your site up and running on your “local machine” (aka your computer). I think Matthew Devaney’s tutorial series is pretty easy to follow. To get your site running locally, follow his tutorial about installing Pelican, choosing a theme, and running it locally. You can also continue Peter Kazarinoff’s tutorial series for guidance on using a virtual environment and build automation.

You’ve completed this project step once you have done the following:

  • Installed Python and set up a virtual environment
  • Installed Pelican and the required Python packages
  • Created a GitHub account
  • Created a directory for your Pelican project
  • Linked your project site to Github
  • Set up Pelican via pelican-quickstart
  • Successfully ran pelican content to generate your pages locally
  • Opened your bare-bones site in your browser after running pelican --listen

If you’ve made it this far, congrats! You’ve made it past one of the hardest parts of this whole project.

Up next: Play around with Pelican and see if you can generate some posts. Our next step will be to host it live on Amazon S3!

Beginner’s Guide to Content Management Systems and Templating Engines

If you’re new to web development, especially if you are coming from adjacent territory like marketing or product management, you’ve probably begun to understand the basics of variables and for-loops, but there’s a big gap between where you are and how things get done in the wild. The goal of this post is to introduce you to the wild world of content management and tame it at the same time. I hope you enjoy!

Redefining the Content Management System Category

The term “Content Management System” describes an actively expanding category of software that can be used to manage the creation and modification of digital content. You may think that definition seems broad, but that’s because the category is really broad! In fact, Wikipedia doesn’t even include serving content as a requirement for this definition!

On top of the content creation-and-modification functionality, most CMSs provide a much wider range of offerings. This starts, of course, with actually serving the content as HTML pages (rather than just text or image files) and includes common services like handling ecommerce transactions, managing user credentials, and web analytics.

Here are a few members of the Venn diagram that is the CMS category:

  • WordPress: The most popular CMS started as a blog platform but now is capable of supporting any type of app from ecommerce shops to user-review websites.
  • Shopify: The popular ecommerce platform is essentially a CMS that focuses on managing product content and handling monetary transactions.
  • Dropbox: You might not consider the digital asset management software a CMS, but the app allows you to upload images and serve them publicly on the web.
  • Hubspot: The customer relationship management (CRM) system, also offers a blog and landing page CMS with the personalization benefits that only a CRM could.
  • Ghost: One of the most popular in the category of “headless CMSs,” Ghost serves content via an API, which is perfect for Javascript-based single-page apps (SPAs).
  • Webflow: A codeless CMS, it affords 80% of the content and design capabilities of a code-based CMS like WordPress, without needing to write custom code to use them.
  • Pelican: On the far corner of the CMS world is this static site generator written in Python. Pelican simply translates a collection of text files (Markdown is commonly used) into a website that can be hosted on services like Amazon S3.

From headless to full-stack, codeless to code-only, feature-rich to barely-a-CMS, you get the idea: the space is huge.

There is a CMS for almost any use case: If you don’t want to use a database, use a static site generator like Pelican or Jekyll. If you want to build a front end but don’t want to worry about the back end, use a headless CMS. If you want to use a CMS with a ton of community support and familiar developers, use WordPress. The list goes on.

No matter your use case, there are some general principles that apply to most CMSs and are good to understand. That’s what we’ll get into next.

Beyond Content: Themes and Templates

If you are working on SEO but aren’t really familiar with web development, you might have heard terms like “theme,” “templating,” and “rendering,” and wondered what they’re all about. Let’s fix that.

CMS Themes

Most CMSs, including WordPress, Drupal, and Shopify, employ a concept of themes (even if they call them something else). Themes are the CSS, Javascript, and HTML template files that package CMS content into a complete website experience.

The combination of these files in a browser creates the “look-and-feel” of a website: CSS files define visual characteristics like background colors, typography, and iconography while JS files create animations and interactivity. HTML templates determine the layout of the content on a page whether that’s in a phone, tablet, or computer screen.

**Themes offer one massive benefit: separation of concerns.** This means that the content is separated from the presentation of the content. And while you might take that for granted, this is what makes it so easy to modify the layout of all the pages on a site without having to change the HTML of each page. (This was a game-changer in the early Web 2.0 days!)

For example, if you want to change a WordPress theme or Pelican theme, all you have to do is add the theme files to your project and change some site settings and voilà, your site has a hot new look!

HTML Templating

At the core of every theme is the way that content is transformed and merged into HTML to create a webpage. This is called templating. Typically template files look like regular HTML files except, instead of containing the page’s content within the HTML tags, there are variable placeholders and other “templating logic” that will be replaced with the page’s content when the page is “rendered” by the CMS code.

How templating engines work

It really helps to see an example. Below is a very minimal HTML page with variables (denoted by the {{variable}} syntax) embedded in the HTML.

<!DOCTYPE html>
<html lang="en">
    <title>{{ article.title }} | {{ site.name }}</title>
    <meta name="description" content="{{ article.description }}">
    <h1>{{ article.title }}</h1>
    <div class="byline">by {{ article.author }} </div>
    <div class="content">{{ article.content }}</div>


After the variables are replaced with the page’s content, the output HTML file would look something like this:

<!DOCTYPE html>
<html lang="en">
    <title> My First Time Skydiving! | My Life Story</title>
    <meta name="description" content="I went skydiving for the first time in my life on my birthday. Check out the story and all the cool pictures.">
    <h1>My First Time Skydiving!</h1>
    <div class="byline">by Trevor Fox </div>
    <div class="content">On my 30th birthday I… </div>


Let’s step way back for a moment and discuss what this all means. Consider how much easier it is to scale up a website with a lot of content. Without templating, the best way to create new pages would be to copy a similar existing HTML file and replace the copied pages with new content. It sounds like a nightmare! And what if you wanted to change something about a page you copied from?? Yeah… you get the idea.

This generic approach to generating HTML files was a game-changer. Now nearly every single site on the web employs the concept of templating and it’s a foundational piece of SEO.
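To demystify what “rendering” actually does, here is a toy substitution engine in Python. It only handles {{ dotted.variable }} replacement; real engines like Jinja2 add conditionals, loops, inheritance, and escaping on top of this idea:

```python
import re

# A toy templating engine: replace {{ dotted.variables }} with values
# looked up in a nested context dict. Purely illustrative.
def render(template, context):
    def lookup(match):
        value = context
        for part in match.group(1).split("."):
            value = value[part]  # walk the dotted path
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

html = render("<h1>{{ article.title }}</h1>",
              {"article": {"title": "My First Time Skydiving!"}})
print(html)  # <h1>My First Time Skydiving!</h1>
```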

Let’s consider the popular marketplace website, Craigslist. The site has millions (billions?) of live pages at any moment, but it likely has fewer than 100 different templates. There are pages for every geographic locale that list all the categories for that locale, several types of category pages that list all the event, service, and for-sale item listings in that locale, and pages with all the details for each event, service, or item in that category. The pages add up quickly, but thanks to templates, the site’s content scales beautifully.

This is a great example for SEO too. Consider for a moment that all of Craigslist’s category pages are just templates with content from other pages combined dynamically to generate content for these new pages. It’s like every time they have content for a few listing pages they get a category page for free—and all the organic traffic that comes with it.

Templating Engines

Most CMSs employ a templating engine (you might also hear them called “templating languages”) on top of whatever programming language the CMS was built on. Templating engines make it easy to render HTML files from template files. Here are a few examples and the CMSs and web frameworks that use them.

  • Liquid: Ruby’s most popular templating engine is used by Shopify, Jekyll, Zendesk, and many others
  • Jinja2: Python’s most popular templating engine is used by Pelican and can be used with the web frameworks Flask and Django and others
  • HubL: Hubspot’s proprietary templating engine for web and email is an extension of Jinja offering more Hubspot-specific functionality
  • Handlebars: The Javascript templating engine is used by Ghost, Enduro.js CMS, and web frameworks such as Ember.js and Meteor.js.
  • PHP?: Ok, PHP is a scripting language but it’s still worth mentioning here because the language was built for constructing web pages. As part of the language’s design, you write logic into .php files that output content into the HTML.

This is only a small sample of templating engines. Most popular programming languages have several popular options but all of these templating engines have a few things in common: they assemble and render HTML files from templates that look like HTML but can contain variables, conditional logic, and other inserted or inherited templates.
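To make that concrete, here’s the byline fragment from the earlier example rendered with Jinja2, the engine Pelican uses. The article variable is just a plain dict standing in for real content:

```python
from jinja2 import Template

# Jinja2 resolves {{ article.author }} by attribute lookup, falling back
# to dict-key lookup, so a plain dict works as the context.
template = Template('<div class="byline">by {{ article.author }}</div>')
html = template.render(article={"author": "Trevor Fox"})
print(html)  # <div class="byline">by Trevor Fox</div>
```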

In the future, I hope to break down how website templating works, especially within the context of SEO.

Defining Technical SEO for Non-Technical SEOs

Whether you’re just an SEO rookie or you’ve been playing along for years, you probably know that SEO is all about keywords, content, and links.

To “do SEO,” all you need is a website, some content, and you need to figure out how to get some links so your site can rank. The more you have of each, the more successful you’ll be (as long as you don’t do anything stupid.)

That’s it… Or is it?

While keyword sleuthing, golden content, and lots of trustworthy links will lead to some SEO success, that will only take you so far. There is another fundamental layer of SEO that is less well-known and too often misunderstood: technical SEO.

What is Technical SEO?

Technical SEO is the practice of optimizing a website’s code structure, navigational architecture, and ability to be fetched and rendered by browsers and bots. All this impacts how search engines crawl, index, rank, and display a site’s content. 

That was a mouthful! In other words, technical SEO is the process of designing, building, and maintaining websites so their content will attract as much search traffic as possible.

Unlike keywords, content, and links, great technical optimizations on their own won’t attract traffic. Instead, they act as a multiplier for a website’s SEO. And best of all, most types of technical optimizations have a site-wide effect. Unlike optimizing the content of a single page, the results are multiplied.

Technical SEO is kind of like the bass line in a good song. It brings everything together and makes each part better at the same time.

The goal of this post is to introduce this concept for further posts that will cover the topic in more depth.

The next post will discuss content management systems, because they require less technical knowledge than other aspects and because CMSs are the bridge between content and code—a perfect entryway into the technical side of SEO.

SEO with the Google Search Console API and Python

The thing I enjoy most about SEO is thinking at scale. Postmates is fun because sometimes it’s more appropriate to size opportunities on a logarithmic scale than a linear one.

But there is a challenge that comes along with that: opportunities scale logarithmically, but I don’t really scale… at all. That’s where scripting comes in.

SQL, Bash, Javascript, and Python regularly come in handy to identify opportunities and solve problems. This example demonstrates how scripting can be used in digital marketing to solve the challenges of having a lot of potentially useful data.

Visualize your Google Search Console data for free with Keyword Clarity. Import your keywords with one click and find patterns with interactive visualizations.

Scaling SEO with the Google Search Console API

Most, if not all, big ecommerce and marketplace sites are backed by databases. And the bigger these sites are, the more likely they are to have multiple stakeholders managing and altering data in the database. From website users to customer support to engineers, there are several ways that database records can change. As a result, the site’s content grows, changes, and sometimes disappears.

It’s very important to know when these changes occur and what effect they will have on search engine crawling, indexing, and results. Log files can come in handy, but Google Search Console is a pretty reliable source of truth for what Google sees and acknowledges on your site.

Getting Started

This guide will help you start working with the Google Search Console API, specifically with the Crawl Errors report, but the script could easily be modified to query Google Search performance data or interact with sitemaps in GSC.

Want to learn about how APIs work? See: What is an API?

To get started, clone the Github repository and follow the “Getting Started” steps on the README page. If you are unfamiliar with Github, don’t worry. This is an easy project to get you started.

Make sure you have the following:

Now for the fun stuff!

Connecting to the API

This script uses a slightly different method to connect to the API. Instead of using the Client ID and Client Secret directly in the code, the Google API auth flow reads these variables from the client_secret.json file. This way you don’t have to modify the script at all, as long as the client_secret.json file is in the /config folder.

import pickle
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow

# Scope for the Search Console (webmasters) API
OAUTH_SCOPE = ['https://www.googleapis.com/auth/webmasters']

try:
    credentials = pickle.load(open("config/credentials.pickle", "rb"))
except (OSError, IOError):
    flow = InstalledAppFlow.from_client_secrets_file('client_secret.json', scopes=OAUTH_SCOPE)
    credentials = flow.run_console()
    pickle.dump(credentials, open("config/credentials.pickle", "wb"))

webmasters_service = build('webmasters', 'v3', credentials=credentials)

For convenience, the script saves the credentials to the project folder as a pickle file. Storing the credentials this way means you only have to go through the Web authorization flow the first time you run the script. After that, the script will use the stored and “pickled” credentials.

Querying Google Search Console with Python

The auth flow builds the “webmasters_service” object which allows you to make authenticated API calls to the Google Search Console API. This is where Google documentation kinda sucks… I’m glad you came here.

The script’s webmasters_service object has several methods, each relating to one of the five ways you can query the API. The methods all correspond to verbs that indicate how you would like to interact with or query the API.

The script currently uses the webmasters_service.urlcrawlerrorssamples().list() method to find how many crawled URLs had a given type of error.

gsc_data = webmasters_service.urlcrawlerrorssamples().list(siteUrl=SITE_URL, category=ERROR_CATEGORY, platform='web').execute()

It can then optionally call webmasters_service.urlcrawlerrorssamples().markAsFixed(…) to note that the URL error has been acknowledged, removing it from the webmaster reports.
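Iterating over the list response looks something like this. Note that the urlCrawlErrorSample field name reflects the v3 API’s documented response shape, and the dict below is a stand-in for a real API result, not output from a live call:

```python
# Stand-in for webmasters_service.urlcrawlerrorssamples().list(...).execute()
gsc_data = {
    "urlCrawlErrorSample": [
        {"pageUrl": "some/page-path",
         "last_crawled": "2018-03-13T02:19:02.000Z",
         "responseCode": 404},
    ]
}

# Print each sample URL with its response code
for sample in gsc_data.get("urlCrawlErrorSample", []):
    print(sample["pageUrl"], sample.get("responseCode"))
# some/page-path 404
```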

Google Search Console API Methods

There are five ways to interact with the Google Search Console API. Each is listed below as “webmasters_service” because that is the variable name of the object in the script.


webmasters_service.urlcrawlerrorssamples()

This allows you to get details for a single URL and list details for several URLs. You can also programmatically mark URLs as fixed with the markAsFixed method. *Note that marking something as fixed only changes the data in Google Search Console. It does not tell Googlebot anything or change crawl behavior.

The resources are represented as follows. As you might imagine, this will help you find the source of broken links and get an understanding of how frequently your site is crawled.

 {
   "pageUrl": "some/page-path",
   "urlDetails": {
     "linkedFromUrls": [""],
     "containingSitemaps": [""]
   },
   "last_crawled": "2018-03-13T02:19:02.000Z",
   "first_detected": "2018-03-09T11:15:15.000Z",
   "responseCode": 404
 }


webmasters_service.urlcrawlerrorscounts()

If you query this data, you will get back the day-by-day data to recreate the chart in the URL Errors report.

Crawl Errors





webmasters_service.searchanalytics()

This is probably what you are most excited about. This allows you to query your search console data with several filters and page through the response data to get way more data than you can get with a CSV export from Google Search Console. Come to think of it, I should have used this for the demo…

The response looks like this, with a “row” object for every record, depending on how you queried your data. In this case, only “device” was used to query the data, so there would be three “rows,” each corresponding to one device.

 {
   "rows": [
     {
       "keys": ["device"],
       "clicks": double,
       "impressions": double,
       "ctr": double,
       "position": double
     }
   ],
   "responseAggregationType": "auto"
 }
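For reference, a query body that would produce a response shaped like the one above might look like this (the dates are placeholders, and the commented-out call assumes the webmasters_service object built earlier):

```python
# Request body for searchanalytics().query(), grouped by device
request = {
    "startDate": "2018-01-01",   # placeholder date range
    "endDate": "2018-03-31",
    "dimensions": ["device"],    # one row each for desktop/mobile/tablet
    "rowLimit": 5000,            # page with startRow to fetch more rows
}
# response = webmasters_service.searchanalytics().query(
#     siteUrl="https://www.example.com/", body=request).execute()
```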


webmasters_service.sites()

Get, list, add, and delete sites from your Google Search Console account. This is perhaps really useful if you are a spammer creating hundreds or thousands of sites that you want to monitor in Google Search Console.


webmasters_service.sitemaps()

Get, list, submit, and delete sitemaps to Google Search Console. If you want to get into fine-grained detail about indexing with your sitemaps, this is the way to add all of your segmented sitemaps. The response will look like this:

   {
     "path": "",
     "lastSubmitted": "2018-03-04T12:51:01.049Z",
     "isPending": false,
     "isSitemapsIndex": true,
     "lastDownloaded": "2018-03-20T13:17:28.643Z",
     "warnings": "1",
     "errors": "0",
     "contents": [
       {
         "type": "web",
         "submitted": "62",
         "indexed": "59"
       }
     ]
   }

Modifying the Python Script

You might want to change the Search Console query or do something different with the response data. The query lives in the script, and you can change the code to iterate through any query. The check method is used to “operate” on every response resource; it can do things that are a lot more interesting than printing response codes.

Query all the Things!

I hope this helps you move forward with your API usage, Python scripting, and Search Engine Optimization… optimization. Any questions? Leave a comment. And don’t forget to tell your friends!