SEO with the Google Search Console API and Python

The thing I enjoy most about SEO is thinking at scale. Postmates is fun because sometimes it's more appropriate to size opportunities on a logarithmic scale than a linear one.

But there is a challenge that comes along with that: opportunities scale logarithmically, but I don’t really scale… at all. That’s where scripting comes in.

SQL, Bash, Javascript, and Python regularly come in handy to identify opportunities and solve problems. This example demonstrates how scripting can be used in digital marketing to solve the challenges of having a lot of potentially useful data.

Visualize your Google Search Console data for free with Keyword Clarity. Import your keywords with one click and find patterns with interactive visualizations.

Scaling SEO with the Google Search Console API

Most, if not all, big ecommerce and marketplace sites are backed by databases. And the bigger these sites are, the more likely they are to have multiple stakeholders managing and altering data in the database. From website users to customer support to engineers, there are several ways that database records can change. As a result, the site's content grows, changes, and sometimes disappears.

It’s very important to know when these changes occur and what effect the changes will have on search engine crawling, indexing and results. Log files can come in handy but the Google Search Console is a pretty reliable source of truth for what Google sees and acknowledges on your site.

Getting Started

This guide will help you start working with the Google Search Console API, specifically with the Crawl Errors report, but the script could easily be modified to query Google Search performance data or interact with sitemaps in GSC.

Want to learn about how APIs work? See: What is an API?

To get started, clone the Github Repository: https://github.com/trevorfox/google-search-console-api and follow the “Getting Started” steps on the README page. If you are unfamiliar with Github, don’t worry. This is an easy project to get you started.

Make sure you have the following:

Now for the fun stuff!

Connecting to the API

This script uses a slightly different method to connect to the API. Instead of using the Client ID and Client Secret directly in the code, the Google API auth flow reads these variables from the client_secret.json file. This way you don't have to modify the webmaster.py file at all, as long as the client_secret.json file is in the /config folder.

import pickle

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Full read/write scope so crawl errors can later be marked as fixed.
OAUTH_SCOPE = ['https://www.googleapis.com/auth/webmasters']

try:
    credentials = pickle.load(open("config/credentials.pickle", "rb"))
except (OSError, IOError) as e:
    flow = InstalledAppFlow.from_client_secrets_file('client_secret.json', scopes=OAUTH_SCOPE)
    credentials = flow.run_console()
    pickle.dump(credentials, open("config/credentials.pickle", "wb"))

webmasters_service = build('webmasters', 'v3', credentials=credentials)

For convenience, the script saves the credentials to the project folder as a pickle file. Storing the credentials this way means you only have to go through the Web authorization flow the first time you run the script. After that, the script will use the stored and “pickled” credentials.

Querying Google Search Console with Python

The auth flow builds the “webmasters_service” object which allows you to make authenticated API calls to the Google Search Console API. This is where Google documentation kinda sucks… I’m glad you came here.

The script's webmasters_service object has several methods, each relating to one of the five resources you can query in the API. Each resource, in turn, has verb methods (like list, get, and query) that indicate how you would like to interact with or query it.

The script currently uses the “webmasters_service.urlcrawlerrorssamples().list()” method to find how many crawled URLs had a given type of error.

gsc_data = webmasters_service.urlcrawlerrorssamples().list(siteUrl=SITE_URL, category=ERROR_CATEGORY, platform='web').execute()

It can then optionally call “webmasters_service.urlcrawlerrorssamples().markAsFixed(…)” to note that the URL error has been acknowledged, removing it from the webmaster reports.
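As a sketch, that markAsFixed call might look like the following. The site URL, sample URL, and category here are placeholders (not real values), and the call itself is commented out since running it permanently clears the sample from the report:

```python
# Parameters for marking a crawl-error sample as fixed. These values are
# placeholders -- in practice you would loop over the pageUrl values
# returned by the urlcrawlerrorssamples().list() call above.
fix_params = {
    'siteUrl': 'https://example.com/',
    'url': 'some/page-path',   # the sample's pageUrl
    'category': 'notFound',    # the error category being acknowledged
    'platform': 'web',
}

# The actual call, using the authenticated webmasters_service from earlier:
# webmasters_service.urlcrawlerrorssamples().markAsFixed(**fix_params).execute()
```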

Google Search Console API Methods

There are five ways to interact with the Google Search Console API. Each is listed below as “webmasters_service” because that is the variable name of the object in the script.

webmasters_service.urlcrawlerrorssamples()

This allows you to get details for a single URL and list details for several URLs. You can also programmatically mark URLs as fixed with the markAsFixed method. *Note that marking something as fixed only changes the data in Google Search Console. It does not tell Googlebot anything or change crawl behavior.

The resources are represented as follows. As you might imagine, this will help you find the source of broken links and get an understanding of how frequently your site is crawled.

{
  "pageUrl": "some/page-path",
  "urlDetails": {
    "linkedFromUrls": ["https://example.com/some/other-page"],
    "containingSitemaps": ["https://example.com/sitemap.xml"]
  },
  "last_crawled": "2018-03-13T02:19:02.000Z",
  "first_detected": "2018-03-09T11:15:15.000Z",
  "responseCode": 404
}

webmasters_service.urlcrawlerrorscounts()

Querying this resource returns day-by-day error counts, enough to recreate the chart in the URL Errors report.
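A sketch of such a query, again assuming the authenticated webmasters_service object from earlier; the site URL and category are placeholders, and latestCountsOnly=False asks for the full daily time series rather than just the most recent counts:

```python
# Query day-by-day error counts rather than individual sample URLs.
count_params = {
    'siteUrl': 'https://example.com/',
    'category': 'notFound',
    'platform': 'web',
    'latestCountsOnly': False,  # False returns the daily time series
}

# counts = webmasters_service.urlcrawlerrorscounts().query(**count_params).execute()
```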

Crawl Errors

webmasters_service.searchanalytics()

This is probably what you are most excited about. This allows you to query your search console data with several filters and page through the response data to get way more data than you can get with a CSV export from Google Search Console. Come to think of it, I should have used this for the demo…

The response looks like this, with a “row” object for every record, depending on how you queried your data. In this case, only “device” was used to query the data, so there would be three “rows,” each corresponding to one device.

{
  "rows": [
    {
      "keys": ["MOBILE"],
      "clicks": double,
      "impressions": double,
      "ctr": double,
      "position": double
    },
    ...
  ],
  "responseAggregationType": "auto"
}
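As a sketch, a query grouped by device over a date range could be built like this. The dates and row limit are placeholder values, and the call itself is commented out since it needs the authenticated service object from earlier:

```python
# Request body for a Search Analytics query grouped by device.
request_body = {
    'startDate': '2018-03-01',  # placeholder date range
    'endDate': '2018-03-31',
    'dimensions': ['device'],   # one row per device: DESKTOP, MOBILE, TABLET
    'rowLimit': 5000,           # page through with 'startRow' to get more
}

# gsc_data = webmasters_service.searchanalytics().query(
#     siteUrl=SITE_URL, body=request_body).execute()
# for row in gsc_data.get('rows', []):
#     print(row['keys'], row['clicks'], row['impressions'])
```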

webmasters_service.sites()

Get, list, add and delete sites from your Google Search Console account. This is perhaps really useful if you are a spammer creating hundreds or thousands of sites that you want to be able to monitor in Google Search Console.

webmasters_service.sitemaps()

Get, list, submit and delete sitemaps to Google Search Console. If you want fine-grained detail on how your sitemaps are being indexed, this is the way to add all of your segmented sitemaps. The response will look like this:

{
  "path": "https://example.com/sitemap.xml",
  "lastSubmitted": "2018-03-04T12:51:01.049Z",
  "isPending": false,
  "isSitemapsIndex": true,
  "lastDownloaded": "2018-03-20T13:17:28.643Z",
  "warnings": "1",
  "errors": "0",
  "contents": [
    {
      "type": "web",
      "submitted": "62",
      "indexed": "59"
    }
  ]
}
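One gotcha: the counts come back as strings, so cast them before doing math. This sketch parses the sample response shape above rather than calling the API:

```python
# A sitemap entry in the shape shown above (values from the sample response).
sitemap = {
    "path": "https://example.com/sitemap.xml",
    "contents": [
        {"type": "web", "submitted": "62", "indexed": "59"},
    ],
}

# The counts are strings in the response, so cast before dividing.
for content in sitemap["contents"]:
    submitted = int(content["submitted"])
    indexed = int(content["indexed"])
    print(f'{sitemap["path"]}: {indexed}/{submitted} indexed '
          f'({indexed / submitted:.0%})')
```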

Modifying the Python Script

You might want to change the Search Console query or do something with the response data. The query is in webmasters.py, and you can change the code to iterate through any query. The check method in checker.py is used to “operate” on every response resource, and it can do things that are a lot more interesting than printing response codes.

Query all the Things!

I hope this helps you move forward with your API usage, Python scripting, and Search Engine Optimization… optimization. Any questions? Leave a comment. And don’t forget to tell your friends!


Is Slack Messenger Right for My Team? Analytics and Answers

Slack

From AOL Instant Messenger to WeChat stickers, digital communication has always fascinated me. From the beginning, there has always been so much we don’t understand about digital communication. It’s kind of like GMO; we just started using it without considering the implications.

We are continually learning how to use the digital medium to achieve our communication goals. And meanwhile, our digital communication tools are ever evolving to better suit our needs. A prime example of this is the team messaging app, Slack.


Slack has adapted well and I would argue that it has dominated its ecosystem. There are a few reasons why I believe that it’s earned its position:

  1. It’s easy.
  2. It’s flexible.
  3. It’s not too flexible.

As a tool, Slack is malleable enough to form-fit your communication theories and practices, and it does little to dictate them. This means that its utility and its effect are less a factor of the tool and more a factor of our ability to shape its use.

So when the question was posed, “How well does Slack fit our needs as a team?” I have to admit I wasn’t sure. Days later, in my head, I answered the question with two more questions:

How well have we adapted the tool to us?

How well have we adapted to the tool?

The questions felt somewhat intangible but I had to start somewhere and me being me, I asked the data. I’ll admit I haven’t gotten to the heart of the questions… yet. But I did start to scratch the surface. So let’s step back from the philosophy for a minute, walk through the story, and start answering some questions.

So yeah, we tried Slack… Six months ago

A recently formed, fast moving and quickly growing team, we believed that we could determine our own ways of working. In the beginning, we set some ground rules about channel creation and, believe it or not, meme use (hence the #wtf channel). And that was about it. We promised ourselves that we would review the tool and its use. Then we went for it.

A while later, as I mentioned, a manager pointed out that we had never reviewed our team’s use of Slack. It seemed fine, but the questions started to crop up in my head. Me being me, I had to ask the data.

This all happened about the time that I started to play with Pandas. I didn’t answer the questions, but I did get frustrated. Then I read Python for Data Analysis, pulled the data out of the Slack API (which only provides data about channels) and went a bit crazy with an IPython notebook.

To answer my theoretical questions, here are the first questions I had, a few that I didn’t and their answers.

How is Slack holding up over time?

Stacked Time Series

Don’t judge me. This was my first go with matplotlib.

This stacked time series shows the number of posts per channel (shown in arbitrary and unfortunately non-unique colors) per week. The top outline of the figure shows the total number of messages for each week. The strata represent different channels, and the height of each stratum represents the volume of messages during a given week.

It appears that there is a bit of a downward trend in the overall number of messages per week. A linear regression supports that. The regression line indicates that each week sees about two fewer messages than the week before.

Linear Regression
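For the curious, that kind of trend fit is a one-liner with numpy. The weekly totals below are invented, since the real data stays internal; the slope of the fitted line estimates the average week-over-week change:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly message totals -- the real data stays internal.
weekly_totals = pd.Series([310, 295, 330, 260, 285, 240, 270, 230, 250, 215])

# Fit a straight line to the weekly totals; the slope estimates the
# average change in messages from one week to the next.
weeks = np.arange(len(weekly_totals))
slope, intercept = np.polyfit(weeks, weekly_totals, 1)
print(f'Trend: {slope:.1f} messages per week')
```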

If you ask why there appears to be a downward trend in total use over time, I think there are a few ways to look at it. First, the stacked time series shows that high volume weeks are generally the result of one or two channels having big weeks, rather than a rise or fall in use across all channels. This makes sense if you consider how we use channels.

We have channels for general topics and channels for projects. And projects being projects, they all have a given timeframe and endpoint. This would explain the “flare ups” in different channels from time to time. It would also explain why those same channels come to an end.

One way to capture the difference between short lived project channels and consistent topic channels is with a box plot. Box plots represent the distribution of total messages per week for each channel by showing the high and low week totals for a channel and the range (the interquartile range) that weekly message totals commonly fall into.

Slack Analytics Channels Box Plot

Each box plot represents a Slack channel. The Y axis scales to the number of messages in that channel.

For a specific example, the channel on the far left (the first channel created, named #generalofficestuff) has had a relatively high maximum number of messages in a week, a minimum around 1 or 2 (maybe a vacation week), and 50% of all weeks in the last six months fall between about 7 and 28 messages, with an average of 10 messages per week.

On the other hand, channels on the right side of the chart, more recently created and generally project-specific channels, describe the “flare ups” that can be seen in the stacked time series chart above. If you wanted to look deeper, you could make a histogram of the distribution of week totals per channel. But that is a different question and, for my purposes, well enough described with the box plot. 
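As a sketch of how such a chart comes together, pandas can draw one box per DataFrame column, so a weeks-by-channels table of message counts is all it takes. The counts here are invented for illustration:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly message totals per channel.
weeks_by_channel = pd.DataFrame({
    '#generalofficestuff': [28, 7, 15, 2, 22, 10],  # steady topic channel
    '#project-launch':     [0, 0, 45, 60, 12, 0],   # short-lived "flare up"
})

# One box per channel, just like the chart above.
ax = weeks_by_channel.boxplot()
ax.set_ylabel('Messages per week')
plt.savefig('channel_boxplot.png')
```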

So… how is Slack holding up over time?!

The simple answer is, use is declining. Simple linear regression shows this. The more detailed answer is, it depends. As the stacked time series and box plots suggest, in our case, use over time is better understood as a factor of the occurrence of projects that lend themselves especially well to Slack channels. I know what you’re saying, “I could have told you that without looking at any charts!” But at least this way nobody is arguing.

Projects… What about People?

Another way to look at this question is not by the “what,” but by the “who.” Projects, and their project channels, are basically composed of two components: a goal/topic and a group of people working toward that goal. So far we have only looked into the goal, but this leaves the question, “are the people a bigger factor in the sustainability of a channel than the topic?”

I looked at this question many ways but finally, I think I found one visual that explains as much as one can. This heat map shows the volume of messages in each channel per person. It offers a view into why some channels might see more action than others and it also suggests how project/channel members, and the synergy between them, might affect a channel’s use.

Slack Analytics Hierarchical Clustering Heatmap

Volume of messages is represented by shade, with users (user_id) on the Y axis and channels on the X axis. Hierarchical clustering uses Euclidean distance to find similarities.

What I think is most interesting in this visualization is that it shows the associations between people based on the amount of involvement (posts) in a channel. The visual indicates that perhaps use is as much a factor of people as of the channel’s project, topic, or time.
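The clustering idea behind that heat map can be sketched without any plotting: build a user-by-channel count matrix and measure how far apart users' posting patterns are. The counts below are invented; hierarchical clustering essentially starts from this distance matrix before drawing anything:

```python
import numpy as np
import pandas as pd

# Hypothetical user-by-channel message counts.
counts = pd.DataFrame(
    [[40, 2, 0],
     [35, 5, 1],
     [2, 30, 25]],
    index=['user_a', 'user_b', 'user_c'],
    columns=['#generalofficestuff', '#project-launch', '#wtf'],
)

# Pairwise Euclidean distance between users' posting patterns --
# the same measure the heat map's hierarchical clustering uses.
diff = counts.values[:, None, :] - counts.values[None, :, :]
dist = pd.DataFrame(np.sqrt((diff ** 2).sum(axis=2)),
                    index=counts.index, columns=counts.index)

# user_a and user_b post in the same places, so they cluster together.
print(dist.round(1))
```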

There are, of course, other factors. We cannot factor out the possibility of communication moving into direct messages or private groups. But again, that is another question and beyond the bounds of this investigation.

So what?

So we got a glimpse at the big picture and gained a pretty good understanding of the root cause of what motivated the question. This is my favorite part. We get to sit back, relax, and generate a few new hypotheses until we run into a new question that we can’t avoid.

What I think is coolest about the findings is that they suggest a few more hypotheses about which communication media our team’s communication occasionally moves to and which media Slack competes with. Now these investigations start to broach the fundamental questions that we started with!

There are a few things at play here, and the following are just some guesses. It could be that email dominates some projects or project phases because we are interacting with outside partners (people) who, for whatever reason, cannot or will not use Slack. Sorry, Slack. It could also be that, due to the real world that we live in, communication is happening over chat apps like WeChat or WhatsApp.

In either case, we return to the idea of people adapting to tools that are adapting to people. The use of digital communication tools reflects the people who use them and each person’s use reflects the structure and offerings of the tool.

And what’s next?

Hopefully, if you read this, you have more questions about this reality, and I might (probably) go on to try to answer a few more. I think there are a few interesting ways to look at how people are norming with Slack.

Maybe you are interested in how all this Pandas/matplotlib stuff works, because I am too. So I think it will be fun to post the IPython notebook and show how it all works.

Otherwise, it will be interesting to watch how this tool and this team continue to evolve.

Learn Programming and Databases for Digital Marketing | $10k Tech Skills 2/4

This is part two in the $10k Technical Skills for Digital Marketing Series. Part one introduced the importance of learning client-side technologies and offered a plan to learn Javascript, HTML and CSS for digital marketing. This post broadens the picture by introducing server-side programming and databases, which together compose web applications. Understanding how web applications work is a major benefit and should be essential knowledge for digital marketing. Enjoy!

Learning How Web Applications Work

From Googlebot to the Facebook Social Graph to this WordPress blog, the web as we know it is a massive system of interconnected applications. All these applications are simply programs and databases that run on servers. And while building these applications is a massive undertaking, learning the underlying processes and concepts is not. It takes nothing more than a bit of effort and time to learn enough about programming and databases to significantly set yourself and your resume apart from the average digital marketer.

While the benefits of learning how to write server-side code and interact with databases are not as immediately useful as many of the skills listed in Part 1, it is actually the process of learning this skill that presents the real value. The learning process will provide an intuition about how applications work and how processes can be scaled. This is key to digital marketing at scale.

If you can understand how search engine bots crawl websites, you can understand what makes a website crawl-friendly, and you begin to understand the technical aspects of SEO. If you understand how algorithms work, you can understand EdgeRank and how Facebook decides to distribute content, and broaden your reach. If you can understand how your CMS works, you can map your analytics platform to it and gain better insight, which you can then use to automate processes like email and offer personalized experiences. This new intuition about the web will continue to present opportunities.

You will also find many practical opportunities to employ your new programming and database querying skills for digital marketing tasks and processes. While these skills start to bleed into the realm of web development and data science/business intelligence, there are still many applications for server-side scripting languages, from automation to optimization, that can be very powerful for digital marketers.

Programming for the Web

When starting out on the road to learning server-side scripting, it is most realistic to start with PHP, Python or Ruby on Rails. All three are open-source, have strong communities and plenty of free learning resources. They all offer many similar advantages but each is powerful (and practical) in its own way.

programming languages for digital marketing

You see why I chose python…

PHP, for better or worse, has been the de facto server-side language of the Web for a long time. PHP is what powers WordPress, Magento, ModX and many other content management systems (CMSs), and if you are in digital marketing for long, you will likely run into at least one CMS powered by PHP. Learning PHP will come in handy when you find yourself wanting to add schema markup for search engines or scripts for testing or analytics platforms like Optimizely or Google Tag Manager.

Depending on the site(s) and development resources (or lack thereof) that you are planning to work with, PHP may be a good choice. It is the easiest code to deploy, as all popular web servers will support PHP.

Python is also used to build websites with frameworks like Django and Flask, but more often, sites that are built with Python are apps built for a specific, custom purpose. Unlike PHP and Ruby, which are designed for web development, Python is a general-purpose language, which makes it the go-to language for data science. (The resources featured here are mostly about how to learn Python, as that is the language I have focused on learning the most. It has been great!)

For the technical marketer, Python is useful for scaling big(ger) data science-y processes like web scraping, querying APIs, interactive analysis and reporting. Many processes that are carried out manually can be programmed using Python and run on a cron job or other triggers. One major benefit of Python is that it is so easy to learn, thanks to the number of educational resources and friendly syntax. If you find yourself venturing into the world of data science, you will be well prepared with Python, as a large and active data science community supports it.

Ruby on Rails, well, I really haven’t played with it much, but I have heard it’s very nice. The key, I hear, is that it is good for rapid Web app development.

Node and JavaScript were much of the focus of Part 1: Learning Javascript.

Database Querying and Analysis

Digital marketing without data is not digital marketing, and the digital marketer who is not data-literate is just a marketer. I am not arguing that all digital marketers should become SQL ninjas, but learning this skill, like programming, is as much about gaining an intuition about how systems and applications work as it is about developing a practical skill.

databases and analytics

For a real-world use case that employs this skill as both intuition and a practical skill, look no further than Google Analytics. The Google Analytics web interface is ‘simply’ an elegant way to query, sort, filter and visualize site usage/performance data that is collected in a database. Having a general understanding of how Google Analytics stores data and how different data points/hit types interrelate allows you to be much more precise in your analysis and confident that the data that you pull from Google Analytics is accurate.

SQL knowledge can also help you in times that you need to pull raw data out of Google Analytics for further analysis or to avoid sampling. With Google Spreadsheets’ QUERY function, you can query spreadsheet data using SQL (Structured Query Language). For quick analysis and more complex inspection of data sets, writing SQL queries to explore and form data to your needs can be much quicker and easier to debug than writing a successive set of spreadsheet functions.

When dealing with large amounts of Google Analytics data, and sampling becomes a significant issue, Google’s BigQuery can be hooked up to Google Analytics to provide SQL-like query functionality with greater speed and scale. When you become comfortable with this GUI-less interface, the ability to query any database becomes much less daunting. You can then answer questions by directly querying databases, such as a website’s MySQL database, using phpMyAdmin.

“Every question can be distilled into a database query,” Adam Ware of SwellPath told me when I first started learning about databases. The phrase seemed very exciting and has since proven accurate. I have come to realize that databases simply hold all the raw information in a defined structure. By asking the right question in the right way, your digital marketing insights are limited only by your data.
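A toy illustration of that idea, using Python's built-in sqlite3 module (the table and numbers are made up):

```python
import sqlite3

# Build a tiny in-memory database of page traffic.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE pageviews (page TEXT, sessions INTEGER)')
conn.executemany('INSERT INTO pageviews VALUES (?, ?)',
                 [('/home', 120), ('/blog', 80), ('/contact', 15)])

# "Which page gets the most traffic?" distilled into a query:
top = conn.execute(
    'SELECT page, sessions FROM pageviews ORDER BY sessions DESC LIMIT 1'
).fetchone()
print(top)  # ('/home', 120)
```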

Once you start to understand how databases operate, you will notice their appearance in apps across the web, from ecommerce stores to analytics platforms to blogs. The understanding of how data is stored, and how to extract the data that you want, will also significantly improve your ability to use applications to their full potential, ideate optimizations for existing apps, and learn new applications. This intuition is a skill that helps turn data into knowledge, and as you know, knowing is half the battle.

How to Learn Web Application Programming

Start Here: Codecademy.com

This is a great place to start with any web programming language. It is the quickest, easiest and most fun way to get up to speed with a programming language that I have found. Best of all it is free. It offers courses in PHP, Python and Ruby and hosts very helpful Q&A forums for coders who are just starting out.

Get up to Speed: Intro to Programming with Python (Udacity)

Once you have gotten a feel for programming (and a few bumps and bruises to go along with it), the next step is to start to understand the real power that programming offers. Udacity’s Intro to Programming with Python picks up where Codecademy.com leaves off and introduces capabilities rather than just syntax and style.

For the digital marketer, this course is especially useful because it is taught through constructing a very rudimentary search engine crawler (or at least the general idea of one). This application opens a window into understanding how big applications work and will make you think differently about how search engines operate.

How the Web Works: Web Development (Udacity)

There is a lot more than just programming that differentiates marketers who can program from web developers. From hosting, to caching to cookies, this course does a good job introducing these concepts.

From my experience, it was a bit too difficult, as a follow up to the Intro to Programming with Python course, to actually create and deploy a web app, but it does give you a substantial understanding of technical web terminology so you can communicate effectively with web developers. (This is a very valuable skill if you ask me.) From this course you will have an understanding of what topics you need to take on in detail to accomplish what you need to do as a technical marketer.

How to Learn Data Analysis with Databases

Become Data-Driven: Intro to Data Science (U. Washington & Coursera)

In my opinion (and I am a bit of a biased data-geek), this is the best online course I have taken. Each lesson offered “aha!” moment after “aha!” moment while teaching really useful skills.

The course assumes only a bit of Python experience and offers a comprehensive introduction to everything from interacting with APIs with Python, to querying databases from the command line, to how to think and communicate with data. Taking this course will make any digital marketer more data-driven and will back them up with the skills to take action.

Database Deep Dive: Introduction to Databases (Stanford & Coursera)

Slightly more academic than Intro to Data Science, this course provides a very strong foundation for understanding data and databases. If you are a “why does this work” type of person, this course will be very interesting.

From a practical standpoint, the course offers very good lessons on the JSON and XML formats, which are everywhere in digital marketing and whose understanding is essential for working with APIs. The database portion of the course will take you at least as far as you will need to go for the digital marketing applications of databases.

Put it all Together: MongoDB University

If all these courses have been interesting to you and you have a good handle on programming, then this is the course for you! You will build a real web app from the ground up while learning MongoDB hotness. Another digital marketing specific benefit to this course is that the app that you build is a blog. Understanding how blog content is retrieved and presented will help you understand a lot about semantic SEO.

I hope you have at least one direction that you are excited about. Leave a comment if you have any questions, or follow the rest of the $10k Technical Skills for Digital Marketing series by signing up for email notifications when new posts are up. APIs, web scraping and “how to learn” are still to come!