The Google Search Console BigQuery integration maintains three tables in your BigQuery instance. Two of them, searchdata_url_impression
and searchdata_site_impression
contain data about Google Search Console’s performance reports, and the third is a log table named ExportLog
that “contains information about each successful export to one of the previous data tables.” In other words, it tells you when new search data arrives in Google Search console. This post describes the data in each table and the differences among them.
You can jump to the table description and data dictionaries in the links below.
Comparing URL impressions and Site impressions
The difference between searchdata_url_impression
and searchdata_site_impression
is what we call “granularity.” Granularity refers to the “grain” or level of aggregation within a table or query.
Imagine a table with an “atomic” grain of Google search data, which would basically be a row for each search query and all the corresponding metadata. Since that’s completely infeasible due to the massive amount of data that the tables would store (and the obvious privacy issues), search performance tables are aggregated into a courser grain. Each table’s metrics are aggregated into one line for each combination of a different set of dimensions. We’ll get into each table’s corresponding dimensions below.
Another key difference is how the search metrics are aggregated. Sites may have multiple URLs listed on a single search. Each table provides a different view of the data depending on the analysis you need. We’ll get into this more in the next section.
searchdata_site_impression
Sometimes, a site might appear in multiple positions for a single search. This is common for e-commerce shops with tons of products or social media and forum sites with lots of user-generated content (aka Reddit). In these cases, it’s helpful to analyze the site’s overall performance rather than the performance of each URL. That’s what the searchdata_site_impression table is for.
This table provides data in the same way that Google Search Console UI does when you look at the Queries tab. (See below.)
It’s also the same as the GSC API when aggregated “by property.” Per the API docs, “aggregated by property means that all results for the same query that point to the same Search Console property are counted once in total. So if a query contains two URLs from the same site (example.com/petstore/giraffe and example.com/recipes/pumpkin_pie), this is counted as one impression when results are aggregated by property.”
The granularity of the searchdata_site_impression
table is determined by how impression data falls into the following dimensions:
- Date
- Property URL
- Query
- Anonymized query (True/False)
- Country
- Search Type
- Device
The searchdata_site_impression
table contains the ordinary GSC metrics, but there is some nuance. Since the data is aggregated by property, impressions, and clicks are counted once per site impression, meaning if the property has multiple listings for a single search, the site will only count for one impression, and if the user clicks, that only counts for one click.
sum_top_position definition
A site’s position is captured in the sum_top_position metric. Google describes the metric as “the sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results. To calculate average position (which is 1-based), calculate SUM(sum_top_position)/SUM(impressions) + 1
.” It’s important to calculate this weighted average or you will end up with average positions that are abnormally high (and you might start to panic!)
For example, to calculate the Average Position shown in the GSC UI for July 21st, 2024, you’d use the following query.
SELECT SUM(sum_top_position)/SUM(impressions) + 1
FROM <project>.searchconsole.searchdata_site_impression
WHERE data_date = "2024-07-21"
In BigQuery, the searchdata_site_impression
table looks like this.
Here is the data dictionary for the searchdata_site_impression
table.
Dimension or Metric | Field name | Type | Description |
---|---|---|---|
DIMENSION | data_date | DATE | The day on which the data in this row was generated (Pacific Time). |
DIMENSION | site_url | STRING | URL of the property. For domain-level properties, this will be sc-domain:example.com . For URL-prefix properties, it will be the full URL of the property definition. Examples: sc-domain :trevorfox.com , or https://trevorfox.com/tools/ |
DIMENSION | query | STRING | The user query. When is_anonymized_query is true, this will null . |
DIMENSION | is_anonymized_query | BOOLEAN | Rare queries (called anonymized queries) are marked with this bool. The query field will be null when it’s true to protect the privacy of users making the query. |
DIMENSION | country | STRING | Country from where the query was made, in ISO-3166-1-Alpha-3 format. |
DIMENSION | search_type | STRING | One of the following string values: web , image , video , news , discover , googleNews .web: The default (“All”) tab in Google Search; image: The “Image” tab in Google Search; video: The “Video” tab in Google Search; news: The “News” tab in Google Search; discover: Discover results; googleNews: news.google.com and the Google News app on Android and iOS. |
DIMENSION | device | STRING | The device from which the query was made. One of the following values: • DESKTOP • MOBILE • TABLET |
METRIC | impressions | INTEGER | The number of impressions for this row. |
METRIC | clicks | INTEGER | The number of clicks for this row. |
METRIC | sum_top_position | INTEGER | The sum of the topmost position of the site in the search results for each impression in that table row, where zero is the top position in the results. To calculate the average position (which is 1-based), calculate SUM(sum_top_position)/SUM(impressions) + 1 . |
searchdata_url_impression
The searchdata_url_impression
table is significantly more granular than the site impressions table. In addition to the dimensions listed above, the data in searchdata_url_impression
is broken down further into 37 dimensions. These are all listed in the data dictionary below.
The image below shows how “wide” this table is because of all the boolean values. I’ve highlighted the “true” values to illustrate how infrequently the values will differ for most sites. These boolean values will be critical for some sites, depending on the content of the site and the SEO strategy.
One thing worth noting about the Google Search Console Export is that not all fields will be relevant to all Google Search Console properties. For example, cooking websites will have data related to recipe search features, while e-commerce websites will have data related to shopping features. For the most part, the column will be filled with mostly false values, so don’t worry if that is the case.
Here is the full list of fields in the searchdata_url_impression table.
Dimension or Metric | Field name | Type | Description |
---|---|---|---|
DIMENSION | data_date | DATE | The day on which the data in this row was generated (Pacific Time). |
DIMENSION | site_url | STRING | URL of the property. For domain-level properties, this will be sc-domain property-name . For URL-prefix properties, it will be the full URL of the property definition. |
DIMENSION | url | STRING | The fully qualified URL where the user eventually lands when they click the search result or Discover story. |
DIMENSION | query | STRING | The user query. When is_anonymized_query is true, this will be a zero-length string. |
DIMENSION | is_anonymized_query | BOOLEAN | Rare queries (called anonymized queries) are marked with this bool. The query field will be null when it’s true to protect the privacy of users making the query. |
DIMENSION | is_anonymized_discover | BOOLEAN | Whether the data row is under the Discover anonymization threshold. When under the threshold, some other fields (like URL and country) will be missing to protect user privacy. |
DIMENSION | country | STRING | Country from where the query was made, in ISO-3166-1-Alpha-3 format. |
DIMENSION | search_type | STRING | One of the following string values: web , image , video , news , discover , googleNews . |
DIMENSION | device | STRING | The device from which the query was made. |
DIMENSION | is_amp_top_stories | BOOLEAN | Appearance in the Top Stories carousel. |
DIMENSION | is_amp_blue_link | BOOLEAN | Appearance as an AMP page listed as a normal blue link. |
DIMENSION | is_job_listing | BOOLEAN | Appearance as a job posting result that shows a summarized view of a job. |
DIMENSION | is_job_details | BOOLEAN | Appearance as an AMP page listed as a normal blue link. |
DIMENSION | is_tpf_qa | BOOLEAN | Appearance as a Q&A page rich result. |
DIMENSION | is_tpf_faq | BOOLEAN | Appearance as an FAQ page. |
DIMENSION | is_tpf_howto | BOOLEAN | Appearance as a How-to rich result. |
DIMENSION | is_weblite | BOOLEAN | A deprecated search feature that allowed Google to serve faster, lighter pages to people searching on entry-level devices. |
DIMENSION | is_action | BOOLEAN | Appearance with an action that can be taken on the result. |
DIMENSION | is_events_listing | BOOLEAN | Appearance as a job posting result listed in a collection of job postings. |
DIMENSION | is_events_details | BOOLEAN | Appearance as a detailed description of an event. |
DIMENSION | is_search_appearance_android_app | BOOLEAN | Appearance as an event listed among other events. |
DIMENSION | is_amp_story | BOOLEAN | Appearance as an article featured in the Top Stories carousel. |
DIMENSION | is_amp_image_result | BOOLEAN | An Image Search result, where the image is hosted in an AMP page. |
DIMENSION | is_video | BOOLEAN | Appearance as a video feature that appears in either general search results (type Web) or Discover. |
DIMENSION | is_organic_shopping | BOOLEAN | Appearance as an organic shopping result. |
DIMENSION | is_review_snippet | BOOLEAN | Appearance as a video feature that appears in either general search results (type Web) or Discover. |
DIMENSION | is_special_announcement | BOOLEAN | Appearance with a special announcements structured data element with information. For example, information about COVID-19. |
DIMENSION | is_recipe_feature | BOOLEAN | Appearances as a recipe listing feature that lists information specific to that recipe. |
DIMENSION | is_recipe_rich_snippet | BOOLEAN | A Recipe rich result that appeared outside of the recipe list with more detail. |
DIMENSION | is_subscribed_content | BOOLEAN | Appearance as a Q&A feature for flashcard pages. |
DIMENSION | is_page_experience | BOOLEAN | Appearance while qualified for good page experience. |
DIMENSION | is_practice_problems | BOOLEAN | A practice problem search feature. |
DIMENSION | is_math_solvers | BOOLEAN | A math problem search feature. |
DIMENSION | is_translated_result | BOOLEAN | Appearance as a translated result. |
DIMENSION | is_edu_q_and_a | BOOLEAN | Appearance with a Q&A feature for flashcard pages. |
METRIC | impressions | INTEGER | The number of impressions for this row. |
METRIC | clicks | INTEGER | The number of clicks for this row. |
METRIC | sum_position | INTEGER | A zero-based number indicating the topmost position of this URL in the search results for the query. (Zero is the top position in the results.) To calculate the average |
Unlike the site impressions table, metrics are aggregated per URL. This means that if your site has multiple search listings on a single search, the metrics will be aggregated per URL. This means that if you aggregate the URL metrics, it’s possible that there will be overlap in the impressions (meaning inflated impressions), and calculating an average position may provide the results you expect.
This table is best used to understand a single URL’s performance or find overlapping queries between URLs. Beware that your results may get a little wonky when you start to aggregate data from multiple URLs.
Want to learn more?
If you are getting serious about SEO analytics in BigQuery, you should join my course on the topic! You can sign up here. Check out the full course syllabus to see what’s inside. I hope to see you there!