Intro to SQL User-Defined Functions: A Redshift UDF Tutorial

As a data analyst, your credibility is as valuable as your analytical skills. And to maintain your credibility, it’s important to be able to answer questions correctly and consistently. That’s why you must be careful to integrate reproducibility into your SQL analyses. This tutorial is going to show you how you can use Redshift User Defined Functions (UDFs) to do just that.

Reproducibility in SQL Analysis

I’ve learned that there are two broad factors to reproducibility. The first is the data—different data for the same analysis is going to produce different results. A good example would be a court case: if you ask two witnesses the same question, each one will probably tell you something similar but likely slightly different. 

The second factor is the analytical methods. If we use the court case example again, this would be like the prosecution and the defense asking a witness the same question in two different ways. The lawyers would do this with the intent to get two different answers.

This post is more concerned with the second factor of reproducibility, the analytical method. Whenever you have to write complex SQL queries to get an answer, your analytical method (the SQL query) becomes a big variable. SQL is iterative by nature! Think about it, just be adding and removing “WHEN” conditions, you’re liable to drastically change your results. 

As you iterate on a numerical calculation or classification in a CASE expression you are likely to change your query results. And what happens when you have to perform the same analysis weeks later? You better hope you use the same iteration of your SQL query the second time as the first! 

And that is exactly where User-Defined Functions become so valuable! 

User-Defined Functions (UDFs) are simply a way of saving one or more calculations or expressions with a name so that you can refer to it as a SQL function for further use.

What are User Defined Functions?

User-Defined Functions can be used just like any other function in SQL like SUBSTRING or ROUND except you get to define what the output of the function is, given the input.

User-Defined Functions (UDFs) are simply a way of saving one or more calculations or expressions with a name so that you can refer to it as a SQL function for further use.

They are a great way to simplify your SQL queries and make them more reproducible at the same time. You can basically take several lines of code that produce one value from your SELECT statement, give it a name, and keep it for future use. Using UDFs, you can ensure that, given the same data, your calculations will always produce the same result.

UDF Functions are Scalar Functions. What does scalar mean?

As you learn about UDFs, you’ll see references to the word “scalar.” Scalar just means that the function is defined with one or more parameters and returns a single result. Just like the ROUND function has one parameter (the number) and an optional second parameter (the number of decimal places for rounding) and returns the rounded number. The function is applied to every value in a column, but it only returns one value for each row in that column.

A Hello World! SQL UDF Example

If you are familiar with any kind of programming language, this should be pretty simple. The CREATE FUNCTION syntax only requires a function name and a return data type. That’s it. 

A function called hello_world that returns ‘HELLO WORLD!’ every time would look like this:

create function hello_world ( )
  returns varchar
stable
as $$
  select 'HELLO WORLD!'
$$ language sql; 

In that case, the input data type and the output data type are both varchar because “HELLO WORLD!” is a text output. You could use your function like this:

select hello_world() as my_first_function;

And you’d get an output that looks like this:

my_first_function
HELLO WORLD!

But that wouldn’t be very interesting. You’ll generally want to modify the input(s) of your functions. Let’s take apart a more interesting UDF example.

How to Write SQL UDF Functions

This example function, called url_category takes a varchar as an input (a URL) and returns a varchar output (the category of the URL). To do this, the function compares the input (shown as $1 because it is the first parameter) to the conditions of a case expression.

You could also write this function with two parameters. Here’s an example if you were using Google Analytics data. You could take in the parameters, hostname and a page_path to get more granular with your URL categorization.

SQL UDF Functions with Multiple Arguments

This is Redshift’s example from their docs. It takes two parameters (both specified as float) and returns the value that is greater of the two.

create function f_sql_greater (float, float)
  returns float
stable
as $$
  select case when $1 > $2 then $1
    else $2
  end
$$ language sql;  

To refer to the different parameters in the function, you just use the dollar sign ($) and the order of the parameter in the function definition. As long as you follow that convention, you could go wild with your input parameters!

Redshift UDF Limitations

UDFs are basically restricted to anything that you can normally do inside a SELECT clause. The only exception would be subqueries—you cannot use subqueries in a UDF. This means you’re limited to constant or literal values, compound expressions, comparison conditions, CASE expressions, and any other scalar function. But that’s quite a lot! 

Common UDF Errors and their Causes

Once you start writing UDFs, you’ll find that it’s pretty easy going but there are two especially common “gotchas” 

ERROR:  return type mismatch in function declared to return {data type}

DETAIL:  Actual return type is {data type}.

This just means that you’ve created a function where the output value has a different data type than you said it would. Check that the return data type that you specified is the same as the function is actually returning. This can be tricky if your function is using a CASE expression because a CASE could accidentally return two different data types.

ERROR:  The select expression can not have subqueries.

CONTEXT:  Create SQL function “try_this” body

This means you tried to write a SELECT statement in your function that includes a subquery. You can’t do that.

ERROR:  function function_name({data type}) does not exist

HINT:  No function matches the given name and argument types. You may need to add explicit type casts.

There is one especially odd thing about Redshift UDFs. You can have several functions with the same name as long as they take different arguments or argument types. This can get confusing. The error here means that you’ve called a function with the wrong type of argument. Check the input data type of your function and make sure it’s the same as you input data.

Scaling your SQL Analysis with Confidence!

User-Defined Functions make it really easy to repeat your analytical method across team members and across time. All you have to do is define a function once and let everyone know that they can use it. On top of that, if you want to change the logic of your function you only have to do it in one place and then that logic will be changed for each user in every workbench, notebook, or dashboard!

Take advantage of this clever tool. Your team will thank you, and you will thank you later!

Getting Started with SQL for Marketing (with Facebook Ads Example)

As a digital marketer, I use SQL every single day. And looking back on my career so far, it would be fair (though a bit reductive) to say that I could define my career by two distinct periods: before I learned SQL and after I learned SQL. The two periods are distinct for three main reasons: 

  1. After learnings SQL I am faster at gaining insight from data 
  2. After learnings SQL I am able to make decisions based on more data
  3. As a result, I’ve been making better marketing decisions—and I have seen the traffic, conversion rates, and ROI to prove it. (Thanks to SQL)

If you’re at a crossroads in your career and you find yourself asking, “what coding language should I learn,” here is my case for SQL.

What is SQL (for Digital Marketing)

When you see SQL you might think it means “Sales Qualified Lead” but more commonly, SQL stands for “Structured Query Language.” It is a programming language that allows you to retrieve (or update, alter or delete) data from relational databases. (Relational is just a fancy word for a database that stores data in tables.) 

It’s kind of like ordering from McDonald’s. SQL is a language – a specific set of instructions – that you use to specify the results you want, the way you want them, in the quantity you want. Basically, SQL allows you to have your data your way.

How is SQL Used in Business

SQL has two main uses: applications and analysis. Applications (apps) from CandyCrush to Instagram store content and data about users in databases and then use it to create an experience (like keep track of how many comments you have on an Instagram post). On the other hand, you can use SQL for analysis in the same way you can sort, filter, and pivot data in Excel. (except with a lot more data)

SQL is different from most programming languages like Javascript, Python, and PHP because it only has one use: retrieving data from relational databases. So you can’t use SQL to build a website or a chatbot but you can use programming languages like Javascript, Python, and PHP to send SQL commands to databases and do something interesting with the results. WordPress is a good example of this. WordPress is written in PHP and the PHP code sends the SQL commands to a MySQL database and formats the data into blog articles and article lists.

What’s the difference between SQL and Excel?

Remember when you learned your first Excel formula? Pivot tables? VLOOKUP? You probably through you could take on the world! SQL is like that times 100. SQL and Excel are similar because they both allow you to analyze, manipulate, and make calculations, and join data in tables. 

The biggest difference between Excel and SQL is that you can analyze exponentially more data exponentially faster with SQL but you can’t update the data in SQL quite as easily. Also, SQL commands define how you want your data table to look when the data is retrieved so you are working with entire tables rather than individual cells. The benefit of this is that you don’t have to worry about making mistakes when copying formulas (and the analysis errors that come with that.) On the whole, I’d say SQL is much better than Excel, most of the time.

SQL Example in Marketing

This example shows an ROI analysis using SQL code that you would use to calculate how many customers you’ve acquired per country since the beginning of 2020, and the Facebook Ads spend that was spent in that country. 

SELECT country, sum(customer_count) total_customers, sum(spend) ad_spend
FROM (
	SELECT customers.ip_country, count(email) customer_count
	FROM customers 
	WHERE customers.createdate > '2020-01-01'
	GROUP BY country) new_customers
JOIN facebook_ads ON facebook_ads.country = new_customers.ip_country
WHERE ad_spend.date > '2020-01-01'
GROUP BY country
ORDER BY ad_spend desc;

The example does the following:

  1. Aggregate a table of customers into a table of countries and customer counts who have become customers since January 1st, 2020.
  2. Joins that table with another table that contains Facebook Ads data by day
  3. Filters in only Facebook Ad spend data since January 1st, 2020
  4. Aggregates this all into a single table that has three columns: country, count of new customers from that country, and the ad spend for that country.

The good news is, this is about as complex as SQL gets. Pretty much everything else in SQL is just a variation of this.

Is SQL worth Learning?

In a word, yes. There is a practical reason and a conceptual one. The conceptual one is that learning SQL, like learning data structures or other programming languages, will expand how you think about data. It will help you organize data for analysis more efficiently and help you structure your thinking about how to answer questions with data. So even without a database, SQL can help you work with data.

The practical reason for learning SQL is that it allows you to gain insight faster, from more data, and come to better conclusions. That is true if you are analyzing keywords for PPC or SEO, analyzing how leads flow through your sales funnel, analyzing how to improve your email open rates, or analyzing traffic ROI.

 Here are just a few good reasons.

  1. You’ll spend less time trying to export and import data into spreadsheets
  2. You’ll be able to replicate your analysis easily from week to week or month to month
  3. You’ll be able to analyze more than 10k rows of data at once
  4. You can use BI tools to build dashboards with your data (and always keep them fresh)
  5. You’ll be able to merge bigger datasets together faster than VLOOKUPs
  6. You won’t have to ask for help from IT people, DBAs or engineers to get data out of a database or data warehouse for your analysis

How long does it take to learn SQL?

With dedication, you can develop a strong foundation in SQL in five weeks. I recommend Duke’s SQL class on Coursera to go from zero to usable SQL skills in less than two months. With that class and a couple of books about PostgreSQL, I was on par with most analysts at Postmates (except I had the context about the data!). A few months later  I learned enough to record this SQL demo with SEM data.

There are even good Android/iPhone apps that will help you learn the syntax through repetition. The class I recommend below (Data Manipulation at Scale: Systems and Algorithms) for Python also touches on SQL so it’s a double whammy and Python Anywhere also features hosted MySQL, so that’s a double-double whammy!

If you are looking for a short but substantive overview of SQL, this video from Free Code Camp is pretty good. I’m not suggesting you’re going to know how to write SQL in four hours, but at least you will get the gist.

All that being said, like many programming languages, learning SQL is a continual practice because, after learning the language, you can expand into managing a database rather than just analyzing data in a database. You can also pair your SQL skill with other programming skills to make all sorts of interesting applications! The good news, for the most part, it’s like riding a bike, once you learn it, you don’t really forget it—but it will take you a bit of time to re-learn a wheelie.

Learn Programming and Databases for Digital Marketing | $10k Tech Skills 2/4

This is part t in the $10k Technical Skills for Digital Marketing Series. Part one introduced the importance of learning client-side technologies and offers a plan to learn Javascript, HTML and CSS for digital marketing. This post broadens the picture by introducing server-side programming and databases, which together compose web applications. Understanding how web applications work is a major benefit and should be essential knowledge for digital marketing. Enjoy!

Learning How Web Applications Work

From Google Bot to the Facebook Social Graph, to this WordPress blog; the web as we know it, is a massive system of interconnected applications. All these applications are simply programs and databases that run on servers. And while building these applications is a massive undertaking, learning the underlying processes and concepts is not. It takes nothing more than a bit of effort and time to learn enough about programming and databases to significantly set yourself and your resume apart from the average digital marketer.

While the benefits of learning how to write server-side code and interact with databases are not as immediately useful as many of the skills listed in Part 1, it is actually the process of learning this skill that presents the real value. The learning process will provide and intuition about how applications work and how processes can be scaled. This is key to digital marketing at scale.

If you can understand how search engine bots crawl websites, you can understand what makes a website crawl-friendly and you begin to understand the technical aspects of SEO. If you understand how algorithms work, you can understand Edge Rank and how Facebook decides to distribute content and broaden your reach. If you can understand how your CMS works you can map your analytics platform to it and gain better insight, which you can then use to, automate processes like email and offer personalized experiences. This new intuition about the web will continue to present opportunities.

You will also find many practical opportunities to employ your new programming and databases querying skills for digital marketing tasks and processes. While these skill starts to bleed into the realm of web development and data-science/business intelligence there are still many applications for server side scripting languages, from automation to optimization that can be very powerful for digital marketers.

Programing for the Web

When starting out on the road to learning server-side scripting, it is most realistic to start with PHP, Python or Ruby on Rails. All three are open-source, have strong communities and plenty of free learning resources. They all offer many similar advantages but each is powerful (and practical) in its own way.

programming languages for digital marketing

You see why I chose python…

PHP, for better or worse, has been the defacto server-side language of the Web for a long time. PHP is what powers WordPress, Magento, ModX and many other content management systems (CMS’s) and if you are in digital marketing for long you will likely run into at least one CMS powered by PHP. Learning PHP will come in handy when you find yourself wanting to add schematic markup for search engines or scripts for testing or analytics platforms like Optimizely or Google Tag Manager.

Depending on the site(s) and development resources (or lack thereof) that you are planning to work with, PHP may be good choice. It is the easiest code to deploy, as all popular web servers will support PHP.

Python is also used to build websites with frameworks like Django and Flask but more often, sites that are built with Python are apps built with a specific, custom purpose. Unlike, PHP and Ruby, which are designed for, web development; Python is a general-purpose language, which makes it go-to languages for data-science. (The resources featured here are most about how to learn python as that is the language I have focused learning the most. It has been great!)

For the technical marketer, Python is useful for scaling big(er) data science-y processes like web scraping, querying API’s, interactive analysis and reporting. Many processes that are carried out manually can be programmed using Python and run on a cron job or other triggers. One major benefit of Python is that it is so easy to learn thanks to the number of educational resources and friendly syntax. If you find yourself venturing into the world data science, you will be well prepared with Python as a large and active data science community supports it.

Ruby on Rails, well, I really haven’t played with it much but I have heard it’s very nice. The key, I hear is that it is good for rapid Web app development.

Node and JavaScript were much of the focus of Part 1: Learning Javascript.

Database Querying and Analysis

Digital marketing without data is not digital marketing and the digital marketer who is not data-literate is just a marketer. I am not arguing that all digital marketers should be become SQL ninjas but learning this skill, like programming, is as much about gaining an intuition about how systems and applications work as it is about developing a practical skill.

databases and analtyics

For a real-world use case that employs this skill as both intuition and a practical skill, look no further than Google Analytics. The Google Analytics web interface is ‘simply’ an elegant way to query, sort, filter and visualize site usage/performance data that is collected in a database. Having a general understanding of how Google Analytics stores data and how different data points/hit types interrelate allows you to be much more precise in your analysis and confident that the data that you pull from Google Analytics is accurate.

SQL knowledge can also help you in times that you need to pull raw data out of Google Analytics for further analysis or to avoid sampling. With Google Spreadsheets’ QUERY function, you can query spreadsheet data using SQL (Structured Query Language). For quick analysis and more complex inspection of data sets, writing SQL queries to explore and form data to your needs can be much quicker and easier to debug than writing a successive set of spreadsheet functions.

When dealing with large amounts of Google Analytics and sampling becomes a significant issue, Google’s BigQuery can be hooked up to Google Analytics to provide SQL-like query functionality with greater speed and scale. When you become comfortable with this GUI-less interface, the ability to query any database become much less daunting. You can then answer question by directly querying databases such as a website’s MySQL database using phpMyAdmin.

“Every question can be distilled into a database query,” Adam Ware of SwellPath told me when I first started learning about databases. The phrase seemed very exciting and has since proven accurate. I have come to realize that databases simply hold all the raw information in a defined structure. By asking the right question in the right way, your digital marketing insights are limited only by your data.

Once you start to understand how databases operate you will notice their appearance in apps across the web from ecommerce stores to analytics platforms to blogs. The understanding of how data is stored and how to extract the data that you want will also significantly improve your ability to use applications to their full potential, ideate optimization for existing apps and learn new applications. This intuition is skill that helps turn data into to knowledge and as you knowing is half the battle.

How to Learn Web Application Programming

Start Here: Codecademy.com

This is a great place to start with any web programming language. It is the quickest, easiest and most fun way to get up to speed with a programming language that I have found. Best of all it is free. It offers courses in PHP, Python and Ruby and hosts very helpful Q&A forums for coders who are just starting out.

Get up to Speed: Intro to Programming with Python (Udacity)

Once you have gotten a feel for programming (and a few bumps and bruises to go along with it) the next place to go is to start to understand the real power that programming offers. Udactity’s Inro to Programming in Python picks up where CodeAcademy.com leaves off and introduces capabilities rather than just syntax and style.

For the digital marketer, this course is especially useful because the course is taught through constructing a very rudimentary search engine crawler (or at least the general idea of one). This application opens a window of understanding how big applications work and will make you think differently about how search engines operate.

How the Web Works: Web Development (Udacity)

There is a lot more than just programming that differentiates marketers who can program from web developers. From hosting, to caching to cookies, this course does a good job introducing these concepts.

From my experience, it was a bit too difficult as a follow up from the Intro to Programming in Python course to actual create and deploy a web app, but it does give a substantially understand of technical web terminology to communicate effectively with web developers. (This is a very valuable skill if you ask me.) From this course you will have an understanding of what topics you need to take on in detail to accomplish what you need to do as a technical marketer.

How to Learn Data Analysis with Databases

Become Data-Driven: Intro to Data Science (U. Washington & Coursera)

In my opinion (and I am a bit of a biased data-geek), this is the best online course I have taken. Each lesson offered “aha!” moment after “aha!” moment while teaching really useful skills.

The course assumes only a bit of Python experience and offers a comprehensive introduction to everything from interacting with API’s with Python and to querying databases from the command line to how to think and communicate with data. Taking this course will make any digital marketer more data-driven and will back them up with the skills to take action.

Database Deep Dive: Introduction to Databases (Stanford & Coursera)

Slightly more academic than Intro to Data Science, this course provides a very strong foundation for understanding data and databases. If you are a “why does this work” type of person, this course will be very interesting.

From a practical standpoint, the course offers very good lessons on JSON and XML formats which are everywhere in digital marketing and their understanding is essential for working with API’s. The database portion of the course will take you at least as far as you will need to go for the digital marketing applications of databases.

Put it all Together: MongoDB University

If all these courses have been interesting to you and you have a good handle on programming, then this is the course for you! You will build a real webb app from the ground up while learning MongoDB hotness. Another digital marketing specific benefit to this course is that the app that you build is a blog. Understanding how blog content is retrieved and presented will help you understand a lot about semantic SEO.

I hope you have at least one direction that you are excited about. Leave a comment if you have any questions or follow the rest of  the $10k Technical Skills for Digital Marketing series by signing up for email notifications when new posts are up. API’s, web scraping and “how to learn” are still to come!