# How to easy understand Analytics Functions on Google BigQuery

## Get all the major concepts and practices in a simple and clear way.

In this post you’ll easily understand analytics functions on BigQuery, concepts like* “window”*, *“over”* and *“partition by”* won’t be a pain for you anymore.

There is a point when working with databases where you need some advanced analysis on your data. You can achieve complex analysis in many ways using SQL operators, and Analytics Functions came to extend the power of SQL making it possible to write in a few lines queries that otherwise would require a lot of self-joins and aggregations. This article is for you that have a basic to intermediate experience with SQL and want to understand straight to the point analytics functions.

## Scenarios

For our journey, let’s consider the following scenarios related to a list of students in a school categorized by classes and their sales on a local event.

**Scenarios:**

- The average age of all students.
- The average Sales by class.
- Compare each student’s sales to the average sales within its Class.
- The average student’s sales sum with its predecessor and successor by Class.

## Aggregate Functions

Before talking about Analytics Functions, consider for the first scenario we are calculating the average age of all students, it means, we are aggregating/combining a **group of rows** in a **single result.**

SELECT AVG(Age)as Average_age FROM Students=> 20.1

For this scenario the **group of rows** combined contains **all rows.**

The result is 20.1

The second scenario is very similar to the first one, except that the group of rows combined **won’t be all rows, but same class rows. **Then we’ll use the GROUP BY clause to aggregate the Students by class **before calculating the Sales average.**

`SELECT Class, `**AVG(Sales) **as Average_Sales_by_Class FROM Students **GROUP BY Class**

The result is:

`A: 80.0 (45+101+94)/3`

B: 108.75 (100+124+24+187)/4

C: 64(98+17+77)/3

For this scenario each group of rows combined (3 aggregations / 3 Classes) **results each an average output.**

Concluding, in a universe of *n* individuals (rows), each combination of grouping will result in a *single output value*. If you split the individuals in 79 groups, you’ll get 79 outputs, or if you just put all indivuduals in one group you’ll get 1 output as shown in our first scenario. When aggregating data you can apply several functions like: MAX, AVG, COUNT, SUM.

You can read more about aggregate functions in: https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions

**Analytics Functions**

Now that we understand what is and how works an aggregate function, let’s think about the third scenario: We need in the **same table** each individual student’s Sales and the average sales within its Class. This scenario **requires both non-aggregated and aggregated data. **“An **analytics function** returns a **single value for each row** by computing a function over a group of input rows.”

Using them you can achieve the same result as the aggregation in the second scenario, but **apply it for each individual row** instead of aggregating the result in three output rows, you can see the result in the left table. Having said all of that, let’s see how to write an analytics function in BigQuery.

That’s the analytics function *syntax *in Google Big Query, and we’ll modify it to get our desired query and result.

`FUNCTION(args) OVER (`

[ PARTITION BY partition_column ]

[ ORDER BY expression [{ ASC | DESC }] [, ...] ]

[ window_frame_clause ]

)

The FUNCTION could be an *average*, *sum*, *rank, *or other supported aggregate function *(*find more here). We’ll use the **AVG **function and **PARTITION BY **for the third scenario, we don’t need *ORDER BY* or *WINDOW FRAME. *We just need to partition the rows by Class before calculating the average and output it for **every single row.**

`SELECT`

Name,

Age,

Class,

Sales,

**AVG(Sales)** OVER (**PARTITION BY Class**) as Class_AVG

FROM

Students

ORDER BY

Class, Age ASC

The query above is enough to get the results we saw on the previous table. The **PARTITION BY Class** is basically saying: If the Student’s Class is “A” calculates the average of all “A’s Sales” and output it.

## Windowing

What if we need to calculate a **sum for each row**, and that sum **depends on the successor row value ?**

Analytics functions besides calculating aggregated data as shown in the previous scenario, can also do so *through a window*. A *window frame *that *moves* towards the rows and process an operation, as you can see in the following generic example that sums the current value with its successor.

Note that as the window moves, it sums the current and the successor number. Also be aware that the **element’s order interferes in the result. **That’s the basic idea of *windowing*, and you can configure multiple sizes of window.

Now, we can resolve the fourth scenario: “The average student’s sales sum with its predecessor and successor by Class”. You already know what is an *aggregate function*, *analytic function*, and *windowing. *Let’s put it all togheter and make an advanced query.

Remember the Analytic Fuction syntax:

`SUM/AVG OVER (`

[ PARTITION BY partition_column ]

[ ORDER BY expression [{ ASC | DESC }] [, ...] ]

[ window_frame_clause ]

)

The first piece PARTITION BY is basically saying “How the data will be grouped before calculate the function?” In our case we’ll use the Class column, and we can visualize the data as three groups (You’re familiar with it at this point)

Let’s replace the “PARTITION BY” clause

**AVG(Sales)** OVER (

**PARTITION BY Class**

[ ORDER BY expression [{ ASC | DESC }] [, ...] ]

[ window_frame_clause ]

)

**Attention: remember that the order affects the results when dealing with windowing.** In our hypothetical scenario we need to order the data inside each group based on Sales. That’s the **ORDER BY** clause on SQL.

**AVG(Sales**) OVER (

**PARTITION BY Class**

**ORDER BY Sales ASC**

[ window_frame_clause ]

)

Look, now we have almost all the pieces filled, we just need to define the *window_frame_clause*. Since the third scenario says “The average student’s sales sum with his predecessor and successor” we need to create a window t*hat will contain one predecessor and one successor row.*

`SELECT`

Name,

Age,

Class,

Sales,

**AVG(Sales)** OVER (

**PARTITION BY Class**

**ORDER BY Sales ASC**

**ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING**

)AS Sales_avg

FROM

Students

**Awsome!** Now the analytics function is complete. Attention on the windowing that is evidenced in the right.

The Janice’s **Sales_avg **is the average of her predecessor Sales: Amanda; 100, her own Sales ;124, and her successor; Sabrina, 187. Resulting in 137. That window is in dark red in the right of Partition B Data.

But what if we need a larger window frame that considers two elements preceding and two elements following instead of one? you can change the windowing configuration as:

`.. ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING ..`

## Concluding

Analytics functions are very usefull in many cases and can be customized with several options that aren’t described in this article, besides *aggregate functions* we have also *navigation functions* and *numbering functions* for BigQuery (find more here).

**I hope that this article helped you understand this complex feature of BigQuery, I always try to draw and make graphics in my articles to simplify the understanding.**

You can connect me via Gmail: **rodolfo.marcos07@gmail.com **or Linkedin: https://www.linkedin.com/in/rodolfo-marcos-41ab3198/

Thank you so much! Pardon my mistakes, I hope it will be usefull for you.