Tableau 101: AggregationBy: Greg Herzing | April 3, 2023
Aggregation is simply summarizing a list of numbers, often recorded in many rows of data, with a single number. Aggregation enables you to answer targeted questions. For example, do you want to know your average number of products sold per retail store? Or the average value of sales across each sales office? Aggregation is the key to answering these and many other questions with data.
Every day your business generates rows upon rows of data, anywhere from hundreds to hundreds of thousands or even millions. You can use this data to answer important business questions, but only if you can summarize the potentially millions of rows of data you have floating around your database.
Level 100: Tableau Aggregate This
Aggregation is a core process of many data storage and visualization solutions. Examples of aggregation include SUM, AVG, COUNT, and others. You should be familiar with these functions if you have ever written a Structured Query Language (SQL) query.
You can only truly gain insight from aggregated data. How insightful is this:
Ready to make critical decisions? Didn’t think so. How about this:
Data becomes a useful asset when summarized, aggregated, and visualized.
In Tableau, aggregation is applied in the data source. Tableau Desktop sends a query to the data source, which then responds with the aggregated measure or dimension. A million records can become one simple number with a single click. See the image below:
You may be wondering how this process works with your Excel sheets. Excel is many things, but it is not a data-crunching engine. Tableau does not send a query to Excel In the case of flat files and extracts. Instead, Tableau applies aggregation in its data engine where flat files and extracts are processed. This is just a technical detail; the end result you see is the same whether using a database or a flat file.
Aggregation is critical to asking effective questions and producing reliable answers to them. Aggregation has a big impact on the way measures are represented, dimensions are displayed, and calculations return results.
Level 101: Tableau Aggregate Measures
In Tableau, aggregation is applied to a measure by default. When you drag a measure into the view, Tableau queries the data source which then responds with the aggregated results. This makes Tableau more efficient and your insights clearer.
Once you drag a measure into the view, you see the highest level aggregation for the measure. So SUM of Sales is showing the SUM of all rows of sales data in the entire table. This allows you to ask deeper questions. If you want to see SUM of sales by State, you can drag in the State dimension to go deeper.
Keep in mind that certain types of aggregation will give different and potentially misleading results. For example, using COUNT to find out how many orders your customers have placed could be misleading if you have multiple rows for each order. In this case, you will want to use COUNT (Distinct) to get an accurate number of orders.
We can build an aggregation example using Rolling Stone’s Top 500 Albums of All Time on data.world. For this example, we are asking how many albums each artist has on Rolling Stone’s list.
- Connect to the albumlist table and click on Sheet 1.
- Name the sheet Albums by Artist.
- Drag Number of Records to Columns and Artist to Rows.
You will see that Number of Records is now aggregated to SUM. Tableau uses SUM because that aggregation is the default for the field. To change the default, right-click on the field in the Data pane and go to Default Properties > Aggregation.
Level 201: Tableau Aggregate Dimensions
Measures are aggregated by default and are important to gaining insight, but aggregation can also be applied to dimensions. We can see this in two examples using our album data. In the first example, we will simply see how many Top 500 albums were released each year and find which year had the most albums. In the second example, we find the date of first release for each artist.
- Create a new sheet and name it Albums Per Year.
- Drag the Album field to rows. This will create a list of albums for the whole data source. Note: Because Album is a discrete dimension, we see a list of Album headings. This is not what we want.
- Right-click on the Album field on the Rows shelf. Tableau will show a menu.
- Select Measure and you will see a list of aggregations. Choose Count(Distinct).
- Drag Year to Columns. You will now see a line chart. The year 1970 should have the most albums.
By aggregating the Album dimension, we can now get a count of albums without a separate field of data.
For our second example, we will find the first album date for each artist.
- Create a new sheet and name it First Album Date.
- Right-click on the Year field and convert the data type to Date.
- Drag the Year field to Text.
- Right-click on the Year field on Text, hover over Measure, and select Minimum aggregation. You will now see a list of artists and the year of release for their first album on the list.
Both of these examples show how to apply aggregation to a dimension. The first example shows how to convert a dimension to a measure by applying aggregation, and the second shows how to use aggregation to get new insight into a dimension’s values.
Level 301: Tableau Aggregate Calculations
Aggregation can be added to individual fields and is also critical to calculations. Calculated fields can use aggregation to answer different questions. There are two general types of calculated fields:
- Row-Level – Calculation is calculated for each row in the data. Row-level results are then aggregated just like normal fields.
- Aggregate – Measures are first aggregated, and the calculation is then run on the aggregated result.
Choosing which type of calculation to use can mean the difference between drawing valid, useful conclusions and completely unusable ones.
Another consideration is performance. Calculating at the row level requires Tableau to make a database query to read each cell, perform the calculation for each cell, and then aggregate the results. Aggregate calculations use Tableau’s precalculated totals for each aggregated field, meaning these calculations can be processed faster.
To illustrate potential issues, we will use a classic example in Sample – Superstore.
Let’s calculate a profit ratio for the product lines in our Superstore data set. We need to calculate the profit ratio, then break it down for each product Sub-Category.
- Create a new sheet and name it Profit Ratio.
- Build the viz below by dragging Sales to Text, Sub-Category to Rows, and Profit to the view directly over the Sales numbers. You should see a visual cue labeled Show Me.
- Open the Calculated Field editor to create a calculated field.
- Write the calculation: [Profit]/[Sales]. Name the field Row Level Profit Ratio.
- Select OK.
- Drag the field to the view (over the Sales and Profit numbers) and drop it into the view. You will now see another column of numbers.
- Format these numbers as a percentage. Stare in shock at a 60000% profit ratio. Something is wrong.
- Note: The problem is that the calculation is a row level calculation. You are calculating a row level profit ratio for thousands of rows of data. You are then aggregating these row level results into an insanely large decimal number. Let’s fix this.
- Open the calculated field editor and create a field called Aggregate Profit Ratio. Write the calculation: SUM([Profit])/SUM([Sales])
- Drag the field to the view and drop it in. You will now see another column of numbers.
- Format these numbers as a percentage. Sigh in relief as you see numbers which make sense.
Aggregation is important even within a calculation. Row level and Tableau aggregate calculations allow you to answer different questions and work with different types of data.
Successful dashboard developers know how to use aggregation effectively to answer questions and discover insights. You can aggregate a measure to summarize many rows of data into a single number. You can apply aggregation to a dimension to turn a slicer into an aggregated number. Finally, you can use aggregation within a calculated field to answer deeper questions.
To learn about more efficient ways to apply aggregation, check out our previous Tableau 101: Clicks blog post.
Heading to Tableau Conference 2023 this year? Be sure to sign up for CoEnterprise updates throughout the event.
Subscribe to our resources!
Sign up to receive our latest eBooks, webinars, blog posts, newsletter, event invitations, and much more.