Data Modeling in Tableau: Part II
By: Adam Nathan | July 22nd, 2020
In part one of this blog post, we examined the features of the new data modeling in Tableau 2020.2. If you haven’t already, you may want to check it out to get an overview of the changes before starting this one. With that foundation on how the new data model works and the advantages that come along with it, we will examine a use case that illustrates the power of the new features. By the end of this post you will have a better understanding of how the new data model will affect your Tableau strategy going forward.
See It in Action
In this demonstration, we will compare how the old data model would handle a use case versus the new data model. We are using the BetaBookShop dataset from Tableau, which was provided as part of the beta release. The dataset consists of 13 tables appropriate to a bookseller managing an inventory of titles. The concept is that each book has attributes, such as author and genre, stored in related tables. The question we want to answer is:
- What is the average number of pages per publishing house?
In the old model, the first thing we do is bring in all the tables we will need and define the joins between them. We will assume the current use case is part of a larger project that will ultimately require all the tables shown.
Notice how with each new table we add, Tableau creates an inner join to a previously added table by default. This will be important later in the demo. Now that we have the tables joined together, we can build the visualization. We bring publishing house and the pages measure to the view and make it an average calculation. It’s that easy, right?
Not really. After further analysis, we find that these numbers are not correct at all. Some investigation reveals that records are being duplicated for each ISBN. This happens because the Sales table is joined to the Edition table, creating a new record for each sale of an ISBN, which, in turn, inflates the apparent number of pages for each publishing house.
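In pandas terms, the fan-out looks roughly like this. The table and column names below are illustrative toy data, not the actual BetaBookShop schema:

```python
import pandas as pd

# Hypothetical miniature of the Edition and Sales tables.
editions = pd.DataFrame({
    "ISBN": ["111", "222"],
    "PublishingHouse": ["Acme Press", "Acme Press"],
    "Pages": [100, 300],
})
sales = pd.DataFrame({
    "ISBN": ["111", "111", "111", "222"],  # ISBN 111 sold three times
    "SaleID": [1, 2, 3, 4],
})

# Joining Sales onto Edition duplicates each edition row once per sale...
joined = editions.merge(sales, on="ISBN", how="inner")

# ...so a naive average over the joined rows is skewed toward ISBN 111.
print(joined["Pages"].mean())    # 150.0, weighted by number of sales
print(editions["Pages"].mean())  # 200.0, the intended answer
```

The average computed on the joined rows is effectively weighted by sales volume, which is exactly the distortion we see in the viz.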
This problem is easily solved with a Level of Detail (LOD) calculation. We can create a FIXED calculation that returns only the maximum number of pages for each ISBN and place that in our view.
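In Tableau the fix would be a FIXED LOD expression along the lines of `{ FIXED [ISBN] : MAX([Pages]) }` (field names assumed). A pandas sketch of the same idea, again with toy data rather than the real schema:

```python
import pandas as pd

# Toy joined table where each sale has duplicated the edition's page count.
joined = pd.DataFrame({
    "ISBN": ["111", "111", "111", "222"],
    "PublishingHouse": ["Acme Press"] * 4,
    "Pages": [100, 100, 100, 300],
})

# Equivalent of { FIXED [ISBN] : MAX([Pages]) }: one page count per ISBN...
pages_per_isbn = (
    joined.groupby(["PublishingHouse", "ISBN"], as_index=False)["Pages"].max()
)

# ...then average those deduplicated counts per publishing house.
avg_pages = pages_per_isbn.groupby("PublishingHouse")["Pages"].mean()
print(avg_pages)  # Acme Press: 200.0
```

Collapsing to one row per ISBN before averaging removes the sales-volume weighting introduced by the join.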
The average number of pages per house appears to be accurate now. Let’s switch to the new data model.
We start by bringing in the same data sources as before but in 2020.2 we notice the two layers discussed previously:
A Logical layer created automatically with Noodles…
…and a Physical layer for the “Book” data source, which is opened by double-clicking on that object’s logical layer. Here is where any joins would appear, as they did in the old model.
With the new process, Tableau recognizes common fields between the tables and establishes those as relationships. For instance, the Book table and the Edition table both contain the field “Book ID,” so the relationship is created on this field, as you can see in the picture below. In the old data model, the fields would have no relation to each other unless they were explicitly joined on that dimension.
Now we can build out the visualization. The first change to point out is in the Data pane, where our dimensions and measures live. In the new data model, dimensions and measures are grouped under their own tables. Each table also gets its own count measure, rather than a single field showing the total number of records in the data source, as in the old model.
We drag the publishing house and the average pages measure to the view, exactly as we did in the old version, but this time the numbers are immediately correct – no need for an LOD!
Old Version with LOD:
That was much easier than before, but notice that the numbers for the “Etaoin Shrdlu Press” publishing house do not match between the two versions. Some investigation reveals that in the new data model there is a null value for ISBN in the Sales table, because one of its BookIDs has no matching ISBN in the Book table.
Remember earlier when I noted that Tableau defaulted to inner joins in the old version? When we look at the old data model, we find that the particular BookID (CC830) is missing entirely: there are no matching ISBNs between the two tables, so its records are omitted. Assuming this is an error in the data source, we would have needed extensive research even to notice, much less understand, why our pages per publishing house were wrong for just that one publisher. With the new data model, Tableau recognized what we were trying to find based on the dimensions and measures in the view and effectively applied a left join to return the correct answer. When we switch to a left join in the old data model, the answers match.
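The inner-versus-left-join difference can be sketched in pandas. The rows below are hypothetical, mimicking a CC830-style record with no match in the other table:

```python
import pandas as pd

# Hypothetical rows: one BookID in Book has no matching row in Edition.
books = pd.DataFrame({
    "BookID": ["AA100", "CC830"],
    "PublishingHouse": ["Etaoin Shrdlu Press"] * 2,
})
editions = pd.DataFrame({
    "BookID": ["AA100"],
    "ISBN": ["111"],
    "Pages": [100],
})

inner = books.merge(editions, on="BookID", how="inner")
left = books.merge(editions, on="BookID", how="left")

print(len(inner))  # 1 -> CC830 silently dropped
print(len(left))   # 2 -> CC830 kept, with null ISBN and Pages
```

The inner join silently drops the unmatched book, while the left join keeps it and surfaces the nulls, making the data problem visible instead of hiding it.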
Old Version with LOD and left join:
As you can see, finding the correct solution with the new data model required neither an LOD calculation nor a deep understanding of how the tables are structured in the data source, both of which the old data model demanded. That saves time and energy for more important things, like building beautiful vizzes!
The new data model is a complete game changer for Tableau users. In this use case we saw how relationships open up a lot of flexibility for our data sources while also limiting the errors that can creep in when creating a traditional join or level of detail calculation.
To read more about this topic and download the beta, please visit the Tableau official website.
Want to see how you can take your Tableau projects to the next level? Learn more about how our experts can help grow your skills today.