Tableau 101: Data Preparation with Tableau Prep

By: CoEnterprise | August 2, 2018

Tableau Prep is a new visual data preparation tool that integrates with Tableau Desktop. Prep makes it easy to clean your data and to complete many data transformations that would be tedious or even impossible in Desktop or in the data source. As a self-proclaimed sports nerd, when I have free time, I like to dive into sports data sets. When the NFL Draft season arrived, I couldn’t help but to look at some NFL Draft data to analyze the incoming draft class against active NFL players. With so much talk about the NFL Draft this year, I decided to look at the money being paid to each player being drafted.

In the NFL, draft position determines the value of first year contracts. This means that a team with the number one overall pick will pay the same salary to any player they select, whether he is a quarterback (typically thought of as the most important position on the field) or a punter (a position that gets on the field about five plays per game). By comparing the value of rookie contracts to those of existing players I hope to see the positions where each team might create value with their draft selection. Given each team’s set of picks, where could they potentially replace a player on a market-priced contract with a similarly productive player on a rookie contract?

The draft data is drawn from a few different sources, including Pro Football Reference and SporTrac. Some of the dimensions in the data sources contain slight differences (e.g. Odell Beckham Jr. vs. Odell Beckham). Though small, these discrepancies would generate mismatches if we simply joined the data in Desktop. Correcting them all manually would take hours. Enter Tableau Prep.

After opening Tableau Prep, I connect to the Excel workbook containing my data and drag the first table onto the canvas to get started. This creates an Input step. Tableau Prep’s Clean step allows me to change field names, filter out or replace values, split columns, and group data by selecting a column in the profile pane, then selecting the action I want to perform from the list at the top of the pane.

TableauNFL1

As you can see at the top of the workspace above, the cleaning function is represented by a new step connected to my data source. The resulting diagram shows the workflow I’m creating to transform and join my data. In my flow, I used the clean step to group up mismatched positions in SporTrac’s 2018 player salary data. For example, some players were listed as “LB” (linebacker) when they should have been listed as “ILB” (Inside Linebacker), which is a more specific position. The clean step also works like the transformation functions that can be performed on a data source on Tableau Desktop’s Data Source page. I was able to split the “Player” field from Pro Football Reference, which was comprised of the player’s name and a player ID, into two fields.

The Aggregate step in Tableau Prep allows the user to aggregate and group fields by simply dragging the desired columns to the “Grouped Fields” or “Aggregated Fields” bins within the Tableau Prep profile pane. The Aggregate step can be useful for analyzing data at a higher level. In this case, I gathered a data source from Pro Football Reference that contained game-by-game results for the 2017 NFL season, but I wanted to use the data at the total team level. By using the Aggregate step, I was able to group and aggregate the data to get stats like total wins and losses, yards gained, and yards allowed for each team.

TableauNFL2

TableauNFL3

Once my data sources are cleaned and grouped, Tableau Prep also makes it easy to union or join them together. I have two data sets from SporTrac, one containing rookie contracts and another with veteran contract information. To union them, I simply drag the newly added table onto the Clean step from the other data source and select “Union.” Tableau Prep’s profile pane summarizes the results, so I can quickly see whether the union worked the way I expected. If not, several functions allow me to troubleshoot and correct the connection. For example, a checkbox lets me show only the mismatched fields, so I can identify any fields that should be combined in the union.

TableauNFL4

Tableau Prep makes the join process easy as well. As in the data source window in Tableau Desktop, Prep illustrates the type of join with a Venn diagram and gives a preview of the resulting table. Tableau Prep goes several steps further though. It recommends likely join clauses. It also provides a visual analysis of the join results, showing how many rows have been matched from each table and any that have been excluded. I can then highlight records that do not have a match for each join clause selected, which makes the process of creating, understanding, and systematically troubleshooting joins simple.

I used a couple different joins in my flow. I joined the salary data with the approximate value data from Pro Football Reference. I then joined that data with the 2017 NFL team schedule data that I transformed to get teams’ win/loss record. Again, Tableau Prep shows the whole process in an interactive diagram that makes creating and checking the output from each step straightforward.

TableauNFL5

Once a data source is ready, Tableau Prep provides a lot of flexibility in how it can be used. Tableau Prep lets users output to multiple file types and produce outputs at any point in the workflow. A single Tableau Prep Flow can create several data sources. Create output files in Hyper, TDE, or CSV formats, or all three. A Hyper file allows the user to take advantage of the hyper data engine in Tableau 10.5 and above, and a TDE will work in any version of Tableau Desktop prior to 10.5. Exporting to a CSV file gives the user flexibility to work with the data on other platforms as well.

TableauNFL6

The flexibility and simplicity of Tableau Prep made my exploration of NFL Draft data much easier than it would have been without it. I went on to build a dashboard that shows how the salary of a rookie drafted in a particular round of the draft compares to those who already play that position. Quarterbacks are among the highest paid players in the league (averaging $6.0M annual salary) and running backs are among the lowest paid (averaging $1.5M annually). Since the rookie pay scale is determined by draft position, drafting a quarterback early tends to be a low risk pick. A quarterback selected as the number one overall pick would come into the league as only the 32nd highest paid quarterback. A running back drafted at the same spot would become the 4th highest paid running back in the league.

Handing a quarterback that contract seems like a bargain, while allocating the same money to a running back looks like a much riskier hit to a team’s salary cap. If the running back does not live up to expectations, the team is stuck paying him a lot of money (compared to others at the same position) for little production. This discrepancy in relative value, suggests that, for the New York Giants, a safe play would have been to draft Sam Darnold or another quarterback who was taken in the top 10 of the 2018 NFL draft. By comparison, selecting Running Back Saquon Barkley with the number two overall draft pick was a big bet.

To learn more about Tableau Prep, be sure to check out Episode 8 of the InSync Podcast.

Want to see how you can take your Tableau projects to the next level? Learn more about how our experts can help grow your skills today.

Subscribe to our resources!

Sign up to receive our latest eBooks, webinars, blog posts, newsletter, event invitations, and much more.