Imagine you are tasked with analyzing the efficiency of a factory. You are provided with two data sets. The first lists the individual shifts and expected output, the second lists the actual output. How would you conduct your analysis?
This was a challenge I recently faced while working on a new project for a client and I thought I'd share how I tackled it.
The approach I took can be used in any scenario where you have two data sets, with one listing expected results, and the second containing the actual results.
Union to the rescue
In my first attempt to tackle this challenge I tried to join the two data sets across a common id. In my case it was shift id. This approach was however flawed because it wasn't always present.
I think if your data set does contain a common id across all rows in both data sets then a left join would work. In that case you would left join actual results against expected results. This is assuming your expected results data set covers all possible time periods where actual results could take place.
In my project I had to do a union. I stacked the expected results and actual results on top of each other. This way I could count the actual output and divide by the expected output.
I used Tableau which allowed me to easily create new fields which acted as "master" fields. Since both of my data sets were excel files this was the easiest way. You could do some data prep and rename a bunch of fields before loading but I decided to load everything as is and then add new fields as necessary.
In the screenshot above you can see an example of a "master" field I created . I used the isnull function in Tableau to test if a field was null and if it was then the value would be taken from the corresponding field. This is a way to "stack" the data from two columns which have different header names.
A useful field to include in your union
Whenever you union data sets it's helpful to include a descriptive custom field which allows you to easily isolate a piece of the union.
If for example you were doing a union of "data set A" and "data set B" and the columns in both sets were identical, how would you filter the data from set A if you needed to?
Your SQL might look something like this:
SELECT
shift id,
date,
team
FROM
data_set_A
UNION ALL
SELECT
shift id,
date,
team
FROM
data_set_B
Instead of the above I would suggest doing the following:
SELECT
shift id,
date,
team,
'data_set_A' as data_category
FROM
data_set_A
UNION ALL
SELECT
shift id,
date,
team,
'data_set_B' as data_category
FROM
data_set_B
This way you can easily filter by the relevant data set whenever you need.
Thankfully Tableau automatically adds such a field whenever you create a union. This allows you to know which table of data is associated with each row in the unionized data set.
I hope you found this post helpful and if you have a similar challenge and need some help, don't hesitate to post your question in Slack so the community can help you out.