– Second in a series for the 2019 Data VizArt Student Challenge –
Last time we mentioned where to find the details of the contest (http://www.dvastudentchallenge.ca/), how to install a free copy of Tableau on your computer, and where to find some good tutorials on the software. Today, we want to talk about data cleansing and then take an overall look at Tableau's visual capabilities.
Data cleansing is required when data does not fit the fields it has been given. When I was younger, I always assumed data quality would improve over time. The amount of data has grown enormously, but quality is still a problem. A few years ago, before rel8ed.to, the data scientists on one of my assignments told me that more than 40% of their time was spent cleansing data, and that was for a structured dataset. That is a lot of time. Why? There are many reasons, but the fact remains that the analyst has to make sure the data actually represents what it claims to.
For example, in this spreadsheet we have put the FSA in column A (see the data dictionary tab for details on this file). FSA stands for forward sortation area: the first three characters of the postal code, used to sort mail for delivery. Each FSA represents a geographic area. In Canada, an FSA starts with a letter, followed by a digit, followed by another letter. An incorrect FSA would be one made up of all letters or all numbers, or even one that is in the proper format but doesn't exist. You will notice that column B contains whole numbers, while columns C and D are percentages. Lastly, column E is the average age of each business (in years; see the data dictionary). If the average age were over 100 years, or the percentage of female directors were over 100%, you would assume (correctly) that the value in that cell is suspect.
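Checks like these are easy to automate before you bring the data into Tableau. The sketch below is a minimal example in Python, assuming each row has already been read into a dictionary. The column names (fsa, pct_female_directors, avg_business_age) are hypothetical stand-ins for whatever the data dictionary actually uses, and percentages are assumed to be stored on a 0 to 100 scale rather than as fractions. The pattern check only catches malformed FSAs; catching a well-formed FSA that doesn't exist needs a reference list of real FSAs, passed in here as the optional known_fsas set.

```python
import re
from typing import List, Optional, Set

# Letter, digit, letter (e.g. "M5V"), as described in the post.
# This only checks the format; a well-formed FSA may still not exist.
FSA_PATTERN = re.compile(r"[A-Z][0-9][A-Z]")


def normalize_fsa(value: str) -> str:
    """Trim whitespace and upper-case so " m5v" and "M5V" compare equal."""
    return value.strip().upper()


def is_valid_fsa_format(value: str) -> bool:
    """Return True if the value has the letter-digit-letter shape of an FSA."""
    return FSA_PATTERN.fullmatch(normalize_fsa(value)) is not None


def find_issues(row: dict, known_fsas: Optional[Set[str]] = None) -> List[str]:
    """Return the reasons a row looks suspect; an empty list means none found.

    The keys used here are hypothetical; map them to the real column names
    from the data dictionary tab. Percentages are assumed to be on a 0-100
    scale, and known_fsas (if given) should contain upper-case FSAs.
    """
    issues = []

    fsa = normalize_fsa(row["fsa"])
    if not is_valid_fsa_format(fsa):
        issues.append("bad FSA format")
    elif known_fsas is not None and fsa not in known_fsas:
        # Right shape, but not a real FSA according to the reference list.
        issues.append("FSA not in reference list")

    # A percentage can't be negative or exceed 100%.
    if not 0 <= row["pct_female_directors"] <= 100:
        issues.append("% female directors outside 0-100")

    # Negative ages are impossible; over 100 years is treated as suspect.
    if not 0 <= row["avg_business_age"] <= 100:
        issues.append("average age outside 0-100 years")

    return issues


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the contest data.
    rows = [
        {"fsa": "M5V", "pct_female_directors": 40.0, "avg_business_age": 12.5},
        {"fsa": "123", "pct_female_directors": 130.0, "avg_business_age": 8.0},
        {"fsa": "K1A", "pct_female_directors": 25.0, "avg_business_age": 150.0},
    ]
    for row in rows:
        print(row["fsa"], find_issues(row))
```

Running the file prints an empty list for M5V, two issues for the all-digit "123" row (bad format and a percentage over 100), and the out-of-range average age for K1A. Returning a list of reasons, rather than a single true/false, makes it easier to decide case by case whether a flagged row should be corrected or dropped.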
What do you do if you find issues in the data? You can either correct the value or delete the observation; which one is right depends on what you are trying to find. If you find any issues in the data files for this contest, let us know on this blog. We have all been extremely careful, however, and hope you won't find many.
Tableau’s Visual Capabilities
Tableau can do a lot of things. Today, we also want to focus on what some finished visuals look like. You can make a boring chart, or you can make something that draws the reader in; the point of this contest is the latter.
People can do interesting things with Tableau. For example, go to public.tableau.com and click on the magnifying glass in the upper right corner. Search for "Edmonton Oilers" (without the quotes) and click on Corey Sznajder's shots and passes file. Note that there are 4 different 'tabs' to view. Or search for Bitcoin and find Mehmet Bayburt's page. There is just one 'tab', but there are 4 different charts on the page, each telling a different story.
rel8ed.to has created many Tableau visualizations; a few of them can be found here (https://public.tableau.com/profile/rel8ed.to#!/). We use maps, bar charts, and many other visuals, some of which we will get into in the coming weeks.
But the best thing for new contestants to look at is last year's winning entry, shown below. The second-place entry can be found here: https://public.tableau.com/profile/pranay.bhatt#!/vizhome/DeepInsights_Pranay_Raman/FinalPresentation
Stay Tuned for more Pro-Tips
Watch for our next DVA-helper blog early next week, where we will discuss in more detail how to join the datasets together. In the meantime, make sure you are registered and have a solid team put together. Start playing with some of the data (try just the business data) and discover the capabilities of Tableau. Toodleoo!
Tags: 2018, 2019, CIBC, contest, data cleansing, Data VizArt Student Challenge, Deloitte, DVA, manifold data mining, Tableau
Categorised in: Big+Open Data, Business Data Analysis, Data Scientist, Data Visualization, Tableau
This post was written by Drew Fones