As many of us have started working on cleaning our data for our project work, I thought of sharing a few tips and tricks that I came across for cleaning your data sets while using Tableau.
- Get involved with your data
After you have identified your final argument and the claims that support it, the next step is to understand your data. Looking at the column headers may not be sufficient; it’s a good idea to think through what the data represents. Check the data type of each column and whether its values fall within the expected range (e.g. the price of fruit per kg in a grocery store ranging from $0 to $40). Be wary of empty or null values (e.g. in our first assignment, the null values for x coordinate, y coordinate, latitude and longitude did result in a key insight). You can also try to spot initial patterns in the data set.
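The same checks can also be scripted outside Tableau before loading the file. Here is a minimal Python sketch, assuming the rows are already available as dictionaries. The function name `profile_column`, the column name `price_per_kg`, the sample rows and the 0 to 40 bounds are all made up for illustration (borrowed from the fruit example), not taken from a real data set:

```python
def profile_column(rows, column, low, high):
    """Count missing, non-numeric, and out-of-range values in one column."""
    report = {"missing": 0, "not_numeric": 0, "out_of_range": 0, "ok": 0}
    for row in rows:
        raw = row.get(column)
        # Treat absent keys, None, and blank strings as missing values.
        if raw is None or str(raw).strip() == "":
            report["missing"] += 1
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            report["not_numeric"] += 1
            continue
        if low <= value <= high:
            report["ok"] += 1
        else:
            report["out_of_range"] += 1
    return report


# Hypothetical sample data: one good row and three kinds of problem rows.
rows = [
    {"fruit": "apple", "price_per_kg": "3.50"},
    {"fruit": "mango", "price_per_kg": ""},
    {"fruit": "kiwi", "price_per_kg": "n/a"},
    {"fruit": "durian", "price_per_kg": "400"},
]
print(profile_column(rows, "price_per_kg", 0, 40))
# {'missing': 1, 'not_numeric': 1, 'out_of_range': 1, 'ok': 1}
```

A non-zero count for anything other than "ok" tells you which columns to look at more closely once the data is in Tableau.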
- Never trust your data at first sight
It may happen that the first 50-100 rows of your data are well formatted while the remaining rows contain errors that make the data difficult to visualize. It’s also always better to double-check your understanding of the data so that you don’t make wrong assumptions. This will save a lot of initial data prep time.
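One way to look past the first screen of rows is to scan the whole file for rows that do not match the header. The sketch below is only an illustration in Python using the standard `csv` module. The function name `find_malformed_rows` and the sample text are hypothetical, and a mismatched field count is just one kind of error you might look for:

```python
import csv
import io


def find_malformed_rows(csv_text):
    """Return 1-based data row numbers whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    bad = []
    for row_number, row in enumerate(reader, start=1):
        if len(row) != len(header):
            bad.append(row_number)
    return bad


# Hypothetical sample: rows 1 and 2 look fine, rows 3 and 4 are broken.
sample = (
    "id,city,sales\n"
    "1,Austin,100\n"
    "2,Boston,250\n"
    "3,Chicago\n"
    "4,Denver,75,extra\n"
)
print(find_malformed_rows(sample))  # [3, 4]
```

If you only eyeballed the first two rows here, the file would look clean; the full scan is what surfaces the problems.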
- Avoid cleaning your data manually
It’s always better to use the built-in Tableau Data Interpreter for cleaning messy data sets. It’s an easy way to strip out titles, footnotes, empty cells and multi-row column headers and create a usable table. The Data Interpreter is also very useful for extracting sub-tables from Excel files, i.e. when multiple tables are placed on the same sheet and separated by empty space.
- Standardize your data
Use the same naming convention for column headers across all your data sets. For example, if the same column appears in multiple data sets, a standardized name is easier to remember. This may apply to the data values as well. E.g. CA and California are the same but may not be recognized as the same value in Tableau. It’s a good practice to group these values. Try to use the same unit for measures that you want to aggregate by applying a calculated field (e.g. total number of items sold per category and profit per category in a retail store should both be an integer type).
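In Tableau you would do this by renaming fields and grouping values, but the idea is easy to show in a few lines of Python. This is only a sketch: the alias table `STATE_ALIASES` and the helper names `standardize_header` and `standardize_state` are invented for this example, and a real project would need its own list of aliases:

```python
# Hypothetical alias table: every known spelling maps to one canonical value.
STATE_ALIASES = {
    "ca": "California",
    "calif.": "California",
    "california": "California",
}


def standardize_header(name):
    """Lower-case a column header and join its words with underscores."""
    return "_".join(name.strip().lower().split())


def standardize_state(value):
    """Map known aliases to the canonical name; leave unknown values trimmed."""
    cleaned = value.strip()
    return STATE_ALIASES.get(cleaned.lower(), cleaned)


print(standardize_header(" Order  Date "))  # order_date
print([standardize_state(v) for v in ["CA", "California", " california ", "Texas"]])
# ['California', 'California', 'California', 'Texas']
```

Values that are not in the alias table pass through unchanged, so nothing is silently rewritten.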

- Iterate your data cleaning process
Try to focus on the main blockers in the first iteration and start your first data visualization. Based on the insights you get, you can apply better data quality techniques to refine your dashboard and sell your story.
Reference: https://public.tableau.com/en-us/s/blog/2016/05/5-tips-cure-your-data-cleaning-headaches
Very well organized and described. Helpful for our future projects.
Thank you
Very good tips and tricks to use. Though I feel like you can just manually clean the data if it is not that large.
thank you!
Very useful information!