Cleaning Walmart coffee listings from 500 stores dataset

Today I decided to test a CSV file that I am trying to import but it contains some errors. I downloaded the Walmart coffee listings from 500 stores dataset from Kaggle. I want to test my skills of cleaning the data and exploring some interesting questions about coffee sales and prices.
After I loaded the data into Power BI, I noticed that the weight column formatted in grams had some errors.

Some of the cells had the letter A appended to the numeric value, which made them invalid. Instead of removing the errors, I wanted to locate them and replace them with appropriate values. This column is important because it shows how much coffee each listing offers and how it affects the price. For example, I wanted to see if there is a correlation between weight and price, or if some brands offer more value for money than others.

I found out that the error was the letter A being added to some of the cells randomly. I couldn’t figure out how to convert that letter A to a number, but I was able to use the replace errors function to change the error with the number 0, and this way keep the row and fill the cell with the 0 value. This was not a perfect solution, but it allowed me to keep the rest of the information in the row and avoid losing data. I also checked if there were any other errors in the other columns, such as missing values or outliers, and fixed them accordingly.

It was a fun exercise to learn new functions and how to handle data without losing it. I’m looking forward to working on another project soon, maybe using a different dataset or a different tool. I think data cleaning is an essential skill for any data analyst or scientist, and it can also be enjoyable if you have a clear goal and a curious mind.

Leave a Comment

Your email address will not be published. Required fields are marked *