
The 10 marine science (rd) Things: Thing 10

Thing 10: Dirty data

Dig into dirty data. What is it? Why should we care? Try your hand at using an open source data cleansing tool.

Activity 1: Dirty data

Why is “clean” data important? Public policy, changes to medical protocols and economic decisions all depend on accurate and complete data. Thing 10 looks at the why and what of “dirty data.”

1. Browse through the Bad Data Guide, a list of commonly encountered data quality issues (with possible solutions). It is aimed at journalists, but it also shows who is responsible for cleaning up dirty data.

Click into a few of the causes of, and solutions to, dirty data. Many of us contribute information to reports or do our home accounts in spreadsheets, so maybe it’s time to think about how clean our own data is!

2. For a quick guide to working with spreadsheets, check out one of the School of Data’s Data Fundamentals courses (A Gentle Introduction to Data Cleaning).
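To make the idea of cleaning concrete, here is a minimal sketch in Python of a few common spreadsheet fixes: trimming stray spaces, making capitalisation consistent, flagging missing values and dropping duplicate rows. It is not taken from either guide above. The sample data, the column names (station, depth_m, species) and the list of "missing" markers are all invented for illustration, and real data will usually need more careful rules.

```python
import csv
import io

# Hypothetical spreadsheet export; station names and values are invented.
RAW = """station,depth_m,species
 Reef A ,10,Acropora
reef a,10,acropora
Reef B,,Porites
Reef C,n/a,Porites
"""

# Assumed markers for "no value"; adjust to whatever your own data uses.
MISSING = {"", "n/a", "na", "null", "-"}


def clean_rows(text):
    """Return (cleaned_rows, problems) for a CSV string."""
    cleaned, problems, seen = [], [], set()
    reader = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Trim stray whitespace around every column name and value.
        record = {key.strip(): (value or "").strip() for key, value in row.items()}

        # Make capitalisation consistent so "reef a" matches "Reef A".
        record["station"] = record["station"].title()
        record["species"] = record["species"].capitalize()

        # Flag missing depths instead of silently guessing a value.
        if record["depth_m"].lower() in MISSING:
            record["depth_m"] = None
            problems.append((line_no, "missing depth_m"))
        else:
            record["depth_m"] = float(record["depth_m"])

        # Skip rows that are exact duplicates once normalised.
        fingerprint = tuple(record.values())
        if fingerprint in seen:
            problems.append((line_no, "duplicate row"))
            continue
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned, problems


cleaned, problems = clean_rows(RAW)
for record in cleaned:
    print(record)
for line_no, issue in problems:
    print(f"line {line_no}: {issue}")
```

The sketch reports problems alongside the cleaned rows rather than quietly fixing everything, so you can decide for yourself what to do with each missing value or duplicate.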

Consider: What are the wide-ranging implications of dirty data for your research?