Tidy data

Posted by Jordan Kersey

What is this “tidy” data you speak of?

Tidy data is data formatted so it’s ready for analysis! Organizing and refining a data set this way makes many parts of the analysis easier.

Tidy data means that:
1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table.

also known as…

Codd’s third normal form!
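To make the three rules concrete, here is a minimal sketch in plain Python. The crop-yield numbers are made up, and the `tidy` helper is a hypothetical name used only for illustration. The “messy” table stores the year variable in the column headers. Reshaping it gives one column per variable (field, year, yield) and one row per observation:

```python
# Hypothetical "messy" table: one row per field, one column per year.
# The year is a variable, but here it is hiding in the column names.
messy = [
    {"field": "A", "2019": 180, "2020": 175},
    {"field": "B", "2019": 165, "2020": 170},
]

def tidy(rows, id_col, var_name, value_name):
    """Reshape wide rows into long (tidy) rows: one observation per row."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is copied onto every new row instead
            out.append({id_col: row[id_col], var_name: key, value_name: value})
    return out

tidy_rows = tidy(messy, "field", "year", "yield")
for r in tidy_rows:
    print(r)  # e.g. {'field': 'A', 'year': '2019', 'yield': 180}
```

If you use pandas, `DataFrame.melt(id_vars="field", var_name="year", value_name="yield")` performs the same kind of wide-to-long reshape.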

Tidy data makes it easy to extract exactly the pieces you want without confusing the program (or the human), and without struggling to grab only the pieces you need while excluding the ones you don’t.
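As a small illustration, assuming made-up yield observations that are already tidy (each row is one field in one year), pulling out just the rows and the variable you want becomes a one-line filter, and summarizing by group is a short loop:

```python
# Hypothetical tidy observations: each row is one field in one year.
observations = [
    {"field": "A", "year": 2019, "yield": 180},
    {"field": "A", "year": 2020, "yield": 175},
    {"field": "B", "year": 2019, "yield": 165},
    {"field": "B", "year": 2020, "yield": 170},
]

# Keep only the rows you want (year 2020) and only the variable you want (yield).
yields_2020 = [row["yield"] for row in observations if row["year"] == 2020]
print(yields_2020)  # [175, 170]

# Because each variable is its own column, totaling yield per field is simple.
totals = {}
for row in observations:
    totals[row["field"]] = totals.get(row["field"], 0) + row["yield"]
print(totals)  # {'A': 355, 'B': 335}
```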

So… why should I keep my data tidy? I keep my garage messy and it seems fine…

Wrong! How many times have you searched the garage for something, hopelessly concluded it’s not there, gone out to buy a replacement, and then found the original within a few days?

Keeping a tidy garage (or tidy data) relieves the stress of things getting “lost” through lack of organization. This is especially useful with large data sets, which are easily jumbled.

Other benefits include appearing to be an awesome data steward who always knows what’s going on with their values and how to piece things together, and being able to help others tidy their data, showing off the skills learned in AGRON590!

One drawback may be that tidy data tools only work with tidy data, so one must make the effort to put all data into the form in which it can be most easily manipulated. Change may be hard, but it’s necessary for improvement.