Posted by Ohene Akuoko
What is tidy data and how does it relate to the data modeling topics that we have already discussed in this class?
Tidy data is a standard, or framework, that reduces the complexity and difficulty of handling large and/or unorganized (messy) datasets. It provides specific structures and tools that allow simpler data manipulation, analysis, and modeling. Tidy data connects closely to what we’ve been discussing since essentially the beginning of the course. One of the underlying reasons for tidy data is to enhance data manipulation and reproducibility, both of which have been covered widely in this course. Tidy data also emphasizes properly ordering and labeling variables and columns, along with handling missing values when needed.
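To make the idea of structure concrete, here is a minimal sketch in plain Python of reshaping a small "messy" table into a tidy one, where each row holds one observation and each column holds one variable. The data, the column names, and the `tidy` helper are all hypothetical and made up for illustration; they do not come from the article.

```python
# Hypothetical "messy" table: one row per person, one column per treatment.
# None marks a missing value.
messy = [
    {"name": "Ann", "treatment_a": 5, "treatment_b": 7},
    {"name": "Bob", "treatment_a": None, "treatment_b": 3},
]


def tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows so each output row holds exactly one observation."""
    out = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue  # the identifier is repeated on every output row
            out.append({id_key: row[id_key], var_name: column, value_name: value})
    return out


tidy_rows = tidy(messy, "name", "treatment", "result")
for record in tidy_rows:
    print(record)
```

The missing value stays in the tidy table as an explicit row rather than disappearing, so it can be handled deliberately later.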
What are the benefits to keeping data tidy?
Overall, tidy datasets improve the effectiveness and ease of use of data. The emphasis on structure and organization allows a user to make sense of even raw datasets. One can retrieve and update datasets more easily and effectively, while increasing the reproducibility and speed of data analysis. The article also states that the “tidy data” framework is well suited to R, one of the more popular programming languages, and to other vectorized languages, which is a benefit to the data science community.
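As a rough illustration of the retrieval point, the sketch below assumes a small tidy table with one observation per row and computes an average per group with a single generic helper. The records and the `mean_by` function are hypothetical examples, not something taken from the article.

```python
from collections import defaultdict

# Hypothetical tidy records: one observation per row, one variable per key.
tidy_rows = [
    {"name": "Ann", "treatment": "a", "result": 5},
    {"name": "Ann", "treatment": "b", "result": 7},
    {"name": "Bob", "treatment": "a", "result": None},  # missing value
    {"name": "Bob", "treatment": "b", "result": 3},
]


def mean_by(rows, group_key, value_key):
    """Average value_key within each group, skipping missing values."""
    groups = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            groups[row[group_key]].append(row[value_key])
    return {group: sum(values) / len(values) for group, values in groups.items()}


means = mean_by(tidy_rows, "treatment", "result")
print(means)
```

Because every variable lives in its own key, the same helper works for any grouping, for example `mean_by(tidy_rows, "name", "result")`, without reshaping the data first.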
Are there any drawbacks?
It doesn’t seem like there are many, if any. Cleaning and organizing almost anything seems like a smart thing to do, especially in light of the multiple data management benefits mentioned. If I had to state one flaw, though, it is that tidy data may not benefit those using non-vectorized programming languages.