Welcome to the tidyverse...

Posted by Sarah Leichty

Ideas from Hadley Wickham’s talk at useR! 2016, “Tidy Tools for Data Science”

Important information:

  1. 6 key tools: getting data out of its crazy original source and into R; getting it into tidy form (consistent and uniform organization, structuring data the same way every time); creating summaries; visualization (which can surprise you); modeling (models can’t question their own assumptions); and communicating findings in a way that people can follow
  2. Not just individual tools, but how they fit together. See the connections instead of each step as separate.
  3. “Pit of Success”: the idea that tools should make it easier to fall into success than to have to strive towards it
  4. tibble package: data frames that are “lazy and surly,” with a better print method (only prints 10 rows by default). I like the humor, which makes the presentation more interesting and engaging.
  5. R - environment for interactive data analysis more than strictly a programming language
  6. Uniform data structure: tidy data (put the data in a data frame and put each variable in its own column)
  7. Tackle complex problems with simple uniform processes
  8. Impure functions: ones that change the world, or whose outputs don’t depend only on their inputs. They are still important for data analysis. Example: library()
  9. The goal of the pipe (%>%) is to make code more readable
  10. Referentially transparent: you can replace any expression with a variable that holds its value
  11. Write a function if you’re copying and pasting more than a few times
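
To make a few of these points concrete for myself (tibbles, tidy data, the pipe, and writing a function instead of copying and pasting), here is a small sketch. It is not from the talk: the `plants` data, its column names, and the `rescale01` helper are all made up for illustration, and it assumes the tibble and dplyr packages are installed.

```r
library(tibble)
library(dplyr)

# Hypothetical tidy data: one row per observation, one column per variable.
plants <- tibble(
  plot   = c("A", "A", "B", "B"),
  height = c(10, 14, 20, 24)
)

# Without the pipe, nested calls have to be read from the inside out.
nested <- summarise(group_by(plants, plot), mean_height = mean(height))

# With the pipe, the same steps read top to bottom.
piped <- plants %>%
  group_by(plot) %>%
  summarise(mean_height = mean(height))

# A small helper (hypothetical) instead of copying and pasting the
# same rescaling arithmetic for every column.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

plants <- plants %>%
  mutate(height01 = rescale01(height))
```

Both `nested` and `piped` hold the same per-plot means; the only difference is how easy the code is to read.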

The future of data analysis

Create packages that all work together. This is Hadley Wickham’s goal, and I agree that it would make organizing and analyzing data much easier for R users who rely on several different packages. Personally, I only want to learn R in the hope that it will help me efficiently tidy up data and summarize and compare bits and pieces seamlessly. For an average student like myself, any technique that makes R packages more similar and easier to learn together would be a huge benefit. Since R is more of an interactive data analysis platform, being able to explore what R can do without needing to think about whether one package will work in conjunction with another would be a relief. I think the future of data analysis will be even more user-friendly, with more interactive situations where learning can happen fluidly. As more and more non-programmers get into the coding world, making the transition easier will welcome new users in instead of discouraging them from venturing into a language such as R.