SAC it to me!

Posted by Kelsie Ferin

Blog Post #7:

For this week’s blog post, we read Hadley Wickham’s “The split-apply-combine strategy for data analysis” and answered the following questions:

  1. What is the split-apply-combine strategy and how is it used in the process of data analysis? The split-apply-combine strategy is the process of breaking “the problem” or data into manageable pieces, working on each piece individually, and then combining all of the pieces back together. As Wickham mentions, this is very helpful during data preparation, while analyzing your data, and during modelling, for example when fitting separate models to different components of your data.

  2. Where have we already seen/used the split-apply-combine strategy in this class? We used the split-apply-combine strategy during our lectures on SQLite databases. We were taught to look at the data set as a whole, then split up the data and analyze it separately using “joins”, and then combine the data back together to get the whole picture. Another instance was our last lecture on cleaning up weather data. We downsized our data to only 4 columns, then analyzed and fixed them, and I assume we would have combined all of the data back together if we had had more time in class.

  3. What are some advantages of using the split-apply-combine strategy? One advantage that was mentioned is that you are able to see how some problems may be connected to other aspects of your data; without the split-apply-combine strategy you could miss these connections, which could become a huge problem in the long run. Another advantage is that following this strategy helps you become better at solving common data analysis problems and helps you teach others to do the same.
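
To make the three steps from question 1 concrete, here is a minimal sketch in plain Python. It is not code from Wickham's paper or from our class; the weather readings, station names, and helper functions (split, apply_each, combine) are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weather readings: (station, temperature in degrees C).
readings = [
    ("A", 10.0),
    ("B", 20.0),
    ("A", 14.0),
    ("B", 22.0),
    ("C", 5.0),
]


def split(rows):
    """Split: group the temperatures by station."""
    groups = defaultdict(list)
    for station, temp in rows:
        groups[station].append(temp)
    return groups


def apply_each(groups, func):
    """Apply: run the same function on each piece independently."""
    return {station: func(temps) for station, temps in groups.items()}


def combine(results):
    """Combine: put the per-station results back into one sorted list."""
    return sorted(results.items())


summary = combine(apply_each(split(readings), mean))
print(summary)  # [('A', 12.0), ('B', 21.0), ('C', 5.0)]
```

Because each step is its own function, swapping mean for max or min changes the analysis without touching how the data is split or put back together.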