Lego my manuscript

Posted by Jared Flater

Reproducible Research and Dynamic Documents

It seems that over the last few years there has been an increased, and warranted, focus on reproducible work in academia. A few bad eggs have lately threatened the image of peer-reviewed scientific literature, namely through experiments that cannot be reproduced. One approach to this problem is to make all data and pertinent information available to your peers, by sharing your raw data and supplying the code that you used to manipulate it, so that they can independently re-create your study.

Dynamic documents may help us move toward reproducible research by allowing readers and reviewers to verify findings, use the same data for alternative analyses, squash uninformed reviewer comments with existing data, and stimulate the interchange of ideas. To facilitate this, there are several principles that must be followed in creating such a document, or compendium as we are calling it.

Principles of Reproducible Research -> Dynamic Document

  1. Data and algorithms used to create the publication must be available so that the results may be replicated.
  2. If, for practical reasons, some data cannot be included in the document, that data must still be freely available for access. This should be tidy data too.
  3. Code and the entire history of computations should be available.
  4. Transformations: the document should be something that can be re-created to take advantage of different media formats.

I feel that we have really dived into this issue in our class, with our focus on R Markdown, collaboration, and GitHub. By using R Markdown, we are able to create rich narratives that include data and various levels of code. This allows any one of us to directly reproduce the results achieved, and in fact we do this nearly every day in class, just not on the same scale as a publication.
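To make the first three principles a little more concrete, here is a minimal sketch of one way a compendium could record exactly which data and code produced a result. It is written in Python rather than the R Markdown we use in class, and everything in it is an assumption made for illustration: the function names (`sha256_of`, `build_manifest`, `verify`, `demo`) and the file names are hypothetical, not part of any existing tool. The idea is simply to store a checksum for each file so that a reader can confirm they are working from the same raw data.

```python
import hashlib
import platform
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_files, code_files):
    """Describe the data, code, and interpreter behind an analysis."""
    return {
        "python_version": platform.python_version(),
        "data": {Path(p).name: sha256_of(p) for p in data_files},
        "code": {Path(p).name: sha256_of(p) for p in code_files},
    }


def verify(manifest, data_files):
    """Check that each data file still matches its recorded checksum."""
    return all(
        manifest["data"].get(Path(p).name) == sha256_of(p) for p in data_files
    )


def demo():
    """Build a manifest for two hypothetical files, then alter the data."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "raw_counts.csv"  # hypothetical raw data file
        code = Path(tmp) / "analysis.py"     # hypothetical analysis script
        data.write_text("sample,count\nA,10\nB,12\n")
        code.write_text("print('analysis goes here')\n")

        manifest = build_manifest([data], [code])
        unchanged = verify(manifest, [data])

        # Simulate someone editing the raw data after the fact.
        data.write_text("sample,count\nA,10\nB,13\n")
        changed = verify(manifest, [data])
        return unchanged, changed
```

Calling `demo()` checks the data once before and once after it has been edited, so the second check fails. A manifest like this would sit alongside the shared data and code; it does not replace a dynamic document, it only helps a reader confirm that the inputs match.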

My current lab (GERMSLAB) is highly data-driven and computationally intensive, so I see a clear application here: building a strong foundation for protocols in our lab. By writing our SOPs as reproducible documents, future students should be able to re-create anything that Phil or I have done for the GERMSLAB.

An aside: I keep thinking of Legos when we are talking about reproducible research. I don’t know how many times as a kid I was drawn in by the picture of an amazing Lego structure on the box at the store, much like a shiny, hot new manuscript. But then you start to dig into the pile of Legos (data) and you can’t really see how they went from the pile to the castle. This is where the instructions (compendium) come in. Now you can re-create the structure as the original designer intended. Or you can make minor tweaks to achieve a different outcome.

Science needs more instructions.