Reproducibility in practice

Posted by Rafael Martinez-Feria

As data analysis grows ever more complex, extracting information and knowledge from the data is becoming increasingly difficult. As we discussed in the previous blogpost, literate programming and dynamic documents were developed to address this and to help programmers and scientists be better analysts. However, these tools are still highly technical, and mastering them involves a steep learning curve. Most scientists are more interested in understanding the science (e.g. mechanisms, processes, effects, trends) that the data is intended to model, and would rather invest their time reading papers in their own line of research than learning about coding.

"Let's solve these first. We can worry about data mining later."

We are not alone!

Modern dynamic document environments like R’s knitr and interactive notebooks like Jupyter are meant to bridge that gap and make dynamic documents easier to work with. These have grown increasingly popular in data-heavy fields (think genomics, astronomy). The fact that a journal like Nature is publishing about the use of dynamic documents such as interactive notebooks may be an indication that reproducibility is becoming a priority in science. But let’s be honest: scientific institutions (e.g. universities, journals, academies) are slow to change and adopt new ways of thinking.

“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.” - Max Planck

So it is up to the new generation of scientists and researchers to adopt and promote these new tools, and to foster transparency and collective thinking to solve the world’s complex problems.