What is Reproducible Research and why it is important.

“A community involvement in actively maintaining reproducibility of previously published results assures that a body of knowledge in a computational field stays alive and can be scientifically extended though continuing contributions.”

–Sergey Fomel

“It is a big chore for one researcher to reproduce the analysis and computational results of another. I discovered that this problem has a simple technological solution: illustrations (figures) in a technical document are made by programs and command scripts that along with required data should be linked to the document itself.”

–Jon Claerbout

I’ve got little research experience particularly with large-scale computationally-based project until I attended the second Madagascar Working Workshop held by “The Rice Inversion Project” this summer in Houston, where I also first knew about an old but also new concept in natural science even recently in social science area– Reproducible Research.

So what is it, how has it been raising researchers’ and professionals’ interest, and why it is of increasing importance?

As the word “Reproducibility” denotes, it is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently.(Wikipedia-Reproducibility). To put it more vividly, I believe “Reproducibility” is like the recipe you attached to your freshly-made-creative-savoury dish with which people can reproduce the dish at home and even add some new ingredients or spices as time flies. Then Reproducible Research is naturally one where researchers in particularly but not restricted to the field of computational science deliberately “serve” for the scientific community to provide the complete development environment and the complete set of instructions.

How did reproducibility raise professionals concerns? I want to mention a recent event in academia in biology, and also a situation I just run into this afternoon with my mentor Sona in our small computational research project about wave equation.

140806_SCI_Riken.jpg.CROP.promo-mediumlarge

January 2014, two papers, whose lead author, Haruko Obokata, a young and beautiful Japanese female researcher in her early thirties, about a simple new method for creating stem cells were published in the prestigious journal Nature to much fanfare. It suddenly made a stir in media in Japan and beyond. However, these two papers were retracted in July after another researcher based in UC Berkeley concluding that her studies are dubious with the fact that his team had tried so many times but were still unable to reproduce the results. We can see the paramount importance of reproducibility in disciplines like biology, physics, mathematics whose quality of detailed deriving process determines the persuasion of authors’ achievements. In terms of scientific computation or the field of software development, both of which underline the final results and figures rather than the process, however, the significance of the implementation of reproducible-research frameworks stems from, first of all, the necessity for long-term maintenance of reproducible results. “Simple storage of software codes is useful but insufficient, because typical scientific codes have multiple dependencies( libraries, compliers, operating systems, etc.). With time, different parts of the software environment change and cause the reproducibility of previously published results to break down.” (“Reproducible research as a community effort: Lessons form the Madagascar project”–Sergey Fomel). Further, situations where researchers and publishers’ colleagues question part of their paper and look for backup details from them are too common and familiar to many of the professionals in the field, and their review and tracing back the right set of codes used in that paper and not to mention the right set of parameter they select to output the nice results takes these researchers a fairly big amount of time and thus decrease their research productivity or academic competitiveness. This is so true even for an undergraduate student who is doing her own independent project like me. This afternoon, I was debugging the codes my mentor sent me to refer to which can make a movie in 2-D about a stable moving wave. It’s a matlab code and I tried many times but still could not find what’s going wrong. I decided to check with google one command by one command. After tedious repetition of “copy and paste” on google I finally found out that it is because one of the file “avifile” has been removed from Matlab and I shall use “VideoWriter” class instead. My personal experience exemplifies that inevitable changes in the software environment easily cause breakdown and without dedicated and continuous maintenance, the computational results easily loose their credibility and practicability.

At the end of this blog, I’d like to provide two very useful and detailed website which people interested in the specific tools for implementation of Reproducibility can refer to afterwards:

http://reproducibleresearch.net/events/

http://www.ahay.org/wiki/Houston_2014 You’re also welcome to join the Madagascar open community, a great source comprised of three levels: programs, workflow scripts, and papers.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s