Reproducible Bioconductor workflows using browser-based interactive notebooks and containers

Abstract Objective Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocI...

Full description

Saved in:
Bibliographic Details
Published inJournal of the American Medical Informatics Association : JAMIA Vol. 25; no. 1; pp. 4 - 12
Main Authors Almugbel, Reem, Hung, Ling-Hong, Hu, Jiaming, Almutairy, Abeer, Ortogero, Nicole, Tamta, Yashaswi, Yeung, Ka Yee
Format Journal Article
LanguageEnglish
Published England Oxford University Press 01.01.2018
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Abstract Objective Bioinformatics publications typically include complex software workflows that are difficult to describe in a manuscript. We describe and demonstrate the use of interactive software notebooks to document and distribute bioinformatics research. We provide a user-friendly tool, BiocImageBuilder, that allows users to easily distribute their bioinformatics protocols through interactive notebooks uploaded to either a GitHub repository or a private server. Materials and methods We present four different interactive Jupyter notebooks using R and Bioconductor workflows to infer differential gene expression, analyze cross-platform datasets, process RNA-seq data and KinomeScan data. These interactive notebooks are available on GitHub. The analytical results can be viewed in a browser. Most importantly, the software contents can be executed and modified. This is accomplished using Binder, which runs the notebook inside software containers, thus avoiding the need to install any software and ensuring reproducibility. All the notebooks were produced using custom files generated by BiocImageBuilder. Results BiocImageBuilder facilitates the publication of workflows with a point-and-click user interface. We demonstrate that interactive notebooks can be used to disseminate a wide range of bioinformatics analyses. The use of software containers to mirror the original software environment ensures reproducibility of results. Parameters and code can be dynamically modified, allowing for robust verification of published results and encouraging rapid adoption of new methods. Conclusion Given the increasing complexity of bioinformatics workflows, we anticipate that these interactive software notebooks will become as necessary for documenting software methods as traditional laboratory notebooks have been for documenting bench protocols, and as ubiquitous.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
Co-first authors
ISSN:1067-5027
1527-974X
1527-974X
DOI:10.1093/jamia/ocx120