SparkINFERNO: A scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants

Bibliographic Details
Published in	bioRxiv
Main Authors	Kuksa, Pavel P; Lee, Chien-Yueh; Amlie-Wolf, Alexandre; Gangadharan, Prabhakaran; Mlynarski, Elizabeth E; Chou, Yi-Fan; Lin, Han-Jen; Issen, Heather; Greenfest-Allen, Emily; Valladares, Otto; Leung, Yuk Yee; Wang, Li-San
Format	Paper
Language	English
Published	Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 08.01.2020
Subjects	Bioinformatics; Computer applications; Genetic diversity; Genomics; Molecular modelling; Quantitative trait loci; Regulatory sequences; Statistical analysis

Summary:	We report SparkINFERNO (Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants), a scalable bioinformatics pipeline for characterizing noncoding GWAS association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts, and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci, and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWAS studies and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources. Availability: SparkINFERNO runs on clusters or a single server with an Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.
DOI:	10.1101/2020.01.07.897579