Multiplatform single-sample estimates of transcriptional activation

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences - PNAS Vol. 110; no. 44; pp. 17778 - 17783
Main Authors	Piccolo, Stephen R., Withers, Michelle R., Francis, Owen E., Bild, Andrea H., Johnson, W. Evan
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 29.10.2013
Subjects	Algorithms; Bar codes; Base Composition; Bioinformatics; Biological markers; Biological Sciences; Biotechnology; computer software; Data normalization; DNA Barcoding, Taxonomic - methods; Gene expression; Gene Expression Profiling - methods; Genes; Genes - genetics; Genomics; high-throughput nucleotide sequencing; Integration; microarray technology; Models, Genetic; Physical Sciences; Ribonucleic acid; RNA; Software; Technology; Tissue samples; transcriptional activation; Transcriptional Activation - physiology
Summary:	Over the past two decades, many biotechnology platforms have been developed for high-throughput gene expression profiling. However, because each platform is subject to technology-specific biases and produces distinct raw-data distributions, researchers have experienced difficulty in integrating data across platforms. Data integration is crucial to data-generating consortiums, researchers transitioning to newer profiling technologies, and individuals seeking to aggregate data across experiments. We address this need with our Universal exPression Code (UPC) approach, which corrects for platform-specific background noise using models that account for the genomic base composition and length of target regions; this approach also uses a mixture model to estimate whether a gene is active in a particular profiling sample. The latter produces standardized UPC values on a zero-to-one scale, so that they can be interpreted consistently, irrespective of profiling technology, thus enabling downstream analysis pipelines to be developed in a platform-agnostic manner. The UPC method can be applied to one- and two-channel expression microarrays and to next-generation sequencing data (RNA sequencing). Furthermore, UPCs are derived using information from within a given sample only; no ancillary samples are required at processing time. Thus, UPCs are suitable for personalized-medicine workflows where samples must be processed individually rather than in batches. In a variety of analyses and comparisons, UPCs perform comparably to other methods designed specifically for microarrays or RNA sequencing in most settings. Software for calculating UPCs is freely available at www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html.
Bibliography:	http://dx.doi.org/10.1073/pnas.1305823110. Author contributions: S.R.P., A.H.B., and W.E.J. designed research; S.R.P. performed research; S.R.P., O.E.F., and W.E.J. contributed new analytic tools; S.R.P., M.R.W., and W.E.J. analyzed data; and S.R.P., A.H.B., and W.E.J. wrote the paper. Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved September 14, 2013 (received for review April 1, 2013).
ISSN:	0027-8424 1091-6490
DOI:	10.1073/pnas.1305823110
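
Illustration:	The Summary says that a mixture model estimates whether a gene is active within a single sample and that the result lies on a zero-to-one scale. The sketch below shows only that general idea and is not the published UPC model: it fits a plain two-component normal mixture by EM to simulated values from one sample and reports each value's posterior probability of belonging to the higher-mean component. The function name `activation_probabilities`, the simulated data, and the choice of normal components are all assumptions made for illustration, and the platform-specific background correction for base composition and target length is omitted entirely.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_active(values, weights, means, sds):
    """Posterior probability that each value came from component 1."""
    post = []
    for v in values:
        p0 = weights[0] * normal_pdf(v, means[0], sds[0])
        p1 = weights[1] * normal_pdf(v, means[1], sds[1])
        total = p0 + p1
        # Fall back to 0.5 if both densities underflow to zero.
        post.append(p1 / total if total > 0.0 else 0.5)
    return post


def activation_probabilities(values, n_iter=200, min_sd=1e-3):
    """Hypothetical helper: two-component normal mixture fitted by EM.

    Uses only the values of one sample and returns, for each value, the
    posterior probability (0 to 1) of the higher-mean component.
    """
    n = len(values)
    ordered = sorted(values)
    half = n // 2
    # Initialise the component means from the lower and upper halves.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (n - half)]
    grand_mean = sum(values) / n
    spread = math.sqrt(sum((v - grand_mean) ** 2 for v in values) / n)
    sds = [max(min_sd, spread), max(min_sd, spread)]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: responsibilities of the higher-mean component.
        post = posterior_active(values, weights, means, sds)
        # M-step: re-estimate weight, mean and sd of each component.
        for k in (0, 1):
            resp = [p if k == 1 else 1.0 - p for p in post]
            mass = sum(resp)
            if mass < 1e-9:
                continue  # leave an empty component unchanged
            weights[k] = mass / n
            means[k] = sum(r * v for r, v in zip(resp, values)) / mass
            var = sum(r * (v - means[k]) ** 2 for r, v in zip(resp, values)) / mass
            sds[k] = max(min_sd, math.sqrt(var))

    post = posterior_active(values, weights, means, sds)
    # Make sure the reported component is the one with the higher mean.
    if means[1] < means[0]:
        post = [1.0 - p for p in post]
    return post


# Simulated single sample: 300 low "inactive" values, 200 high "active" ones.
random.seed(0)
inactive = [random.gauss(2.0, 0.5) for _ in range(300)]
active = [random.gauss(8.0, 1.0) for _ in range(200)]
scores = activation_probabilities(inactive + active)
```

Because the fit uses only the values passed in, each sample can be scored on its own, which mirrors the single-sample property described in the Summary; the distributions and parameters here are chosen only to keep the example short.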