Phylogenetic convolutional neural networks in metagenomics
Published in: BMC Bioinformatics, Vol. 19, Suppl. 2, Article 49 (13 pp.)
Main Authors: , , , , , ,
Format: Journal Article
Language: English
Published: London: BioMed Central, 08.03.2018
ISSN: 1471-2105
DOI: 10.1186/s12859-018-2033-5
Summary:
Background
Convolutional Neural Networks can be effectively used only when the data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree used as the proximity measure. The patristic distance between variables is combined with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space.
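To make the embedding step concrete, the sketch below builds a small patristic-distance matrix by hand and projects it into Euclidean space with plain metric MDS from scikit-learn. The distance values and the use of non-sparsified MDS are illustrative assumptions, not the authors' exact procedure.

```python
# Illustrative sketch: embed taxa into Euclidean coordinates from a
# patristic-distance matrix. Plain metric MDS stands in here for the
# sparsified MDS used by Ph-CNN; the 4-taxon distances are invented.
import numpy as np
from sklearn.manifold import MDS

# Symmetric patristic distances (total branch length along the tree
# path between each pair of taxa) for four hypothetical OTUs.
D = np.array([
    [0.0, 0.3, 0.9, 1.0],
    [0.3, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.4],
    [1.0, 0.9, 0.4, 0.0],
])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)  # (4, 2): one Euclidean point per OTU

# Rank each OTU's neighbours by embedded distance, as a convolution
# over phylogenetically close features would require.
pairwise = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
neighbour_rank = np.argsort(pairwise, axis=1)
print(neighbour_rank)
```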
Results
Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota from 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms such as Support Vector Machines and Random Forests, and to a baseline fully connected neural network, the Multi-Layer Perceptron.
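For reference, the comparison baselines named above can be run in a few lines with scikit-learn. The random abundance matrix below is a placeholder for the real gut-microbiota data, and the default hyperparameters are an assumption.

```python
# Baselines from the Results section (SVM, Random Forest, MLP),
# evaluated here on random stand-in data of roughly the study's
# size: 260 samples (38 healthy + 222 IBD) with placeholder labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((260, 100))         # 260 samples x 100 OTU abundances
y = rng.integers(0, 2, size=260)   # placeholder binary labels

for clf in (SVC(), RandomForestClassifier(), MLPClassifier(max_iter=500)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{type(clf).__name__}: {scores.mean():.3f}")
```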
Conclusion
Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer that passes to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user.
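A minimal sketch of that idea follows, assuming TensorFlow/Keras: a hypothetical PhyloNeighbours layer gathers, for every feature, its k nearest features under a precomputed patristic ranking, so that a standard Conv1D can then slide over phylogenetically coherent windows. This is not the released Ph-CNN code; the layer name, shapes, and the random ranking are assumptions for illustration.

```python
# Hypothetical sketch of a neighbourhood-gathering Keras layer, not
# the authors' released implementation.
import numpy as np
import tensorflow as tf

class PhyloNeighbours(tf.keras.layers.Layer):
    """Turns a flat abundance vector (batch, n_features) into
    (batch, n_features, k) windows holding each feature's k nearest
    neighbours under a precomputed patristic-distance ranking."""

    def __init__(self, neighbour_idx, **kwargs):
        super().__init__(**kwargs)
        # neighbour_idx: (n_features, k) integer array; row i lists
        # the k features closest to feature i (feature i included).
        self.neighbour_idx = tf.constant(neighbour_idx, dtype=tf.int32)

    def call(self, inputs):
        # Gather neighbour columns: (batch, n_features) -> (batch, n_features, k)
        return tf.gather(inputs, self.neighbour_idx, axis=1)

# Toy usage: 8 features, 3-neighbour windows from a random ranking.
rng = np.random.default_rng(0)
idx = np.argsort(rng.random((8, 8)), axis=1)[:, :3]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    PhyloNeighbours(idx),
    tf.keras.layers.Conv1D(4, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.summary()
```

In the real architecture the neighbour ranking would come from the patristic-distance embedding sketched earlier, not from random numbers.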