Deep distributed computing to reconstruct extremely large lineage trees

Phylogeny estimation (the reconstruction of evolutionary trees) has recently been applied to CRISPR-based cell lineage tracing, allowing the developmental history of an individual tissue or organism to be inferred from a large number of mutated sequences in somatic cells. However, current computatio...

Full description

Saved in:
Bibliographic Details
Published inNature biotechnology Vol. 40; no. 4; pp. 566 - 575
Main Authors Konno, Naoki, Kijima, Yusuke, Watano, Keito, Ishiguro, Soh, Ono, Keiichiro, Tanaka, Mamoru, Mori, Hideto, Masuyama, Nanami, Pratt, Dexter, Ideker, Trey, Iwasaki, Wataru, Yachie, Nozomu
Format Journal Article
LanguageEnglish
Published New York Nature Publishing Group US 01.04.2022
Nature Publishing Group
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Phylogeny estimation (the reconstruction of evolutionary trees) has recently been applied to CRISPR-based cell lineage tracing, allowing the developmental history of an individual tissue or organism to be inferred from a large number of mutated sequences in somatic cells. However, current computational methods are not able to construct phylogenetic trees from extremely large numbers of input sequences. Here, we present a deep distributed computing framework to comprehensively trace accurate large lineages (FRACTAL) that substantially enhances the scalability of current lineage estimation software tools. FRACTAL first reconstructs only an upstream lineage of the input sequences and recursively iterates the same produce for its downstream lineages using independent computing nodes. We demonstrate the utility of FRACTAL by reconstructing lineages from >235 million simulated sequences and from >16 million cells from a simulated experiment with a CRISPR system that accumulates mutations during cell proliferation. We also successfully applied FRACTAL to evolutionary tree reconstructions and to an experiment using error-prone PCR (EP-PCR) for large-scale sequence diversification. Cell lineage tracing is scaled up to hundreds of millions of simulated sequences with distributed computing.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
Author contributions
N.K., H.M. and N.Y. conceived the high-level concept of FRACTAL. N.K., W.I. and N.Y. designed the study. N.K. implemented FRACTAL. K.W. and N.K. implemented PRESUME. N.K. led the analyses. N.M. and N.Y. designed the high-content cell lineage recording model. Y.K. and K.W. supported the analyses of the simulated cell lineages. S.I. and M.T. performed the EP-PCR experiments. Y.K. supported the analysis of the EP-PCR experiments. N.K., K.O., D.P. and T.I. performed data visualization using HiView. N.K., Y.K., S.I. and N.Y. wrote the manuscript.
ISSN:1087-0156
1546-1696
DOI:10.1038/s41587-021-01111-2