Persistent memory as an effective alternative to random access memory in metagenome assembly

The assembly of metagenomes decomposes members of complex microbe communities and allows the characterization of these genomes without laborious cultivation or single-cell metagenomics. Metagenome assembly is a process that is memory intensive and time consuming. Multi-terabyte sequences can become...

Full description

Saved in:
Bibliographic Details
Published inBMC bioinformatics Vol. 23; no. 1; p. 513
Main Authors Sun, Jingchao, Qiu, Zhining, Egan, Rob, Ho, Harrison, Li, Yue, Wang, Zhong
Format Journal Article
LanguageEnglish
Published England BioMed Central Ltd 30.11.2022
BioMed Central
Springer Science + Business Media
BMC
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:The assembly of metagenomes decomposes members of complex microbe communities and allows the characterization of these genomes without laborious cultivation or single-cell metagenomics. Metagenome assembly is a process that is memory intensive and time consuming. Multi-terabyte sequences can become too large to be assembled on a single computer node, and there is no reliable method to predict the memory requirement due to data-specific memory consumption pattern. Currently, out-of-memory (OOM) is one of the most prevalent factors that causes metagenome assembly failures. In this study, we explored the possibility of using Persistent Memory (PMem) as a less expensive substitute for dynamic random access memory (DRAM) to reduce OOM and increase the scalability of metagenome assemblers. We evaluated the execution time and memory usage of three popular metagenome assemblers (MetaSPAdes, MEGAHIT, and MetaHipMer2) in datasets up to one terabase. We found that PMem can enable metagenome assemblers on terabyte-sized datasets by partially or fully substituting DRAM. Depending on the configured DRAM/PMEM ratio, running metagenome assemblies with PMem can achieve a similar speed as DRAM, while in the worst case it showed a roughly two-fold slowdown. In addition, different assemblers displayed distinct memory/speed trade-offs in the same hardware/software environment. We demonstrated that PMem is capable of expanding the capacity of DRAM to allow larger metagenome assembly with a potential tradeoff in speed. Because PMem can be used directly without any application-specific code modification, these findings are likely to be generalized to other memory-intensive bioinformatics applications.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
AC02-05CH11231
USDOE Office of Science (SC), Biological and Environmental Research (BER)
ISSN:1471-2105
1471-2105
DOI:10.1186/s12859-022-05052-8