A 28nm Fully Integrated End-to-End Genome Analysis Accelerator for Next-Generation Sequencing

Bibliographic Details
Published in	IEEE Transactions on Biomedical Circuits and Systems, Vol. PP, pp. 1-15
Main Authors	Wu, Yi-Chung; Chen, Yen-Lung; Yang, Chung-Hsuan; Lee, Chao-Hsi; Chen, Wen-Ching; Lin, Liang-Yi; Chang, Nian-Shyang; Lin, Chun-Pin; Chen, Chi-Shi; Hung, Jui-Hung; Yang, Chia-Hsiang
Format	Journal Article
Language	English
Published	United States: IEEE, 27.03.2025
Subjects	application-specific integrated circuit (ASIC); Bioinformatics; Data analysis; Data mining; digital CMOS integrated circuits; Engines; Genomics; Genotypes; genotyping; haplotype calling; Indexes; Next-generation sequencing (NGS); Sensitivity; Sequential analysis; short-read mapping; Training; variant calling
Summary:	This paper presents the first end-to-end next-generation sequencing (NGS) data analysis accelerator for short-read mapping, haplotype calling, variant calling, and genotyping. It supports both single-end and paired-end short reads (or reads) and uses the FM-index, a compact index data structure, for the exact-match part of short-read mapping. For the inexact-match part, a dynamic programming array is proposed to determine the mapping results. To reduce the workload of short-read mapping, a rapid similarity calculation is designed. A rescue technique is also adopted to increase the overall sensitivity. In haplotype calling, a parallel k-mer processing engine constructs the de Bruijn graph and assembles the haplotypes. The variant calling step uses a variant discovery engine to determine variants between a subject and a reference genome sequence. Lastly, a genotype likelihood computing engine computes genotype likelihoods in parallel and outputs the genotypes of all discovered variants with their corresponding Phred-scaled likelihood (PL) values. This work completes end-to-end data analysis for the 50× PrecisionFDA dataset in an average of 28.2 minutes. It achieves 3-to-59× higher throughput than existing solutions, with higher precision (99.79%) and sensitivity (99.03%). The chip also achieves 935× higher energy efficiency than the Illumina DRAGEN FPGA acceleration system.
ISSN:	1932-4545, 1940-9990
DOI:	10.1109/TBCAS.2025.3555579
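
The summary states that the FM-index is used for the exact-match part of short-read mapping. As a rough illustration of that general technique only, the following is a minimal textbook sketch of FM-index backward search in Python. It is not the accelerator's hardware design, which this record does not describe. The function names `build_fm_index` and `backward_search`, the naive suffix-array construction, and the uncompressed occurrence table are all assumptions made for readability; a genome-scale index would use sampled tables and a faster construction.

```python
from collections import Counter


def build_fm_index(text):
    """Build a toy FM-index for `text` (hypothetical helper, not from the paper).

    Returns the suffix array, the C table, and the full Occ table.
    '$' is appended as a sentinel that sorts before A, C, G, T.
    """
    text += "$"
    # Naive suffix array: acceptable for a toy string, far too slow for a genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)

    # c_table[ch] = number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return sorted start positions of exact matches of `pattern`."""
    lo, hi = 0, len(sa)  # half-open suffix-array interval [lo, hi)
    # Extend the match one character at a time, from the last to the first.
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # interval is empty: no exact match
    return sorted(sa[lo:hi])


reference = "ACGTACGTTACG"  # made-up toy reference, not real data
index = build_fm_index(reference)
print(backward_search("ACG", *index))  # [0, 4, 9]
```

Each step of the search narrows a suffix-array interval using only table lookups, so the cost grows with the read length rather than the reference length. That property is the usual reason the FM-index suits exact matching against a large reference.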