Echtvar: compressed variant representation for rapid annotation and filtering of SNPs and indels
Published in | Nucleic acids research Vol. 51; no. 1; p. e3 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | England: Oxford University Press, 11.01.2023 |
Summary: | Germline and somatic variants within an individual or cohort are interpreted with information from large cohorts. Annotation with this information becomes a computational bottleneck as population sets grow to terabytes of data. Here, we introduce echtvar, which efficiently encodes population variants and annotation fields into a compressed archive that can be used for rapid variant annotation and filtering. Most variants, represented by chromosome, position and alleles, are encoded into 32 bits, half the size of previous encoding schemes and at least 4 times smaller than a naive encoding. The annotations, stored separately within the same archive, are also encoded and compressed. We show that echtvar is faster and uses less space than existing tools and that it can effectively reduce the number of candidate variants. We give examples on germline and somatic variants to document how echtvar can facilitate exploratory data analysis on genetic variants. Echtvar is available at https://github.com/brentp/echtvar under an MIT license. |
---|---|
ISSN: | 0305-1048, 1362-4962 |
DOI: | 10.1093/nar/gkac931 |
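
The summary states that most variants fit into 32 bits. As a rough illustration of how such packing can work, the sketch below uses a hypothetical layout that is not taken from the echtvar source: 20 bits for the position within a 2^20-base chunk, 2 bits each for the reference and alternate allele lengths, and 8 bits for up to four bases at 2 bits per base. The function names, the chunk size, and the bit layout are all assumptions made for this example; the chromosome and chunk index are assumed to be stored once per chunk rather than per variant. Variants that do not fit return `None`, standing in for whatever fallback a real archive would need.

```python
# Hypothetical 32-bit layout (an assumption for illustration, not echtvar's format):
#   bits 31..12  position offset within a 2**20-base chunk
#   bits 11..10  reference allele length (0-3)
#   bits  9..8   alternate allele length (0-3)
#   bits  7..0   up to 4 bases, 2 bits each, left-aligned
CHUNK_BITS = 20
MAX_BASES = 4
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_variant(pos, ref, alt):
    """Pack (pos, ref, alt) into a 32-bit integer, or return None if it does not fit."""
    bases = ref + alt
    if len(ref) > 3 or len(alt) > 3 or len(bases) > MAX_BASES:
        return None  # too long for this layout; needs a separate fallback store
    if any(b not in BASE_CODE for b in bases):
        return None  # e.g. N or symbolic alleles
    offset = pos & ((1 << CHUNK_BITS) - 1)  # position relative to the chunk start
    packed = 0
    for b in bases:
        packed = (packed << 2) | BASE_CODE[b]
    packed <<= 2 * (MAX_BASES - len(bases))  # left-align within the 8 base bits
    return (offset << 12) | (len(ref) << 10) | (len(alt) << 8) | packed


def decode_variant(chunk_index, value):
    """Invert encode_variant, given the index of the chunk the value was stored in."""
    offset = value >> 12
    ref_len = (value >> 10) & 0b11
    alt_len = (value >> 8) & 0b11
    packed = value & 0xFF
    bases = "".join(
        CODE_BASE[(packed >> (2 * (MAX_BASES - 1 - i))) & 0b11]
        for i in range(ref_len + alt_len)
    )
    pos = (chunk_index << CHUNK_BITS) | offset
    return pos, bases[:ref_len], bases[ref_len:]


if __name__ == "__main__":
    pos = 1_234_567
    value = encode_variant(pos, "A", "G")
    print(decode_variant(pos >> CHUNK_BITS, value))
```

Under this assumed layout, SNPs and short indels round-trip through a single 32-bit value, while longer alleles are rejected and would have to be stored some other way.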