Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data
Published in | bioRxiv |
---|---|
Main Authors | , , , , , , |
Format | Paper |
Language | English |
Published | Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 02.02.2023 |
Summary: Long-read RNA sequencing (lrRNA-seq) produces detailed information about full-length transcripts, including novel and sample-specific isoforms. Furthermore, there is an opportunity to call variants encoded in the transcribed regions of genes directly from lrRNA-seq data. However, most state-of-the-art variant callers have been developed for genomic DNA and thus require modifications to call variants from lrRNA-seq data. Here, we benchmark the variant callers GATK, DeepVariant, Clair3, and NanoCaller, primarily on PacBio lrRNA-seq ("Iso-Seq") data but also on Nanopore and Illumina RNA-seq data. In particular, we found that careful processing of alignment files is critical to achieve better calling performance for indels and SNPs with DeepVariant, and for indels with Clair3.

Competing Interest Statement: ET is an employee of Pacific Biosciences. All remaining authors declare that they have no competing interests.

Footnotes:
* Various textual clarifications; added benchmark data for Oxford Nanopore and Illumina RNA-seq datasets.
* https://github.com/vladimirsouza/lrRNAseqVariantCalling
* https://github.com/vladimirsouza/lrRNAseqBenchmark

DOI: 10.1101/2022.02.08.479579
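The summary does not spell out which alignment-file transformations the authors apply. As a hedged illustration only, one widely used transformation for RNA alignments, the idea behind GATK's SplitNCigarReads tool, is to split each spliced read at its `N` (skipped reference region) CIGAR operations, so that callers designed for genomic DNA never see a single read spanning an intron. The Python sketch below applies that idea to one CIGAR string. `parse_cigar`, `_to_cigar`, and `split_at_introns` are hypothetical helper names, not code from GATK or from the authors' repositories, and the sketch deliberately ignores read bases, base qualities, tags, and flags.

```python
import re
from typing import List, Tuple

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def _to_cigar(ops: List[Tuple[int, str]]) -> str:
    """Join (length, op) pairs back into a CIGAR string."""
    return "".join(f"{n}{op}" for n, op in ops)


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Parse a CIGAR string such as '10M200N15M' into (length, op) pairs."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Reject strings containing characters the regex did not consume.
    if _to_cigar(ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def split_at_introns(pos: int, cigar: str) -> List[Tuple[int, str]]:
    """Split one spliced alignment into exonic blocks at N operations.

    pos is the 1-based leftmost reference position (SAM POS field).
    Returns one (block_start, block_cigar) pair per block.
    """
    blocks: List[Tuple[int, str]] = []
    current: List[Tuple[int, str]] = []
    block_start = pos
    ref = pos  # next reference position to be consumed
    for length, op in parse_cigar(cigar):
        if op == "N":
            # Close the current block and jump over the intron.
            if current:
                blocks.append((block_start, _to_cigar(current)))
            current = []
            ref += length
            block_start = ref
        else:
            current.append((length, op))
            if op in REF_CONSUMING:
                ref += length
    if current:
        blocks.append((block_start, _to_cigar(current)))
    return blocks


if __name__ == "__main__":
    # Soft clip, 2-base deletion, 50-base intron, 1-base insertion.
    for start, block in split_at_introns(1, "5S10M2D3M50N4M1I5M"):
        print(start, block)
    # prints "1 5S10M2D3M" then "66 4M1I5M"
```

With pysam, for example, the inputs could come from `read.reference_start + 1` (pysam positions are 0-based) and `read.cigarstring`. A real transformation must also clip the read sequence and qualities for each block and decide how to set flags on the extra records it creates; this sketch covers only the coordinate and CIGAR bookkeeping.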