Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data

Bibliographic Details
Published in bioRxiv
Main Authors De Souza, Vladimir B C; Jordan, Ben T; Tseng, Elizabeth; Nelson, Elizabeth A; Hirschi, Karen K; Sheynkman, Gloria M; Robinson, Mark D
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 02.02.2023
Summary: Long-read RNA sequencing (lrRNA-seq) produces detailed information about full-length transcripts, including novel and sample-specific isoforms. It also offers an opportunity to call variants encoded in the transcribed regions of genes directly from lrRNA-seq data. However, most state-of-the-art variant callers were developed for genomic DNA and thus require modifications to call variants from lrRNA-seq data. Here, we benchmark the variant callers GATK, DeepVariant, Clair3, and NanoCaller primarily on PacBio lrRNA-seq ("Iso-Seq") data, but also on Nanopore and Illumina RNA-seq data. In particular, we found that careful processing of alignment files is critical for improving the calling of indels and SNPs with DeepVariant and of indels with Clair3.

Competing Interest Statement: ET is an employee of Pacific Biosciences. All remaining authors declare that they have no competing interests.

Footnotes:
* Various textual clarifications; added benchmark data for Oxford Nanopore and Illumina RNA-seq datasets.
* https://github.com/vladimirsouza/lrRNAseqVariantCalling
* https://github.com/vladimirsouza/lrRNAseqBenchmark
DOI: 10.1101/2022.02.08.479579