Coordinates and Intervals in Graph-based Reference Genomes

Bibliographic Details
Published in: bioRxiv
Main Authors: Rand, Knut Dagestad; Grytten, Ivar; Nederbragt, Alexander; Storvik, Geir Olve; Glad, Ingrid Kristine; Sandve, Geir Kjetil
Format: Paper
Language: English
Published: Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 11.07.2016
Summary: Motivation: It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as positions of genes, on graph-based reference genomes. Results: We formalize offset-based coordinate systems on graph-based reference genomes and introduce a method for representing intervals on these reference structures. We show the advantage of our method by representing genes on a graph-based representation of the GRCh38 version of the human genome and its alternative loci for regions that are highly variable. Conclusion: More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of GRCh38 and potential future graph-based reference genomes. We illustrate our notation for genomic intervals, as well as the offset-based coordinate systems, through a web tool at: https://github.com/uio-cels/gen-graph-coords.
DOI: 10.1101/063206
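
Illustrative sketch: The summary describes offset-based coordinates and intervals on a graph-based reference only in general terms. The Python below is a minimal sketch of one way such a scheme could look, not the authors' notation or the code of their web tool. It assumes that a position is a pair (region, offset) and that an interval is a start position, an end position, and the ordered list of regions it passes through. All names (`Position`, `Interval`, `REGION_LENGTHS`, `EDGES`, the region identifiers, and the lengths) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Position:
    region: str  # hypothetical identifier of a region (node) in the graph
    offset: int  # 0-based offset from the start of that region


@dataclass(frozen=True)
class Interval:
    start: Position            # inclusive start
    end: Position              # exclusive end
    regions: Tuple[str, ...]   # ordered regions the interval passes through


# Toy graph (made-up lengths): a main path with one alternative locus.
REGION_LENGTHS: Dict[str, int] = {
    "main_a": 100, "main_b": 50, "alt_1": 60, "main_c": 80,
}
EDGES: Dict[str, Set[str]] = {
    "main_a": {"main_b", "alt_1"},
    "main_b": {"main_c"},
    "alt_1": {"main_c"},
}


def is_valid(iv: Interval, edges: Dict[str, Set[str]],
             lengths: Dict[str, int]) -> bool:
    """Check that the interval follows edges and its offsets are in range."""
    regions = iv.regions
    if not regions or any(r not in lengths for r in regions):
        return False
    if iv.start.region != regions[0] or iv.end.region != regions[-1]:
        return False
    if not 0 <= iv.start.offset < lengths[regions[0]]:
        return False
    if not 0 < iv.end.offset <= lengths[regions[-1]]:
        return False
    if len(regions) == 1 and iv.start.offset >= iv.end.offset:
        return False
    # Every consecutive pair of regions must be connected in the graph.
    return all(b in edges.get(a, set()) for a, b in zip(regions, regions[1:]))


def interval_length(iv: Interval, lengths: Dict[str, int]) -> int:
    """Number of bases covered along the interval's own path."""
    if len(iv.regions) == 1:
        return iv.end.offset - iv.start.offset
    first_part = lengths[iv.regions[0]] - iv.start.offset
    middle = sum(lengths[r] for r in iv.regions[1:-1])
    return first_part + middle + iv.end.offset


# The same hypothetical gene, once through the alternative locus and once
# along the main path; start and end positions are identical in both cases.
gene_alt = Interval(Position("main_a", 90), Position("main_c", 10),
                    ("main_a", "alt_1", "main_c"))
gene_main = Interval(Position("main_a", 90), Position("main_c", 10),
                     ("main_a", "main_b", "main_c"))
```

In this toy example the two intervals share their start and end positions and differ only in the listed regions, which is why a start/end pair alone would not identify an interval once the reference contains alternative paths.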