HamHeat: A fast and simple package for calculating Hamming distance from multiple sequence data for heatmap visualization

Bibliographic Details
Published in bioRxiv
Main Authors Rakov, Alexey V; Schifferli, Dieter M; Shu-Lin, Liu; Mastriani, Emilio
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 27.03.2020
Summary: Fast calculation of Hamming distances across many sequence datasets remains a nontrivial task. Here, we present HamHeat, a new software package that efficiently calculates Hamming distances for hundreds of aligned protein or DNA sequences comprising a large number of residues or nucleotides, respectively. HamHeat uses a unique algorithm whose advantages include ease of use and fast execution on large amounts of data. The package consists of three consecutive modules. The first module ranks the sequences from the most frequent to the least frequent variant. The second module takes the most common variant as the reference sequence and calculates the Hamming distance of every other sequence to it, that is, the number of residue or nucleotide positions at which the two sequences differ. The final module formats the results as a comprehensive table that displays the sequence ranks and Hamming distances.
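The three-module workflow described in the summary can be illustrated with a minimal Python sketch. This is not HamHeat's code or interface: the function names (`hamming`, `rank_variants`, `distance_table`, `format_table`), the tab-separated output layout, and the tie-breaking rule (equally frequent variants keep their first-seen order) are all assumptions made for illustration. It also assumes the input sequences are already aligned to equal length.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_variants(seqs):
    """Module 1 (illustrative): rank distinct variants, most frequent first.

    Counter.most_common() sorts by count in descending order; variants with
    equal counts stay in the order they were first seen (an assumption of
    this sketch, not a documented HamHeat rule).
    """
    return Counter(seqs).most_common()


def distance_table(seqs):
    """Module 2 (illustrative): distance of each variant to the top-ranked one."""
    ranked = rank_variants(seqs)
    if not ranked:
        return []
    reference = ranked[0][0]  # most common variant serves as the reference
    return [
        (rank, seq, count, hamming(reference, seq))
        for rank, (seq, count) in enumerate(ranked, start=1)
    ]


def format_table(rows):
    """Module 3 (illustrative): render the rows as a tab-separated table."""
    lines = ["rank\tcount\thamming\tsequence"]
    for rank, seq, count, dist in rows:
        lines.append(f"{rank}\t{count}\t{dist}\t{seq}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy aligned DNA sequences; real input would be read from an alignment file.
    example = ["ACGT", "ACGT", "ACGA", "TCGA", "ACGA", "ACGT"]
    print(format_table(distance_table(example)))
```

Because every distance is measured against a single reference, the work in this sketch grows linearly with the number of distinct variants rather than quadratically, as it would for an all-against-all distance matrix.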
DOI: 10.1101/2020.03.26.009258