HamHeat: A fast and simple package for calculating Hamming distance from multiple sequence data for heatmap visualization

Bibliographic Details
Published in	bioRxiv
Main Authors	Rakov, Alexey V; Schifferli, Dieter M; Liu, Shu-Lin; Mastriani, Emilio
Format	Paper
Language	English
Published	Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 27.03.2020
Subjects	Computer programs; Metric system; Nucleotide sequence; Software
Summary:	Fast calculation of Hamming distances across large sequence datasets remains a nontrivial task. Here, we present HamHeat, a new software package that efficiently calculates Hamming distances for hundreds of aligned protein or DNA sequences, each containing a large number of residues or nucleotides, respectively. HamHeat uses a unique algorithm whose advantages include ease of use and fast run times on large amounts of data. The package consists of three consecutive modules. The first module ranks the sequences from the most to the least frequent variant. The second module takes the most common variant as the reference sequence and calculates the Hamming distance of every other sequence from it, that is, the number of residue or nucleotide positions at which the two differ. The final module formats the results as a comprehensive table that lists the sequence ranks and Hamming distances.
DOI:	10.1101/2020.03.26.009258
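
The three-module workflow described in the summary can be illustrated with a minimal Python sketch. This is not HamHeat's own code: the function names (`hamming`, `rank_variants`, `hamming_table`) and the dictionary-based table layout are hypothetical, and the sketch assumes the input is a list of already aligned, equal-length sequence strings.

```python
from collections import Counter


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_variants(sequences):
    """Module 1 (sketch): rank distinct variants, most frequent first.

    Counter.most_common sorts by count in descending order; variants with
    equal counts keep the order in which they were first seen.
    """
    return Counter(sequences).most_common()


def hamming_table(sequences):
    """Modules 2 and 3 (sketch): distances to the top variant, as table rows."""
    ranked = rank_variants(sequences)
    if not ranked:
        return []
    reference = ranked[0][0]  # the most common variant serves as the reference
    return [
        {
            "rank": rank,
            "sequence": seq,
            "count": count,
            "hamming": hamming(reference, seq),
        }
        for rank, (seq, count) in enumerate(ranked, start=1)
    ]


if __name__ == "__main__":
    # Toy aligned DNA sequences, invented for illustration only.
    toy = ["ACGT", "ACGT", "ACGA", "TCGA", "ACGA", "ACGT"]
    for row in hamming_table(toy):
        print(row["rank"], row["sequence"], row["count"], row["hamming"], sep="\t")
```

Each row pairs a variant's frequency rank with its distance from the reference, which is the kind of table a heatmap tool could consume. How HamHeat itself breaks ties between equally frequent variants is not stated in the summary, so the first-seen ordering used here is an assumption.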