HamHeat: A fast and simple package for calculating Hamming distance from multiple sequence data for heatmap visualization
The problem of fast calculation of Hamming distance inferred from many sequence datasets is still not a trivial task. Here, we present HamHeat, as a new software package to efficiently calculate Hamming distance for hundreds of aligned protein or DNA sequences of a large number of residues or nucleo...
Saved in:
Published in | bioRxiv |
---|---|
Main Authors | , , , |
Format | Paper |
Language | English |
Published |
Cold Spring Harbor
Cold Spring Harbor Laboratory Press
27.03.2020
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | The problem of fast calculation of Hamming distance inferred from many sequence datasets is still not a trivial task. Here, we present HamHeat, as a new software package to efficiently calculate Hamming distance for hundreds of aligned protein or DNA sequences of a large number of residues or nucleotides, respectively. HamHeat uses a unique algorithm with many advantages, including its ease of use and the execution of fast runs for large amounts of data. The package consists of three consecutive modules. In the first module, the software ranks the sequences from the most to the least frequent variant. The second module uses the most common variant as the reference sequence to calculate the Hamming distance of each additional sequence based on the number of residue or nucleotide changes. A final module formats all the results in a comprehensive table that displays the sequence ranks and Hamming distances. |
---|---|
DOI: | 10.1101/2020.03.26.009258 |