Does anyone have suggestions on how to calculate the Hamming distance (HD) and the entropy of a multiple sequence alignment?
Thanks
BioPython can do all of this, but it’s pretty easy to implement yourself (and is good practise).
See for example: https://github.com/jrjhealey/bioinfo-tools/blob/master/Shannon.py
And Hamming distance is super simple: https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py#L84
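To give an idea of what "implement it yourself" could look like for the entropy part, here is a minimal sketch of per-column Shannon entropy. It is not the code from the linked repository. The names `column_entropy` and `alignment_entropy` and the toy sequences are made up for illustration, and it assumes gaps (`-`) are counted as just another symbol, which you may want to change.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    # H = -sum(p * log2(p)) over the symbols observed in the column
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def alignment_entropy(seqs):
    """Return a list with one entropy value per alignment column."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) walks the alignment column by column
    return [column_entropy(col) for col in zip(*seqs)]

# Toy alignment (hypothetical data, assumption: gaps would count as a symbol)
toy = ["ACGT", "ACGA", "ACTA", "AGTA"]
entropies = alignment_entropy(toy)
for position, h in enumerate(entropies, start=1):
    print(position, round(h, 3))
```

A fully conserved column gives 0 bits, and a column split evenly between two symbols gives 1 bit.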
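I got this, Joe:
import itertools

def hamming_dist(s1, s2):
    # Only defined for sequences of equal length (e.g. rows of an alignment)
    assert len(s1) == len(s2)
    hd = 0
    for b1, b2 in zip(s1, s2):
        if b1 != b2:
            hd += 1
    return hd

def imap(function, *iterables):
    # Each iterable here is one pair of sequences; unpack it into the function's arguments
    iterables = map(iter, iterables)
    for it in iterables:
        args = tuple(it)
        if function is None:
            yield tuple(args)
        else:
            yield function(*args)

# ls: list of aligned sequences (strings), defined elsewhere
distances = imap(hamming_dist, *itertools.combinations(ls, 2))
for dist in distances:
    print(dist)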
Do you think there is some way to improve it?
PS: I have only tested it on a toy example.
I think you can do this more simply if you want to use BioPython.
I forget the exact syntax now but it would be something like:
from Bio import AlignIO
import itertools
aln = AlignIO.read(...)
for r1, r2 in itertools.combinations(aln, 2):
    # hamming_dist is the function defined earlier in this thread
    print("\n".join([r1.id, str(r1.seq), r2.id, str(r2.seq), str(hamming_dist(str(r1.seq), str(r2.seq)))]))
(Not the prettiest output, but you can tweak).
Your solution looks reasonable too though, so whatever works.
Hey Joe, your solution was nicer because it included the IDs. I will try to do that in mine. 8)
DQ676872
GCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGG---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATATGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTCTAGAAAGATACCTAAAGGATCAACAGCTC
AB253421
GCAGGAAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGA---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATCTGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGCTACCTAAGGGATCAACAGCTC
7
Thank you for your support! Paulo
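For anyone who wants the IDs kept alongside the distances without Biopython, the following is one possible sketch rather than a definitive version. It assumes the alignment has already been loaded into a plain dict mapping ID to aligned sequence; the names `pairwise_hamming` and `toy_records` and the toy data are hypothetical. Gap characters are compared like any other character, so two gaps at the same position count as a match.

```python
import itertools

def hamming_dist(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be the same length")
    return sum(b1 != b2 for b1, b2 in zip(s1, s2))

def pairwise_hamming(records):
    """Map each (id1, id2) pair to the Hamming distance of their sequences.

    records: dict of {sequence_id: aligned_sequence} (assumed input format).
    """
    return {
        (id1, id2): hamming_dist(records[id1], records[id2])
        for id1, id2 in itertools.combinations(records, 2)
    }

# Toy aligned sequences (made-up data for illustration only)
toy_records = {
    "seqA": "ACGT-A",
    "seqB": "ACGA-A",
    "seqC": "TCGA-T",
}
distances = pairwise_hamming(toy_records)
for (id1, id2), dist in distances.items():
    print(id1, id2, dist)
```

Because the result is a dict keyed on ID pairs, it can be written out as a table or filled into a distance matrix afterwards.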
It would appear that there are plenty of them.
I don't vouch for the contents of this repository, but its description at least matches your task:
Hamming Distance Comparison of Amino Acid Sequences of 10 Organisms