Is there a Python module to calculate the Hamming distance from a multiple sequence alignment?
5.1 years ago
schlogl ▴ 160

Do any of you have suggestions on how to calculate the Hamming distance (HD) and the entropy of a multiple sequence alignment?

Thanks

alignment • 4.5k views

It would appear that there are plenty of them.


I don't vouch for the contents of this repository, but its description at least matches your task: Hamming Distance Comparison of Amino Acid Sequences of 10 Organisms

5.1 years ago
Joe 21k

BioPython can do all of this, but it’s pretty easy to implement yourself (and is good practise).

See for example: https://github.com/jrjhealey/bioinfo-tools/blob/master/Shannon.py

And Hamming distance is super simple: https://github.com/jrjhealey/bioinfo-tools/blob/master/StringComparisons.py#L84
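For the entropy part of the question, one common approach is to compute the Shannon entropy of each alignment column. The following is a minimal sketch and is not the code from the linked Shannon.py; the function name `column_entropies` and the toy alignment are made up for illustration, and gap characters are counted as an ordinary symbol here, which is an assumption you may want to change.

```python
import math
from collections import Counter


def column_entropies(seqs):
    """Return the Shannon entropy (in bits) of each alignment column.

    seqs: list of equal-length strings (hypothetical input format).
    Gaps ('-') are treated as just another symbol, which is an assumption.
    """
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*seqs):  # zip(*seqs) iterates over alignment columns
        counts = Counter(column)
        total = len(column)
        # H = sum over symbols of -p * log2(p), with p the symbol frequency
        h = sum(-(n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


toy_alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]  # made-up example data
print(column_entropies(toy_alignment))
```

A fully conserved column gives an entropy of 0, and a column split evenly between two symbols gives 1 bit.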


Hi Joe, I have a nice function for HD, but my doubt was about how to get all sequences checked for HD and the entropy, since HD compares only two sequences at a time and an MSA contains many of them. Maybe a loop checking each pair of sequences. I will check it out. Thanks


You can do all pairwise comparisons between sequences and store the numbers.

Check out the itertools module.
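As a rough sketch of that idea, `itertools.combinations` yields every unordered pair once, and the results can be stored in a dictionary keyed by the pair of ids. The `hamming` helper, the ids, and the toy sequences below are made up for illustration.

```python
from itertools import combinations


def hamming(s1, s2):
    """Count positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(s1, s2))


# Made-up toy alignment: sequence id -> aligned sequence.
toy = {"seq1": "ACGT-A", "seq2": "ACGA-A", "seq3": "TCGA-C"}

# combinations() yields each unordered pair exactly once, so n sequences
# give n * (n - 1) / 2 comparisons.
distances = {
    (id1, id2): hamming(toy[id1], toy[id2])
    for id1, id2 in combinations(toy, 2)
}

for (id1, id2), d in distances.items():
    print(id1, id2, d)
```

Keying the dictionary by id pair makes it easy to look up a specific comparison later or to fill a distance matrix.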


maybe

itertools.imap(function, *iterables) ?
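Note that `itertools.imap` exists only in Python 2; in Python 3 the built-in `map` is already lazy. For applying a two-argument function to pairs, `itertools.starmap` may be a closer fit. A minimal sketch, where `count_mismatches` and the toy sequences are hypothetical names used only for illustration:

```python
from itertools import combinations, starmap


def count_mismatches(s1, s2):
    """Hypothetical two-argument distance function used for illustration."""
    return sum(a != b for a, b in zip(s1, s2))


seqs = ["ACGT", "ACGA", "TCGA"]  # made-up toy sequences

# starmap unpacks each (s1, s2) pair into the function's two arguments.
dists = list(starmap(count_mismatches, combinations(seqs, 2)))
print(dists)
```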


I got this Joe:

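import itertools

def hamming_dist(s1, s2):
    """Count the positions at which two equal-length sequences differ."""
    assert len(s1) == len(s2)
    hd = 0
    for b1, b2 in zip(s1, s2):
        if b1 != b2:
            hd += 1
    return hd

def imap(function, *iterables):
    """Apply function to the items of each iterable, unpacked as arguments."""
    iterables = map(iter, iterables)
    for it in iterables:
        args = tuple(it)
        if function is None:
            yield tuple(args)
        else:
            yield function(*args)

# ls: list of aligned sequences (equal-length strings), defined elsewhere.
# Each combination is a (s1, s2) pair, so imap calls hamming_dist(s1, s2).
distances = imap(hamming_dist, *itertools.combinations(ls, 2))
for dist in distances:
    print(dist)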

Do you think there is some way to improve it?

PS: I have only tested it on a toy example.


I think you can do this more simply if you want to use BioPython.

I forget the exact syntax now but it would be something like:

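from Bio import AlignIO
import itertools

aln = AlignIO.read(...)  # supply the alignment file and its format
# hamming_distance: any two-string Hamming function, e.g. the one linked above
for r1, r2 in itertools.combinations(aln, 2):
    print("\n".join([r1.id, str(r1.seq), r2.id, str(r2.seq), str(hamming_distance(str(r1.seq), str(r2.seq)))]))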

(Not the prettiest output, but you can tweak).

Your solution looks reasonable too though, so whatever works.


Hey Joe, your solution is nicer because it includes the id. I will try to do that in mine. 8)

DQ676872
GCAGGAAGCACTATGGGCGCGGCGTCAATAACGCTGACGGTACAGGCCAGG---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATATGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTCTAGAAAGATACCTAAAGGATCAACAGCTC
AB253421
GCAGGAAGCACTATGGGCGCGGCGTCAATGACGCTGACGGTACAGGCCAGA---------CAATTATTGTCTGGCATAGTGCAACAGCAAAGCAATTTGCTGAGGGCTATAGAGGCTCAACAACATCTGTTGAAACTCACGGTCTGGGGCATTAAACAGCTCCAGGCAAGAGTCCTGGCTGTGGAAAGCTACCTAAGGGATCAACAGCTC
7

Thank you for your support! Paulo


I will check yours too Joe. Thanks 8)
