Local alignment of different lengths in Biopython
5.1 years ago
ognjen011 ▴ 290

I am trying to find the optimal local alignment score of a short sequence (50 bp) against a reference bacterial genome of several Mbp. When I use the Biopython Align module, the score I get is always equal to the length of the shorter (query) sequence. For example, with the alignxx function:

('--G------------------A-----------TC-G-G-G------AC-G---C--C----G--G-------T---TG-C-GA----AT---A--CA-C----AG-G--------T----------TT---AT-G----TG-C-T-----G------G--------AC--G--A-A----A-A----A----C-A----------------A---T--C-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------', 
'CGGTGCCCTCCTCCGCGGCGTACACCCCGCGCGTCCGCGCGGGCGCGACTGATACAGCTTACGCCGGCGCGACTGCCTGTCCGAGTCGATCGCATACATCTGTTAGCGCGGCAGGGTGAACGCGGCGTTCGCATCGCCGATGGCGTACTACGCCTCCCGGGGGCCGCACACGTGACATCGGACAGGCGAGGTTCCACCGGGCCCCGCTCCGGAGCCTGGCGCGAGTGAATGCCATACCTTATATAAACTCAGTCCGCCCAGAACTATGGGTATGGAACCCGGCGGTATAGGGCACTCGCCCACCCATTGTCCTGAACTGCTGCTGAACATTGCGCGATTATGGTTGATACCGCGGATACCTCAGCGCGCTCTGCGACGCGCCCTGAGGACCCGACCGCCACTTAAGCATAGGCCGCCAGAAGAGCGCCGCTCCTAGGTGCGGCCGCGGAGACTAGGGCGGCCTTGAGACCGGAAATGCAAGGACCTGGGCAACCAACTTACCCCTTCGCTAGTGCTGAGCTACGGCGGATAAGAACGTCAATAGGCGAATAGAGTCTCCGCACCCATCGCCCTCTGAGGTTTGGAGGGCGGCCAAAGATACTGGGCAGGACCCCCGCATCCCAAGGTAGTCTGGTTCGGGACCAGCCTATCGAACAGTTCGAAATACGGAGCGGGACAAACGAAGCATCGCGCGTACCGTGTCCCGGACGCGGTGCGGACCTCGCCCGCACTTGCTTCCCGTGAACACGGAGCGGTCCCGCCCGTAGGGATCATATCGTCCCAAGGGACAGCCATAGCCCGCCTCCTCCTATGGTGCTGAGAAGCCCGGGGGCCCCCAGCAGCCTCTCCGCGGACCGAGGATGAGAGGAACACGCTCTACCGAGGCACAGTACCCTGGTTTTGTTCCAGCAATCCCCTTCCACGGCGCTATCTTGCTTAGGTGCGGTGAGAATAATGTCCGACTCCCGTTCCCGGGGATTTGGCGGTGCGGCCCAGTAT', 54.0, 0, 999)

The local alignment seems to give correct values (similar to edit distance) if the two sequences are comparable in size.

  1. Why are the gaps in one of the sequences not counted? Can I set a parameter to fix that?
  2. Is the edit distance a good approximation to the alignment score (taking into account that it doesn't handle reverse complements)?

Thanks!

5.1 years ago
Joe 21k

You need to show us the alignment code you used.

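If it's not counting gaps, that's probably because you used one of the `pairwise2` alignment functions whose name ends in `xx`, which score each match as 1 and apply no mismatch or gap penalties. (`AlignIO` only reads and writes alignment files; it does no scoring.) You will need to be explicit about the alignment and scoring methods.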
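To illustrate why the score collapses to the query length, here is a minimal sketch in plain Python (not Biopython, so it runs without dependencies). `local_score` is a hypothetical helper that implements Smith-Waterman with a linear gap penalty, and the sequences and penalty values are made up for the example. With match = 1 and no mismatch or gap penalty, any query whose bases occur in order somewhere in a long reference can be "aligned" for free by inserting gaps, which is what the output in the question shows.

```python
def local_score(query, ref, match=1, mismatch=0, gap=0):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The defaults mimic an 'xx'-style scheme: +1 per match,
    nothing subtracted for mismatches or gaps.
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            up = prev[j] + gap        # query base aligned to a gap
            left = curr[j - 1] + gap  # reference base aligned to a gap
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# Toy data: the query bases occur in order in the reference,
# but never as one contiguous run.
query = "GATCG"
ref = "GTTATTTTCTTTG"

free_gaps = local_score(query, ref)                          # no penalties
penalized = local_score(query, ref, match=1, mismatch=-1, gap=-2)

print("no penalties:  ", free_gaps)   # equals len(query)
print("with penalties:", penalized)   # lower than len(query)
```

In Biopython itself, I believe the corresponding fix is to call `pairwise2.align.localms(query, ref, match, mismatch, open, extend)` with negative mismatch and gap values, or to use `Bio.Align.PairwiseAligner` in local mode with explicit gap scores; check the documentation for your installed version. A pure-Python loop like the one above is far too slow for a genome of several Mbp and is only meant to show the scoring effect.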

I would say no: in general, edit distance is not a good proxy for alignment score, unless your sequences are identical in length and contain no indels.
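A small sketch of the length problem, again in plain Python with made-up sequences. `edit_distance` is a hypothetical helper computing the standard Levenshtein distance. Because the distance is global, it can never be smaller than the difference in length between the two sequences, so a short query scored against a long reference is dominated by the flanking bases even when the query occurs in the reference exactly.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


query = "GATCG"
ref = "TTTTT" + query + "TTTTT"   # exact copy of the query, plus flanks

dist = edit_distance(query, ref)
print("query occurs exactly in ref:", query in ref)
print("edit distance:", dist)
print("length difference:", len(ref) - len(query))
```

Here the distance equals the length difference (the ten flanking bases), although a local alignment would report a perfect match. For a 50 bp query against a multi-Mbp genome the same effect makes the raw edit distance close to the genome length, whatever the quality of the best local hit.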

You are correct, I used alignxx, which doesn't score indels. Please post this as an answer so I can accept it; I have also added that detail to the question in an edit.
