I am trying to find the optimal local alignment score of a short sequence (50 bp) against a reference bacterial genome of several Mbp. When I use the Biopython alignment module, the score I get is always equal to the length of the shorter (query) sequence. For example, with the alignxx function:
('--G------------------A-----------TC-G-G-G------AC-G---C--C----G--G-------T---TG-C-GA----AT---A--CA-C----AG-G--------T----------TT---AT-G----TG-C-T-----G------G--------AC--G--A-A----A-A----A----C-A----------------A---T--C-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------', 
'CGGTGCCCTCCTCCGCGGCGTACACCCCGCGCGTCCGCGCGGGCGCGACTGATACAGCTTACGCCGGCGCGACTGCCTGTCCGAGTCGATCGCATACATCTGTTAGCGCGGCAGGGTGAACGCGGCGTTCGCATCGCCGATGGCGTACTACGCCTCCCGGGGGCCGCACACGTGACATCGGACAGGCGAGGTTCCACCGGGCCCCGCTCCGGAGCCTGGCGCGAGTGAATGCCATACCTTATATAAACTCAGTCCGCCCAGAACTATGGGTATGGAACCCGGCGGTATAGGGCACTCGCCCACCCATTGTCCTGAACTGCTGCTGAACATTGCGCGATTATGGTTGATACCGCGGATACCTCAGCGCGCTCTGCGACGCGCCCTGAGGACCCGACCGCCACTTAAGCATAGGCCGCCAGAAGAGCGCCGCTCCTAGGTGCGGCCGCGGAGACTAGGGCGGCCTTGAGACCGGAAATGCAAGGACCTGGGCAACCAACTTACCCCTTCGCTAGTGCTGAGCTACGGCGGATAAGAACGTCAATAGGCGAATAGAGTCTCCGCACCCATCGCCCTCTGAGGTTTGGAGGGCGGCCAAAGATACTGGGCAGGACCCCCGCATCCCAAGGTAGTCTGGTTCGGGACCAGCCTATCGAACAGTTCGAAATACGGAGCGGGACAAACGAAGCATCGCGCGTACCGTGTCCCGGACGCGGTGCGGACCTCGCCCGCACTTGCTTCCCGTGAACACGGAGCGGTCCCGCCCGTAGGGATCATATCGTCCCAAGGGACAGCCATAGCCCGCCTCCTCCTATGGTGCTGAGAAGCCCGGGGGCCCCCAGCAGCCTCTCCGCGGACCGAGGATGAGAGGAACACGCTCTACCGAGGCACAGTACCCTGGTTTTGTTCCAGCAATCCCCTTCCACGGCGCTATCTTGCTTAGGTGCGGTGAGAATAATGTCCGACTCCCGTTCCCGGGGATTTGGCGGTGCGGCCCAGTAT', 54.0, 0, 999)
The local alignment seems to give correct values (similar to the edit distance) when the two sequences are comparable in size.
- Why are the gaps in one of the sequences not counted? Can I set a parameter to penalize them?
- Is the edit distance a good approximation of the alignment score (taking into account that it doesn't handle reverse complements)?
Thanks!
You are correct, I used alignxx, which doesn't penalize INDELs. Please post that as an answer so I can accept it. I have also added this to the question as an edit.
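To illustrate why the score always equals the query length, here is a minimal pure-Python sketch of local (Smith-Waterman style) scoring with a linear gap penalty. The function name `local_score` and its parameters are made up for this example and are not part of Biopython. With match = 1 and no mismatch or gap penalty (the "xx" scheme), the score is just the length of the longest common subsequence, so against a long reference nearly every query base finds a partner somewhere. As far as I know, the equivalent settings in Biopython are the match, mismatch and gap arguments of `pairwise2.align.localms`.

```python
def local_score(query, ref, match=1, mismatch=0, gap=0):
    """Best local alignment score using a linear gap penalty.

    Hypothetical helper for illustration only. The defaults mimic
    "xx" scoring: match = 1, mismatch = 0, gaps are free.
    Only two rows of the DP matrix are kept in memory.
    """
    prev = [0] * (len(ref) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, r in enumerate(ref, start=1):
            diag = prev[j - 1] + (match if q == r else mismatch)
            up = prev[j] + gap        # gap in the reference
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


query = "ACGT"
ref = "AGGCTTGAAT"  # contains A, C, G, T in order, but never adjacent

# Free gaps: every query base is matched somewhere along the reference.
free_gaps = local_score(query, ref)

# Penalized mismatches and gaps: scattered matches no longer add up.
penalized = local_score(query, ref, match=1, mismatch=-1, gap=-1)

print(free_gaps, penalized)
```

With free gaps the score reaches `len(query)`, while the penalized score only does so when the query occurs as an exact substring of the reference. Note that this quadratic pure-Python loop is far too slow for a multi-Mbp genome. It is only meant to show the effect of the gap parameters.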