I'm trying to figure out why Emboss Needle and Biopython pairwise2.align.global give different results.
I start with two sequences that are difficult to align (generated randomly)
seq 1: CATTTTTTGAACAATCAGATCAGAAGCACCTTCGGGATAATCCTCACGGTTCTTGGGCATACGCCTCATGGTAAC
seq 2: ATGGAGCAGTGGGTTGTTACAGATCTCATAATCAGGGAACAGGTCTGGATAATGCCCCATCTCCACGATATCCCG
Emboss needle gives the following alignment, with parameters gapopen=10, gapextend=0.5, datafile=EDNAFULL
------------------AATAGTGTACCTGGCC----GGA-TCTTTATTCCTTCCAGGACAACT-----------CA---CATCTGCCCCACTACCCTAAACAAAATTTAA
CATTTTTGGAGTCACGTCAACAG---ACC--GCCCGGGGGATTCGTGACTAATCGCAGGA----TGGAGCGTGGAGCAGGCCAT----------------------------
However, I cannot get pairwise2.align.global to recreate this alignment. I have tried using globalds with the same parameters, and a dictionary version of EDNAFULL. I've also played with the penalize_extend_when_opening and penalize_end_gaps with no success. Here is a sample alignment from the pairwise2 output and both of the flags previously mentioned set to false.
------CA-----TTTTTTGAACA-ATCAGATCAGAAGCACCTTCGGGATAATCCTCACGGTTCTTGGGCATAC-GCCTCAT------GGTAAC---
ATGGAGCAGTGGGTTGTT---ACAGATC---TCATAATCA-----GGGA--------ACAGGTCT---GGATAATGCCCCATCTCCACGATATCCCG
Both of these programs are just meant to implement the Needleman-Wunsch algorithm, so why are they different and which one should I trust?
I'm also getting different results.
Biopython:
Alignment:
Emboss needle:
Alignment:
Thanks for looking into this!
Unfortunately, while those parameters do happen to work for those sequences, its trivial to find sequences that don't match given those parameters.
For example
Produces different alignments again.
Btw, I'm just generating the strings with
Wooo, you're right. I'm moving my answer as comment. I think the difference in alignments and scores may come from the fact that needle has two additional parameters (not present in biopython):
endopen
andendextend
.I explored that a little too, but by default the
-endweight
parameter is set to False, which meansendopen
/endextend
shouldn't make any difference. Also as I said in the post, I tried playing with thepenalize_extend_when_opening
andpenalize_end_gaps
parameters in biopython but still couldn't get the alignments the same.Hello, as I am a green hand in programming, I wonder how to get the result like :
Length: 118
Identity: 22/118 (18.6%)
Similarity: 22/118 (18.6%)
Gaps: 86/118 (72.9%)
Score: 47.5
Is it get from the python on the window system or it is the result from linux system(with shell or something like that?) Thank you !
Please just ask a new question instead of adding a tangent to an existing one. Please also note, if you do open a new thread we require much more information, and you should show some effort you’ve made to solve the problem.
Tagging Dr. Peter Cock: Peter
What version of Biopython are you on?
The pairwise module was re-written in (I think) 1.68, but it may be later. You'll get different results with the rewritten version to the old version. Peter would know for sure.
I am on 1.71 master.
Have you also tried the variations on
global
? e.g.globalxx
,globalms
,globalmx
etc etc.I've never really delved in to/understood the differences, but I know there are multiple options for local and global
Some of those alternatives (I think the ones ending in -x) allow you to define custom scoring functions.
I'm using
globalds
with the EDNAFULL matrix converted into a dictionary for thematch_dict
parameter. This should be the most accurate way to reproduce Emboss Needle I think.Below is my full code to translate the EDNAFULL matrix
I'm using BioPython 1.70.