using python for pairwise alignment
2
5
Entering edit mode
10.5 years ago

Hi

I'm writing a Python program and I have to do a pairwise alignment on several thousand DNA sequences. I looked at Biopython but I couldn't fine a function to do a pairwise alignment, this may be my mistake. The function should have gap penalty, gap open, gap extension and Smith Waterman or Needleman Wunsch. Can anyone tell me where I can find a good Python package for doing this kind of alignment?

thanks in advance

pairwise-alignment python • 22k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
4
Entering edit mode
10.5 years ago
João Rodrigues ★ 2.5k

Hi,

BioPython does exactly what you are asking, you probably looked in the wrong place :) You should look at the Bio.pairwise2 module. See the following example for global pairwise alignment:

from Bio import pairwise2
from Bio.SubsMat import MatrixInfo as matlist

matrix = matlist.blosum62

gap_open = -10
gap_extend = -0.5

# Only using the first 60 aas for visualization reasons..

p53_human = "MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP"
p53_mouse = "MEESQSDISLELPLSQETFSGLWKLLPPEDILPSPHCMDDLLLPQDVEEFFEGPSEALRV"

alns = pairwise2.align.globalds(p53_human, p53_mouse, matrix, gap_open, gap_extend)
top_aln = alns[0]
aln_human, aln_mouse, score, begin, end = top_aln
print aln_human+'\n'+aln_mouse

MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP----
MEESQSDISLELPLSQETFSGLWKLLPPEDIL-PSP-HCMDDLLL-PQDVEEFF-EGPSEALRV

# Comparing with EBI NEEDLE output
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP------
MEESQSDISLELPLSQETFSGLWKLLPPEDIL-PSP-HCMDDLLL-PQDVEEFF---EGPSEALRV
ADD COMMENT
0
Entering edit mode
10.5 years ago

Hi jellevandewege,

You can use the Bio.Emboss python package for Needleman Wunsch alignment (It is essentially a wrapper around the program).

Here is a short example copied from my code base:

from Bio.Emboss.Applications import NeedleCommandline
from Bio import AlignIO

needle_cli = NeedleCommandline(asequence=seq_fname1, \
                               bsequence=seq_fname2, \
                               gapopen=10, \
                               gapextend=0.5, \
                               outfile=needle_fname)

"""This generates the needle file"""
needle_cli() 
"""That parses the needle file, aln[0] and aln[1] contain the aligned
first and second sequence in the usual format (e.g. - for a gap)"""
aln = AlignIO.read(needle_file, "emboss")

Yours,
Mahmoud

ADD COMMENT
0
Entering edit mode

is it possible to perform pairwise sequence alignment sequence 1:target sequence sequence 2: sequence extract from NCBI PDB based on PDB id like 1mkp

ADD REPLY

Login before adding your answer.

Traffic: 1515 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6