You need to iterate through both aligned sequences (given in alignments[0][0]
and alignments[0][1]
) and compare both letters.
E.g. like this:
from Bio import pairwise2 as pw
alignments = pw.align.globalxx('ACCGT', 'ACG')
consensus = []
for a, b in zip(alignments[0][0],alignments[0][1]):
if a == b:
consensus.append(a)
elif a == '-':
consensus.append(b)
elif b == '-':
consensus.append(a)
elif a in ('G', 'C') and b in ('G', 'C'):
consensus.append('S') # for strong
# etc, etc
print('seq1 ' + alignments[0][0])
print('seq2 ' + alignments[0][1])
print('cons ' + ''.join(consensus))
Output:
seq1 ACCGT
seq2 A-CG-
cons ACCGT
Instead of running through numerous elif
it may be more elegant to use a dictionary and construct keys from your two letters:
from Bio import pairwise2 as pw
alignments = pw.align.globalxx('ACCGT', 'ACG')
consensus = []
cons_dic = {'CG': 'S', 'AT': 'W', 'AC': 'M', 'AG': 'R'} # etc, etc
for a, b in zip(alignments[0][0],alignments[0][1]):
if a == b:
consensus.append(a)
elif a == '-':
consensus.append(b)
elif b == '-':
consensus.append(a)
else:
key = ''.join(sorted(a + b))
consensus.append(cons_dic[key])
print('seq1 ' + alignments[0][0])
print('seq2 ' + alignments[0][1])
print('cons ' + ''.join(consensus))
Based on your short example, what should the consensus look like?
It could be a lot of different things like
ACCGT
orANCGN
. I just don't know the best place to turn to now start deriving a consensus sequence.How can you define a consensus from just 2 sequences? With only 2 sequences you don't know which one is 'right' , so can't possibly pick a consensus base.
Do you just mean you want to know the positions where the base is ambiguous?