I took a look at the pairwise2 module but couldn't find an easy way to obtain the indices of the matches ("|") either. I was able to modify the format_alignment function slightly so that it returns a list of 0-based indices of the pipe characters for this specific example, but I'm not sure how it will perform under different conditions. One place it may fail is when the "begin" variable != 0: the indices are counted from "begin" rather than from the start of the alignment, so keep an eye out for that.
from Bio import pairwise2
from Bio.pairwise2 import format_alignment
## This is a modification of the format_alignment method
def match_index(align1, align2, score, begin, end):
    """Return the positions of the match characters ("|").

    This follows format_alignment: since Biopython 1.71 identical
    matches are shown with a pipe character, mismatches as a dot,
    and gaps as a space.

    The returned indices are 0-based and counted from ``begin``.
    """
    s = []
    s.append("%s\n" % align1)
    s.append(" " * begin)
    for a, b in zip(align1[begin:end], align2[begin:end]):
        if a == b:
            s.append("|")  # match
        elif a == "-" or b == "-":
            s.append(" ")  # gap
        else:
            s.append(".")  # mismatch
    s.append("\n")
    s.append("%s\n" % align2)
    s.append("  Score=%g\n" % score)
    ## Obtain indices of matching characters (indicated by the "|" character)
    c = []
    for pos, char in enumerate(s):
        if char == "|":
            # s[0] is the align1 line and s[1] is the leading padding,
            # so the first alignment column sits at s[2]
            c.append(pos - 2)
    return c
alignments = pairwise2.align.globalxx(' EEEEE HHH HHH EEEEE', 'EEE EEEE HHH')
print(format_alignment(*alignments[0]))
print(match_index(*alignments[0]))
Results:
EEEEE HHH HHH ---- EEEEE---
||| | |||||| |||||||
---EEE------------- --- EEEE -----HHH
Score=17
[3, 4, 5, 19, 23, 24, 25, 26, 27, 28, 33, 34, 35, 36, 37, 38, 39]
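If the "begin" offset is a concern, here is a minimal sketch of a variant that compares the two aligned strings directly and reports columns counted from the start of the alignment. The name match_columns and the test strings are my own (hypothetical), not part of pairwise2, and I have only checked it on hand-written input:

```python
def match_columns(align1, align2, begin=0, end=None):
    """Return 0-based alignment columns where the two aligned strings agree.

    Columns are counted from the start of the aligned strings, so the
    result does not shift when begin != 0.
    """
    limit = min(len(align1), len(align2))
    end = limit if end is None else min(end, limit)
    return [
        col
        for col in range(begin, end)
        # a column holding a gap in both strings is not treated as a match
        if align1[col] == align2[col] and align1[col] != "-"
    ]


# Hand-written aligned strings (hypothetical, not pairwise2 output)
print(match_columns("ACG-T", "A-GCT"))        # [0, 2, 4]
print(match_columns("ACG-T", "A-GCT", 2, 5))  # [2, 4]
```

With pairwise2 you would unpack an alignment first (align1, align2, score, begin, end = alignments[0]) and pass the two strings plus begin and end.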
The other caveat is that I was unable to exactly replicate your alignment without the code that you used, so the alignment shown here is different from the one in your example. Hope this helps.