Greetings, I have some protein sequences that I want to align, highlighting the mismatches, using Python. Here's what I have so far (after creating a list of the AA seqs):
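from termcolor import colored  # assumed source of colored(); third-party package

# Append "X" padding to sequences whose length differs from VJ.
new = []
for seq1 in allkeeps_sorted:
    if len(seq1) != len(VJ):
        xlen = (len(VJ) - len(seq1)) * "X"  # empty string if seq1 is longer than VJ
        print(xlen)
        newseq = seq1 + xlen
        new.append(newseq)
    else:
        new.append(seq1)

# Color each residue red where it differs from VJ, white where it matches.
align = []
for seq1 in new:
    for x in range(0, len(VJ)):
        if seq1[x] != VJ[x]:
            match = colored(seq1[x], "red")
        else:
            match = colored(seq1[x], "white")
        align.append(match)
print(align)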
Some of the AA sequences in the list are shorter than the "VJ" sequence, so I tried appending "X" characters to pad every string to the same length before highlighting the mismatches. However, this did not work. I'm running this in a Linux terminal, and I want every AA that does not match the VJ sequence at the same position to be colored red.
All help is appreciated.
I should probably stop asking this question: Why reinvent the wheel? Why not use existing MSA software (like Clustal) that can be customized in a multitude of ways?
Also, I'd ideally check len(seq1) < len(VJ) rather than use !=. Just erring on the side of caution.
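If you do want to stay in plain Python, here is a minimal sketch of the pad-then-compare idea with the < check suggested above. It is an illustration under assumptions, not a tested drop-in: the helper names pad_to_reference and highlight_mismatches are made up, the example sequences are invented, and it writes raw ANSI escape codes instead of calling colored() so that it has no third-party dependency. The key point is that the colored pieces are joined into one string before printing; printing a list shows the escape codes as text rather than as colors.

```python
RED = "\033[31m"    # ANSI escape code: red foreground
RESET = "\033[0m"   # ANSI escape code: restore default colors


def pad_to_reference(seq, ref, pad_char="X"):
    """Right-pad seq with pad_char up to len(ref); longer sequences are returned unchanged."""
    if len(seq) < len(ref):
        return seq + pad_char * (len(ref) - len(seq))
    return seq


def highlight_mismatches(seq, ref):
    """Return seq (padded) with every residue that differs from ref wrapped in red."""
    padded = pad_to_reference(seq, ref)
    pieces = []
    for i, residue in enumerate(padded):
        # Positions past the end of ref have nothing to compare against,
        # so this sketch treats them as mismatches.
        if i >= len(ref) or residue != ref[i]:
            pieces.append(RED + residue + RESET)
        else:
            pieces.append(residue)
    return "".join(pieces)


if __name__ == "__main__":
    VJ = "CARDYW"                                   # made-up reference sequence
    allkeeps_sorted = ["CARDYW", "CAKDYW", "CAR"]   # made-up example data
    for seq in allkeeps_sorted:
        print(highlight_mismatches(seq, VJ))
```

Note that this only compares residues position by position; it does not align anything, so a single insertion or deletion will mark everything after it as a mismatch. That is where a real MSA tool earns its keep.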