seq_a = 'GAGAGATTTTCCAATTCGACG-------CGGGGTCAGG--GAAATTT'
seq_b = 'GAGAGATTGGCCTTAACTACCCAACCCACGGCCTGACCGAGGTCTTC'
G, A, C, T = Bases
- = INDEL
PYTHON - I am very new to programming and need some help. I would like to write a Python program that first finds the indels ('-') in seq_a and then compares both sequences (seq_a and seq_b) upstream and downstream of each indel, counting the number of differences between the bases.
e.g.
seq_a - GG--GAAA
seq_b - CCGAGGTC
This example has 5 SNPs upstream and downstream of the indel: c-g, c-g, a-g, a-t, a-c (2 upstream, 3 downstream; the g-g pair right after the indel matches).
I was wondering if anyone could give me any pointers or ideas on how I would start off this program?
Thanks :)
Note that the easiest solution is to use Biopython. It has built-in facilities for performing alignments (e.g. the pairwise2 module), and it can also call command-line alignment tools, which tend to be faster.
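If the two sequences are already aligned, as they appear to be in the question, plain Python may be enough to get started. The following is only a rough sketch, not a definitive solution. The function names and the `flank` parameter are made up for illustration, and it assumes both strings have the same length and that "upstream/downstream" means everything up to the ends of the sequence unless a window size is given.

```python
import re


def find_indels(seq):
    """Return (start, end) index pairs for each run of '-' in seq (end is exclusive)."""
    return [m.span() for m in re.finditer(r"-+", seq)]


def count_mismatches(a, b):
    """Count aligned positions where both sequences have a base and the bases differ."""
    return sum(1 for x, y in zip(a, b) if x != "-" and y != "-" and x != y)


def snps_around_indels(seq_a, seq_b, flank=None):
    """For each indel in seq_a, count mismatches upstream and downstream of it.

    flank is a hypothetical window size in bases; None means
    'compare all the way to the ends of the sequences'.
    """
    results = []
    for start, end in find_indels(seq_a):
        up_from = 0 if flank is None else max(0, start - flank)
        down_to = len(seq_a) if flank is None else end + flank
        upstream = count_mismatches(seq_a[up_from:start], seq_b[up_from:start])
        downstream = count_mismatches(seq_a[end:down_to], seq_b[end:down_to])
        results.append((start, end, upstream, downstream))
    return results


# The small example from the question: 2 SNPs upstream, 3 downstream.
print(snps_around_indels("GG--GAAA", "CCGAGGTC"))  # [(2, 4, 2, 3)]
```

Each result tuple is (indel start, indel end, upstream count, downstream count). With the full seq_a and seq_b from the question there are two indels, so you would probably want to pass a `flank` value to keep the windows from running into each other.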