Python - finding indels and SNPs
10.0 years ago
seq_a = 'GAGAGATTTTCCAATTCGACG-------CGGGGTCAGG--GAAATTT'
seq_b = 'GAGAGATTGGCCTTAACTACCCAACCCACGGCCTGACCGAGGTCTTC'

G, A, C, T = bases; '-' = indel

I am very new to programming and need some help. I would like to write a Python program that first finds the indels ('-') in seq_a and then compares both sequences (seq_a and seq_b) upstream and downstream of each indel, counting the number of differences between the bases.

e.g.

seq_a - GG--GAAA
seq_b - CCGAGGTC

This example has 5 SNPs upstream and downstream of the indel: c-g, c-g, a-g, a-t, a-c.

I was wondering if anyone could give me any pointers or ideas on how to start this program?

Thanks :)

SNP INDEL Python • 9.4k views

Note that the easiest solution is to use Biopython. It has built-in facilities for performing alignments (e.g. the pairwise2 module) and can also call command-line alignment tools, which tend to be faster.

10.0 years ago
Whetting ★ 1.6k

Assuming that you have your alignment in a file named "fasta.fas", this should get you started:

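from Bio import AlignIO

# Read the two-sequence alignment (seq_a first, seq_b second)
alignment = AlignIO.read("fasta.fas", "fasta")

y = 0  # SNPs seen since the last indel
for r in range(alignment.get_alignment_length()):
    a, b = alignment[0, r], alignment[1, r]
    if a != b:
        if a != "-" and b != "-":
            # Two different bases at this column: a SNP
            y += 1
            print(r, a, b, y)
        else:
            # Gap in one of the sequences: reset the tally
            y = 0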

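For each SNP this prints its position (0-based), the nt in seq_a, the nt in seq_b, and a running tally of SNPs. The tally resets at every indel, so the last value printed before an indel is the number of SNPs upstream of it.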
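If you do not have Biopython installed, the same idea can be sketched in plain Python. This is only a rough sketch, not the code above: the function name `snps_around_indels` and the keys of the returned dictionaries are made up for illustration, and it assumes the two sequences are already aligned (same length) and that you only care about gaps in seq_a, as in the question.

```python
import re


def snps_around_indels(seq_a, seq_b):
    """For each run of '-' in seq_a, count the SNPs in the gap-free
    stretch on either side of it (hypothetical helper, for illustration)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    def count_snps(start, end):
        # A SNP is a column where both sequences have a base and they differ
        return sum(
            1
            for a, b in zip(seq_a[start:end], seq_b[start:end])
            if a != b and a != "-" and b != "-"
        )

    # (start, end) of every run of gap characters in seq_a
    gaps = [m.span() for m in re.finditer(r"-+", seq_a)]

    results = []
    for i, (start, end) in enumerate(gaps):
        # The flanks stop at the neighbouring indel or at the sequence ends
        left = gaps[i - 1][1] if i > 0 else 0
        right = gaps[i + 1][0] if i + 1 < len(gaps) else len(seq_a)
        results.append({
            "indel_start": start,
            "indel_length": end - start,
            "upstream_snps": count_snps(left, start),
            "downstream_snps": count_snps(end, right),
        })
    return results


print(snps_around_indels("GG--GAAA", "CCGAGGTC"))
# [{'indel_start': 2, 'indel_length': 2, 'upstream_snps': 2, 'downstream_snps': 3}]
```

On the small example from the question this gives 2 SNPs upstream and 3 downstream, i.e. the 5 listed there. Note that when there are several indels, the stretch between two of them is counted both as downstream of the first and as upstream of the second.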

cited in https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0208511

First complete chloroplast genomics and comparative phylogenetic analysis of Commiphora gileadensis and C. foliacea: Myrrh producing trees


cited in

Complete chloroplast genomes of medicinally important Teucrium species and comparative analyses with related species from Lamiaceae https://media.proquest.com/media/hms/PFT/1/PIi9A?_s=oXllOyF0mU%2BY2Zya4BZ5dUOHgNI%3D


cited in : https://peerj.com/articles/9132/

Comparative analysis of four Zantedeschia chloroplast genomes: expansion and contraction of the IR region, phylogenetic analyses and SSR genetic diversity assessment
