Question

Sequence alignment in biopython

1

Entering edit mode

8.5 years ago

eyb ▴ 270

I have a bunch of 500-600 bp sequences which I want to align. My goal is to get one file where all sequences would be aligned to the reference mitochondrial sequence. How do I do that using biopython? I figured out how to read, slice and dice, but I couldn't figure out how to make sth or aln (I guess that's what I need) file so I could find snps.

biopython sequence • 4.2k views

ADD COMMENT • link updated 8.3 years ago by Biostar 20 • written 8.5 years ago by eyb ▴ 270

2

Entering edit mode

Do you want to perform a multiple sequence alignment (MSA) analysis?(aln format is typical for this analysis). I don't see what are you trying to do, and what it must be done in biopython. There are many tools available for performing MSA, which have been widely tested, such as ClustalW.

Anyway, if your final goal is to call snps and you have sequences of 500-600 bp length (which I guess that are long reads(?)), you should first align your sequences against your genome (mitochondrial in your case), using an alignment tool (see this post). Once the mapping has been done, you can try to carry out a Variant Calling analysis.

ADD REPLY • link 8.5 years ago by iraun 6.2k

1

Entering edit mode

Thanks. Links did not attach. Can you please edit your post?

EDIT it's ok now

ADD REPLY • link 8.5 years ago by eyb ▴ 270

1

Entering edit mode

Look into an aligner such as bwa or bowtie to align your reads to your mitochondrial sequence. Then, look into GATK from Broad Institute for calling SNPs. You'll also want to familiarize yourself the VCF format (look at 1000 genomes proejct).

ADD REPLY • link 8.5 years ago by st.ph.n ★ 2.7k

0

Entering edit mode

If it must be done in Biopython, you can use its EMBOSS wrapper to run Smith-Waterman (mitochondrial sequence is small enough to fit the memory)

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc84

ADD REPLY • link 8.3 years ago by zwdzwd ▴ 120