Sequence alignment in biopython
8.5 years ago
eyb ▴ 270

I have a bunch of 500-600 bp sequences that I want to align. My goal is to get one file in which all sequences are aligned to the reference mitochondrial sequence. How do I do that using Biopython? I figured out how to read, slice and dice the sequences, but I couldn't figure out how to make a Stockholm (.sth) or Clustal (.aln) alignment file (I guess that's what I need) so that I can find SNPs.
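For reference, a Stockholm (.sth) file is just a header line, one row per sequence (name, then the gapped sequence), and a closing `//`. The sketch below is a minimal, hypothetical helper (`to_stockholm` is not a Biopython function) that writes that layout, assuming the sequences are already aligned, i.e. gapped to the same length. In Biopython itself, `Bio.AlignIO.write(alignment, handle, "stockholm")` (or `"clustal"`) writes a `MultipleSeqAlignment` in these formats. Note that aligning each read to the reference separately does not automatically give equal-length rows, because different reads can carry insertions at different places.

```python
def to_stockholm(aligned: dict[str, str]) -> str:
    """Format equal-length gapped sequences as a minimal Stockholm alignment.

    Hypothetical helper for illustration; assumes the sequences are already
    aligned (same length, gaps written as '-').
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        # Empty input or rows of different lengths are not a valid alignment
        raise ValueError("all aligned sequences must have the same length")
    width = max(len(name) for name in aligned)
    lines = ["# STOCKHOLM 1.0"]
    for name, seq in aligned.items():
        # Pad names so the sequence columns line up
        lines.append(f"{name.ljust(width)} {seq}")
    lines.append("//")
    return "\n".join(lines) + "\n"


print(to_stockholm({"chrM_ref": "ACGTACGT", "read_1": "ACGAAC-T"}))
```

The record names `chrM_ref` and `read_1` are made up for the example.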


Do you want to perform a multiple sequence alignment (MSA)? (The .aln format is typical for this kind of analysis.) I don't quite see what you are trying to do, or why it must be done in Biopython. There are many well-tested tools for performing MSA, such as ClustalW.

Anyway, if your final goal is to call SNPs and you have sequences of 500-600 bp (which I guess are long reads?), you should first align your sequences against your reference genome (the mitochondrial genome in your case) using a read aligner (see this post). Once the mapping has been done, you can carry out a variant calling analysis.
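To make the link between "alignment" and "SNP" concrete, here is a minimal sketch that walks one pairwise alignment (reference and read as gapped strings of equal length, as most aligners can report them) and lists mismatched columns in 1-based reference coordinates. This is not variant calling: real callers such as GATK weigh base qualities, read depth and genotype likelihoods across many reads. The function name `snp_candidates` and the `ref_start` parameter (0-based reference index where the alignment begins) are hypothetical.

```python
def snp_candidates(ref_aln: str, read_aln: str, ref_start: int = 0) -> list[tuple[int, str, str]]:
    """Return (ref_pos_1based, ref_base, read_base) for mismatched aligned columns.

    Hypothetical helper; ref_aln and read_aln are gapped strings of equal
    length, and ref_start is the 0-based reference index of the first column.
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")
    hits = []
    ref_pos = ref_start  # 0-based index of the next reference base
    for r, q in zip(ref_aln.upper(), read_aln.upper()):
        if r == "-":
            continue  # insertion in the read: no reference base consumed
        if q not in ("-", "N") and q != r:
            hits.append((ref_pos + 1, r, q))  # report 1-based position
        ref_pos += 1  # deletions ('-' in read) still consume a reference base
    return hits


print(snp_candidates("ACGTAC-GT", "ACGAACTG-", ref_start=100))
```

Insertions and deletions are skipped here on purpose; only single-base substitutions are reported.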


Thanks. The links did not attach. Can you please edit your post?

EDIT: it's OK now.


Look into an aligner such as BWA or Bowtie to align your reads to your mitochondrial sequence. Then look into GATK from the Broad Institute for calling SNPs. You'll also want to familiarize yourself with the VCF format (look at the 1000 Genomes Project).
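To get a feel for VCF, the sketch below writes a bare-bones VCF from (position, ref, alt) tuples: the required `##fileformat` line, the `#CHROM` header, and one tab-separated data line per site, with `.` for missing ID, QUAL, FILTER and INFO. The helper name `to_vcf` and the default chromosome name `chrM` are assumptions (the name should match the header of your reference FASTA). Files produced by GATK carry much more, including `##contig`, `##INFO` and `##FORMAT` headers plus per-sample genotype columns.

```python
def to_vcf(snps, chrom: str = "chrM") -> str:
    """Format (pos_1based, ref, alt) tuples as a minimal sites-only VCF.

    Hypothetical helper for illustration; QUAL, FILTER and INFO are left
    missing ('.') because no statistical calling is done here.
    """
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Deduplicate (the same SNP may be seen in several reads) and sort by position
    for pos, ref, alt in sorted(set(snps)):
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


print(to_vcf([(104, "T", "A"), (104, "T", "A"), (73, "A", "G")]))
```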


If it must be done in Biopython, you can use its EMBOSS wrapper to run a Smith-Waterman local alignment (the mitochondrial sequence is small enough to fit in memory).

http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc84
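To show what a Smith-Waterman run actually computes, here is a plain-Python local alignment with traceback. It is a swapped-in illustration, not EMBOSS water: it uses a simple match/mismatch score and a linear gap penalty (the values are arbitrary assumptions), whereas water uses a substitution matrix and affine gap penalties. Pure Python is also slow on a full mitochondrial genome; for real data use water, or Biopython's `Bio.Align.PairwiseAligner` with `mode = "local"`.

```python
def smith_waterman(ref: str, read: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment (Smith-Waterman) with a linear gap penalty.

    Returns (score, ref_aln, read_aln, ref_start), where ref_start is the
    0-based reference index of the first aligned column. Scoring values
    are illustrative assumptions, not EMBOSS defaults.
    """
    n, m = len(ref), len(read)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # DP matrix, first row/col = 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if ref[i - 1] == read[j - 1] else mismatch
            # Local alignment: scores never drop below 0
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero cell is reached
    i, j = best_i, best_j
    ref_aln, read_aln = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if ref[i - 1] == read[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            ref_aln.append(ref[i - 1]); read_aln.append(read[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ref_aln.append(ref[i - 1]); read_aln.append("-"); i -= 1  # gap in read
        else:
            ref_aln.append("-"); read_aln.append(read[j - 1]); j -= 1  # gap in reference
    return best, "".join(reversed(ref_aln)), "".join(reversed(read_aln)), i


print(smith_waterman("GGGACGTACGGG", "ACGAACG"))
```

The gapped strings it returns are the kind of pairwise alignment from which mismatches against the reference can then be read off.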
