Entering edit mode
8.5 years ago
eyb
▴
270
I have a bunch of 500-600 bp sequences which I want to align. My goal is to get one file where all sequences would be aligned to the reference mitochondrial sequence. How do I do that using biopython? I figured out how to read, slice and dice, but I couldn't figure out how to make sth or aln (I guess that's what I need) file so I could find snps.
Do you want to perform a
multiple sequence alignment (MSA)
analysis?(aln format is typical for this analysis). I don't see what are you trying to do, and what it must be done inbiopython
. There are many tools available for performingMSA
, which have been widely tested, such as ClustalW.Anyway, if your final goal is to call snps and you have sequences of 500-600 bp length (which I guess that are long reads(?)), you should first align your sequences against your genome (mitochondrial in your case), using an alignment tool (see this post). Once the mapping has been done, you can try to carry out a
Variant Calling analysis
.Thanks. Links did not attach. Can you please edit your post?
EDIT it's ok now
Look into an aligner such as bwa or bowtie to align your reads to your mitochondrial sequence. Then, look into GATK from Broad Institute for calling SNPs. You'll also want to familiarize yourself the VCF format (look at 1000 genomes proejct).
If it must be done in Biopython, you can use its EMBOSS wrapper to run Smith-Waterman (mitochondrial sequence is small enough to fit the memory)
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc84