Python multiple sequence alignment
9.6 years ago
Nick Stoler ▴ 70

I'm trying to find a fast implementation of a multiple sequence alignment algorithm that I can use from Python. The requirements are modest: I'll only be aligning a handful (around a dozen) of 300 bp reads, which should be very similar to each other (they come from the same molecule).

But I have to do that about 2 million times, so I'd like to avoid calling an external command-line program, as the BioPython wrappers do. Wouldn't the overhead of writing a file to disk, creating a process, its disk I/O, and reading back the results file be significant when each run aligns only a dozen reads?

tl;dr: Is there a Python package, written in C, that does what ClustalW, MUSCLE, etc. do?
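For context, the core of an in-process aligner is a small dynamic program that needs no files or subprocesses. Below is a minimal pure-Python sketch of Needleman-Wunsch global pairwise alignment (not a multiple aligner); the function name and the scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions. Pure Python is likely far too slow for 2 million runs of 300 bp reads, so this only shows what a C-backed library would be doing internally.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score); gaps are written as '-'.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or mismatch
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

For example, `needleman_wunsch("ACGTTGCA", "ACGTGCA")` returns two equal-length gapped strings with score 5 (seven matches, one gap).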


You can also drive these alignment programs from Perl; consider that if the language doesn't matter.

9.6 years ago
Brice Sarver ★ 3.8k

That's a lot of MSAs! Are you sure you need true alignments for each of these? You may be able to get what you need without them. Perhaps mapping would be sufficient?

I'm assuming you want to keep everything in Python rather than writing intermediate files, right? I'm not aware of anything that handles such a task internally (but I'd be interested in hearing about it if you do find one). Biostrings, in R, can perform alignments, as can DECIPHER and some other libraries. Check this out to start.

If you really need to do that many and you've confirmed that I/O is an issue, you may be better off looking at different hardware. I've run more complicated tasks, including writing large results and log files, hundreds of thousands of times on a distributed cluster without issue.
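A rough way to confirm whether per-call overhead matters is to time the write-file, spawn-process, read-back cycle on its own. This is a sketch under assumptions: the child process is just the Python interpreter copying a file of a dozen dummy 300 bp reads, not a real aligner, and Python's startup is probably slower than a compiled aligner's, so treat the result as an order-of-magnitude estimate of fixed cost only. The function name is hypothetical.

```python
import os
import subprocess
import sys
import tempfile
import time


def per_call_overhead(n_calls=20):
    """Average seconds per write-temp-file -> spawn-process -> read-back round trip."""
    # 12 dummy 300 bp reads in FASTA format (stand-ins for one read family).
    fasta = "".join(f">read{i}\n{'ACGT' * 75}\n" for i in range(12))
    # The child only copies input to output, so no alignment work is timed.
    child = "import shutil, sys; shutil.copyfile(sys.argv[1], sys.argv[2])"
    with tempfile.TemporaryDirectory() as tmp:
        in_path = os.path.join(tmp, "in.fa")
        out_path = os.path.join(tmp, "out.fa")
        start = time.perf_counter()
        for _ in range(n_calls):
            with open(in_path, "w") as fh:
                fh.write(fasta)
            subprocess.run([sys.executable, "-c", child, in_path, out_path], check=True)
            with open(out_path) as fh:
                fh.read()
        elapsed = time.perf_counter() - start
    return elapsed / n_calls
```

Multiplying the result by 2,000,000, e.g. `per_call_overhead() * 2_000_000 / 3600`, gives a ballpark figure in hours for the overhead alone.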


That's an interesting question about whether it needs to be an MSA. I don't think mapping will work, since the point is to be reference-free. Basically, I want to build a consensus out of each group of a handful of reads (it's duplex sequencing), independent of a reference.
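A minimal sketch of a reference-free consensus, assuming a simple star alignment rather than a true progressive MSA like ClustalW or MUSCLE: every read is aligned to the first read (the "seed") with plain Needleman-Wunsch, and each seed position takes a majority vote. The function names and scoring values are hypothetical, and bases inserted relative to the seed are discarded, which a real duplex consensus caller would need to handle.

```python
from collections import Counter


def _global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns two equal-length gapped strings."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + sub, s[i - 1][j] + gap, s[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if s[i][j] == s[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and s[i][j] == s[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def star_consensus(reads):
    """Majority-vote consensus of similar reads, using reads[0] as the seed."""
    seed = reads[0]
    # One vote counter per seed position; the seed votes for its own base.
    columns = [Counter({base: 1}) for base in seed]
    for read in reads[1:]:
        aligned_seed, aligned_read = _global_align(seed, read)
        pos = -1
        for s_char, r_char in zip(aligned_seed, aligned_read):
            if s_char == "-":
                continue  # insertion relative to the seed: ignored in this sketch
            pos += 1
            columns[pos][r_char] += 1  # r_char may be '-' (deletion in this read)
    consensus = []
    for votes in columns:
        base, _ = votes.most_common(1)[0]
        if base != "-":  # drop seed positions that most reads lack
            consensus.append(base)
    return "".join(consensus)
```

Because every read is aligned only to the seed, the cost grows linearly with the number of reads per family, but the result depends on choosing a representative seed.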

9.6 years ago
wpwupingwp ▴ 120

You could perhaps use bwa or bowtie instead.
