Question

Which algorithm and stand-alone tool is good for searching short nt motif (30-60bp) against long sequences (300-500bp)?

0

9.7 years ago

GP ▴ 10

Hi All,

I want to search a short nucleotide motif (30-60 bp) against millions of long sequences (400-500 bp). Which algorithm and standalone program/tool would be good for this?

Thanks for any help!

alignment blast • 2.6k views

updated 9.7 years ago by Giovanni M Dall'Olio 28k • written 9.7 years ago by GP ▴ 10

1

I think you should refine your question. In particular: How many mismatches do you want to allow between motif and target? How many motifs do you have: just a few (say < 1000), or millions? Is it enough to know that a motif is present in a target sequence, or do you also want the best alignment?

In the simplest case (no mismatches, few motifs, and only presence/absence needed), something as simple as a grep command could do the job.
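
As a rough illustration of that simplest case, here is a minimal Python sketch of an exact-match presence/absence scan. It is not from the thread: the function names (`parse_fasta`, `find_exact`) are made up for this example, and it assumes plain FASTA input with A/C/G/T sequences. Unlike a plain grep on the file, it joins wrapped sequence lines before searching and also checks the reverse complement.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Translation table for complementing A/C/G/T (other symbols are left as-is).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining sequences wrapped over several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_exact(motif: str, records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each header whose sequence contains the motif (either strand)
    to the 0-based start of the first hit. No mismatches are allowed."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = {}
    for name, seq in records:
        seq = seq.upper()
        pos = seq.find(motif)
        if pos == -1:
            pos = seq.find(rc)  # fall back to the reverse strand
        if pos != -1:
            hits[name] = pos
    return hits


if __name__ == "__main__":
    # Hypothetical toy input; in practice pass an open file handle instead.
    fasta = [">s1", "AAAC", "GTTT", ">s2", "GGGG"]
    print(find_exact("ACGT", parse_fasta(fasta)))
```

In the toy input the motif spans a line break in `s1`, which is the case a line-based grep would miss on a wrapped FASTA file.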

9.7 years ago by dariober 15k

0

Hi, basically I have one motif (query) that I want to align against each of the long sequences (a custom db of a million reads), and I need standard tabular output like BLAST's (% identity, alignment length, etc.). I found that BLAST is not a good option. I probably need a Smith-Waterman-based tool, though I'm not sure.
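
For reference, this is roughly what a Smith-Waterman-based tool would compute for one query/target pair. The sketch below is illustrative only and not taken from any tool mentioned in this thread: the function name, the scoring values (match 2, mismatch -3, linear gap -5), and the output field names are all assumptions chosen for the example. It is pure Python with quadratic time per pair, so it only shows the idea; it would be far too slow for millions of reads, and it searches the forward strand only.

```python
def smith_waterman(query, target, match=2, mismatch=-3, gap=-5):
    """Local alignment with a linear gap penalty (scoring values are assumptions).

    Returns a dict with score, percent identity, alignment length and
    1-based inclusive coordinates on query (q*) and target (s*).
    """
    q, t = query.upper(), target.upper()
    n, m = len(q), len(t)
    # H[i][j] = best score of a local alignment ending at q[i-1], t[j-1].
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i - 1][j - 1] + (match if q[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best cell until the score drops to zero.
    i, j = bi, bj
    matches = length = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if q[i - 1] == t[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            if q[i - 1] == t[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1  # gap in the target
        else:
            j -= 1  # gap in the query
        length += 1

    pident = 100.0 * matches / length if length else 0.0
    return {
        "score": best, "pident": pident, "length": length,
        "qstart": i + 1, "qend": bi, "sstart": j + 1, "send": bj,
    }


if __name__ == "__main__":
    # Hypothetical toy sequences, printed as one tab-separated row.
    hit = smith_waterman("ACGTTCGTAA", "GGACGTACGTAAGG")
    print("\t".join(str(hit[k]) for k in
                    ("pident", "length", "qstart", "qend", "sstart", "send", "score")))
```

Running one such call per read and printing one row per hit gives a table with the kind of columns asked for (% identity, alignment length, coordinates).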

9.7 years ago by GP ▴ 10

0

When you write "motif", do you mean a plain sequence or a motif with ambiguities?

9.7 years ago by lelle ▴ 830

0

Thanks, I meant a sequence.

9.7 years ago by GP ▴ 10

score 1 · Answer 1 · 2015-03-11

Hi, by now you might have found a solution to this question anyway...

A while ago I wrote a program, SequenceMatcher, which might suit you. In your case you could do something like:

java -jar ~/path/to/SequenceMatcher.jar match -a motif.fa -b sequences.fa -aln local

The output is in an easy-to-parse tabular format or in SAM format. For 1 motif against 1M sequences it might be slow, but not terribly so.

score 1 · Answer 2 · 2015-03-11

For aligning short motifs to long sequences, I got good results with MEME. You can download a stand-alone version and run it on a local computer. The suite also has some nice tools for working with motifs, e.g. searching for similar motifs in the JASPAR and UniPROBE databases, finding enrichment of multiple motifs, and so on. It depends on what you want to do.

If you are only interested in an alignment, you can use exonerate, a nice command-line tool that is easy to install and has many options for different types of alignment. It is especially good if you need to align cDNA sequences to genomic DNA, because it has a model for handling large introns and identifying exon junctions. But it is good for other types of alignment as well.