Question

Fastest Way To Search For Perfect Matches Only In Blast Or Blat

12.2 years ago

Sangwoo Kim ▴ 440

Hi, I have a very large number of 100-mer sequences (let's say billions). I want to query these sequences against the entire human genome to find "perfect matches" only.

I first tried to do this using BLAST+ (megablast). I built a BLAST database and index from hg19, and used the options below to take advantage of requiring perfect matches only:

blastn -query myseqs.fa -db hg19 -use_index true -index_name hg19_index -word_size 100 -outfmt "7 qacc qstart qend sacc sstart send sstrand" -max_target_seqs 5 -num_threads 4

Here, I set the word size to 100 to achieve my goal, and it does retrieve only perfect matches. The problem is the speed, which is about a million queries per hour. Some might say this is fast enough, but I want it to be faster!

On the other hand, I could use BLAT instead of BLAST, which is generally regarded as the faster tool. I also set up a local BLAT server (gfServer and gfClient), but I am not sure how to set the BLAT parameters to get only perfect matches.

So, what would be the fastest way to retrieve perfect matches in BLAST/BLAT?
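One way to approach the BLAT side without relying on any particular parameter is to post-filter its default PSL output. The sketch below is only an illustration, not something taken from the thread: it assumes tab-separated PSL lines in the standard 21-column layout, and the function names `is_perfect_psl_hit` and `perfect_hits` are hypothetical. A hit is kept only if it has no mismatches, no Ns, no insertions on either side, and its matching bases (including those in repeats) add up to the full query length.

```python
def is_perfect_psl_hit(line):
    """Return True if a PSL line is a gap-free, mismatch-free, full-length hit."""
    fields = line.rstrip("\n").split("\t")
    # Skip PSL header lines and anything that is not a 21-column record.
    if len(fields) < 21 or not fields[0].isdigit():
        return False
    matches, mismatches, rep_matches, n_count = (int(x) for x in fields[0:4])
    q_num_insert = int(fields[4])   # number of inserts in the query
    t_num_insert = int(fields[6])   # number of inserts in the target
    q_size = int(fields[10])        # full length of the query sequence
    return (
        mismatches == 0
        and n_count == 0
        and q_num_insert == 0
        and t_num_insert == 0
        and matches + rep_matches == q_size
    )


def perfect_hits(lines):
    """Yield (query, target, target_start, target_end, strand) for perfect hits.

    PSL target coordinates are 0-based, half-open.
    """
    for line in lines:
        if is_perfect_psl_hit(line):
            f = line.rstrip("\n").split("\t")
            yield f[9], f[13], int(f[15]), int(f[16]), f[8]
```

This would typically be applied to an open PSL file, for example `for hit in perfect_hits(open("out.psl")): print(hit)`, where `out.psl` is a placeholder file name. Filtering afterwards does not make BLAT itself any faster; it only guarantees that what remains is a perfect match.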

sequence alignment blat blast+ • 6.4k views

score 2 · Answer 1 · 2013-05-17

12.2 years ago

Sangwoo Kim ▴ 440

Answering my own question: I solved this problem by using "bwa fastmap". I hope this helps anybody with the same problem.

fastmap still does more than you need. In principle, we could have something several times faster than fastmap for your task.

12.2 years ago by lh3 33k

Could you give me a few examples? I tried SSAHA but it was much slower than fastmap.

12.2 years ago by Sangwoo Kim ▴ 440

No, ssaha2 won't do. You need a new but very simple aligner that aligns each read over its full length only.

12.2 years ago by lh3 33k
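To make the idea of "full-length only" matching concrete, here is a toy sketch in Python. It is not the aligner lh3 has in mind and is not how bwa works internally; it simply hashes every k-mer of a small reference (k equal to the read length) and looks each read up on both strands. The names `build_index` and `find_perfect_matches` are hypothetical, and a plain dictionary like this would not fit a whole human genome in memory, where a compressed index would be needed instead.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_index(reference, k):
    """Map every k-mer to the (sequence name, 0-based start) positions where it occurs.

    `reference` is a dict of sequence name -> sequence string.
    """
    index = defaultdict(list)
    for name, seq in reference.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def find_perfect_matches(read, index, k):
    """Return (name, start, strand) for every full-length exact match of `read`."""
    if len(read) != k:
        raise ValueError("read length must equal the index k-mer size")
    read = read.upper()
    hits = [(name, pos, "+") for name, pos in index.get(read, [])]
    rc = reverse_complement(read)
    # A read equal to its own reverse complement would hit the same
    # positions twice, so the minus strand is only searched when it differs.
    if rc != read:
        hits += [(name, pos, "-") for name, pos in index.get(rc, [])]
    return hits


if __name__ == "__main__":
    toy_reference = {"chrT": "ACGTTGCAAC"}  # made-up sequence for illustration
    toy_index = build_index(toy_reference, k=4)
    print(find_perfect_matches("TTGC", toy_index, k=4))
```

Because each lookup is a single exact hash probe per strand, there is no seeding, extension, or scoring step at all, which is the sense in which such a tool does less work than a general-purpose aligner.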