Fastest Way To Search For Perfect Matches Only In Blast Or Blat
11.5 years ago
Sangwoo Kim ▴ 440

Hi, I have a very large number of 100-mer sequences (let's say billions). I want to query these sequences against the entire human genome to find "perfect matches" only.

I first tried to do this using BLAST+ (megablast). I built a BLAST database and an index from hg19, and used the options below to take advantage of the fact that only perfect matches are needed:

blastn -query myseqs.fa -db hg19 -use_index true -index_name hg19_index -word_size 100 -outfmt "7 qacc qstart qend sacc sstart send sstrand" -max_target_seqs 5 -num_threads 4

Here, I set the word size to 100 to achieve my goal. It does retrieve only perfect matches. The problem is speed: about a million queries per hour. Some might say this is fast enough, but I want it to be faster!

On the other hand, I could use BLAT instead of BLAST, which is generally regarded as the faster tool. I also set up a local BLAT server (gfServer and gfClient), but I am not sure how to set the BLAT parameters to get only perfect matches.

So, what would be the fastest way to retrieve perfect matches in BLAST/BLAT?
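On the BLAT side, one option that does not depend on tuning search parameters is to post-filter the PSL output and keep only rows that describe a full-length, gap-free, mismatch-free hit. The sketch below assumes standard 21-column PSL as written by blat/gfClient; the function name `is_perfect_psl_hit` and the `read_len` parameter are hypothetical names chosen for illustration. gfClient also accepts `-minScore` and `-minIdentity`, which may reduce the output volume, but their exact effect should be checked against your BLAT version rather than taken from this note.

```python
def is_perfect_psl_hit(line, read_len=100):
    """Return True if a PSL row is a full-length hit with no mismatches or gaps.

    Assumes the standard 21-column PSL layout; header lines return False.
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 21 or not f[0].isdigit():
        return False  # header line or malformed row
    matches, mismatches, rep_matches, n_count = (int(x) for x in f[0:4])
    q_num_insert, t_num_insert = int(f[4]), int(f[6])
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    block_count = int(f[17])
    return (
        q_size == read_len
        and q_start == 0 and q_end == q_size      # PSL: 0-based start, exclusive end
        and matches + rep_matches == q_size       # repeat-masked matches are counted separately
        and mismatches == 0 and n_count == 0
        and q_num_insert == 0 and t_num_insert == 0
        and block_count == 1                      # a single ungapped block
    )
```

Usage would be something like `perfect = [l for l in open("out.psl") if is_perfect_psl_hit(l)]`, where `out.psl` is a placeholder file name. This only filters results after the search, so it does not make BLAT itself any faster.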

sequence alignment blat blast+ • 6.0k views
11.5 years ago
Sangwoo Kim ▴ 440

Answering my own question: I solved this problem by using "bwa fastmap". I hope this is helpful to anybody with the same problem.
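Since fastmap reports maximal exact matches of any length above a threshold, its output still has to be narrowed down to matches that span the whole read. The sketch below is one way that could be done; it assumes the fastmap output consists of tab-separated `SQ` records (read name and length), `EM` records (start, end, hit count, then positions such as `chr1:+12345`, or `*` when there are too many hits to list), and `//` separators. The function name `full_length_hits` is hypothetical. If your bwa build offers a minimum-length option for fastmap (`-l`), setting it to the read length may achieve the same thing at the source; treat that as something to verify in the usage message.

```python
def full_length_hits(fastmap_lines):
    """Yield (read_name, positions) for exact matches covering the entire read.

    Assumes 'SQ<TAB>name<TAB>length' and 'EM<TAB>start<TAB>end<TAB>count<TAB>pos...' records.
    """
    name, length = None, None
    for line in fastmap_lines:
        f = line.rstrip("\n").split("\t")
        if f[0] == "SQ":
            # Start of a new read record
            name, length = f[1], int(f[2])
        elif f[0] == "EM" and name is not None:
            start, end = int(f[1]), int(f[2])
            if start == 0 and end == length:
                # f[3] is the hit count; the remaining fields are positions (or '*')
                yield name, f[4:]
```

Reads with no `EM` record spanning positions 0 to the read length are simply not yielded, so they are treated as having no perfect match.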


fastmap still does more than you need. In principle, something several times faster than fastmap could be written for your task.


Could you give me a few examples? I tried SSAHA but it was much slower than fastmap.


No, ssaha2 won't do. You need a new but very simple aligner that aligns each read over its full length only.
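To make the idea of a full-length-only aligner concrete, here is a minimal sketch of the simplest version of it: put the reads (and their reverse complements) in a hash table, then slide a read-length window along the reference and look each window up. This is not the tool the comment has in mind and not how fastmap works; it is an illustration only, and a pure-Python hash table would not scale to billions of reads or the full human genome. All names (`revcomp`, `exact_full_length_matches`, `read_len`) are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_full_length_matches(reads, reference, read_len=100):
    """Find exact, full-length matches of reads on either strand.

    reads:     dict mapping read name -> sequence
    reference: dict mapping chromosome name -> sequence
    Returns a list of (read_name, chrom, 0-based start, strand).
    """
    # Index every read and its reverse complement by sequence
    wanted = defaultdict(list)
    for name, seq in reads.items():
        seq = seq.upper()
        if len(seq) != read_len:
            continue  # only reads of exactly read_len are considered
        wanted[seq].append((name, "+"))
        wanted[revcomp(seq)].append((name, "-"))

    # Scan the reference once, looking up each window in the read index
    hits = []
    for chrom, ref in reference.items():
        ref = ref.upper()
        for i in range(len(ref) - read_len + 1):
            for name, strand in wanted.get(ref[i:i + read_len], ()):
                hits.append((name, chrom, i, strand))
    return hits
```

The design point is that nothing here extends, scores, or seeds anything: a read either equals a reference window or it does not, which is why a dedicated tool can in principle do less work than a general-purpose aligner.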
