Alternative to BLAST, faster but less sensitive?
10.6 years ago

I am looking for an alternative to BLAST that can search the UniProt sequence database faster, at the cost of some sensitivity. I am only interested in relatively close homologs, say above ~50% sequence identity, so sensitivity is not an issue. I thought I could do this with BLAST by tuning parameters such as the word size, but I haven't gotten anywhere. I've come across USEARCH, which claims better performance, but I could not try it out because it requires a license.
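Whichever tool ends up doing the search, the ~50% identity cutoff can also be applied afterwards to tabular hits. A minimal Python sketch, assuming the hits were written in BLAST+ tabular format (`-outfmt 6` with its default 12 columns); the function name `close_homologs` and the optional `min_length` filter are illustrative, not part of any tool:

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def close_homologs(tabular_text: str, min_pident: float = 50.0, min_length: int = 0):
    """Yield (query, subject, pident) for hits at or above the identity cutoff.

    min_length (hypothetical, default off) drops very short alignments.
    Comment lines starting with '#' and blank lines are skipped.
    """
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        pident = float(hit["pident"])
        if pident >= min_pident and int(hit["length"]) >= min_length:
            yield hit["qseqid"], hit["sseqid"], pident
```

The length filter is there because very short alignments can reach a high percent identity by chance; what value to use is a judgment call.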

Any ideas around?


The license for the 32-bit USEARCH is free and will almost certainly cover your needs. As for other BLAST alternatives, you could check out HMMER (and pfam_scan). You wouldn't be searching against UniProt, though, but against profile HMMs. Depending on your research question, that can easily be the better option.


I actually had a go with the 32-bit USEARCH, but unfortunately 32-bit is a problem these days because of the size of sequence databases. Going through the whole UniProt database, or even a fraction of it, needs more than 4 GB of memory (i.e. a 64-bit executable).


UniProt is actually a rather small database; I think UniRef100 has just 35M entries. I'm almost certain that the 32-bit USEARCH binaries can handle it just fine. The 64-bit licensees listed on the USEARCH site deal with databases that are orders of magnitude larger. Did you read the manual?


I did this test some time ago, using a UniRef100 from 2009 (i.e. much smaller than current releases):

$ ./usearch7.0.959_i86linux32 -makeudb_usearch uniprot_trembl.fasta -output uniprot_trembl.udb
usearch v7.0.959_i86linux32, 4.0Gb RAM (49.4Gb total), 12 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.
http://drive5.com
00:00  19Mb Reading input
00:18 942Mb  100.0% Masking
00:43 955Mb  100.0% Word stats
00:44 2.9Gb   57.2% Building slots
Out of memory mymalloc(1032), curr 4.14e+09 bytes

myutils.cpp(2136): 

./usearch7.0.959_i86linux32 -makeudb_usearch uniprot_trembl.fasta -output uniprot_trembl.udb

---Fatal error---
Out of memory, mymalloc(1032), curr 4.14e+09 bytes

I see you didn't read the manual page I linked to in the previous post.


Well, I'm sorry, but I still don't see where I'm going wrong (I did go through the manual, by the way). Don't I need to run makeudb_usearch first? The manual says "A database file must be specified using the -db option. FASTA and .udb formats are supported. For large databases, .udb format is recommended (see makeudb_usearch command)." For completeness, I've just run this command against the FASTA file directly, with similar memory issues:

$ ./usearch7.0.959_i86linux32 -usearch_global O00560.fa -db uniprot_trembl.fasta -id 0.8 -alnout results.aln
usearch v7.0.959_i86linux32, 4.0Gb RAM (49.4Gb total), 12 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.
http://drive5.com
00:15 3.6Gb  100.0% Reading uniprot_trembl.fasta
01:09 3.6Gb  100.0% Masking                     
02:43 3.6Gb  100.0% Word stats
02:43 3.6Gb    0.0% Building slots
Out of memory mymalloc(6632), curr 4.16e+09 bytes

myutils.cpp(2136): 

./usearch7.0.959_i86linux32 -usearch_global O00560.fa -db uniprot_trembl.fasta -id 0.8 -alnout results.aln

---Fatal error---
Out of memory, mymalloc(6632), curr 4.16e+09 bytes
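Both failures stop just short of the 32-bit ceiling: a 32-bit process can address at most 2^32 bytes (4,294,967,296, i.e. 4 GiB), and the two logs report 4.14e+09 and 4.16e+09 bytes in use. A quick Python check of that arithmetic, using only the numbers printed in the logs (the helper name `share_of_32bit_space` is just for illustration):

```python
# Address-space ceiling of a 32-bit process, in bytes.
LIMIT_32BIT = 2 ** 32


def share_of_32bit_space(bytes_in_use: float) -> float:
    """Fraction of the 2**32-byte address space already allocated."""
    return bytes_in_use / LIMIT_32BIT


# "curr" values reported by the two failing usearch runs.
reported = {"makeudb_usearch": 4.14e9, "usearch_global": 4.16e9}

for step, used in reported.items():
    print(f"{step}: {used / 2**30:.2f} GiB in use, "
          f"{share_of_32bit_space(used):.1%} of the 32-bit limit")
```

Both runs die with more than 96% of that space already allocated, and the header line of each log shows usearch itself reporting "4.0Gb RAM" available, even though the machine has 49.4 GB in total.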

Reading other threads on Biostars (which I hadn't done before), I see that other users have had similar memory issues; see this reply: C: Looking For Faster Blastp-Like Program?

So maybe I should rephrase my question: has anyone had positive experiences with USEARCH, or could someone share feedback on how it works in real-life use? I would definitely get a license for it, but first I have to check that it does what I need.


See also this answer, A: How To Solve An Out-Of-Memory Error When Using Usearch Chimera Detection, regarding memory problems with 32-bit USEARCH.

10.6 years ago
cdsouthan

BLAT is the obvious choice for blistering speed, but I don't know how feasible it would be to implement for this.
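Part of why tools like BLAT are so fast is that they look up exact short words (k-mers) in a precomputed index, so only targets sharing enough words with the query are ever aligned; close homologs at ~50% identity or more tend to share many such words. Below is a simplified Python sketch of that word-index prefilter. It is not BLAT's actual implementation (BLAT, for instance, indexes non-overlapping k-mers of the database, whereas this indexes all overlapping ones), and the names `build_index` and `candidates` and the `min_shared` threshold are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping words of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(db: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map each k-mer to the IDs of the database sequences containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for seq_id, seq in db.items():
        for word in kmers(seq, k):
            index[word].add(seq_id)
    return index


def candidates(query: str, index: Dict[str, Set[str]], k: int, min_shared: int) -> List[str]:
    """IDs sharing at least min_shared distinct k-mers with the query,
    most shared first. Only these would go on to a full alignment."""
    shared: Dict[str, int] = defaultdict(int)
    for word in set(kmers(query, k)):
        for seq_id in index.get(word, ()):
            shared[seq_id] += 1
    hits = [seq_id for seq_id, n in shared.items() if n >= min_shared]
    return sorted(hits, key=lambda seq_id: -shared[seq_id])
```

Raising `k` or `min_shared` discards more targets before any alignment is done, which is the same speed-for-sensitivity trade the question is asking about.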


Agree with you.

9.2 years ago

Try PAUDA.
