Hi,
Here is a scenario that I run into quite often: I need to BLAST between 1,000 and 20,000 sequences to find the proteins they code for. These sequences come from fish cDNA libraries, so I expect most of them, although not all, to code for proteins.
I currently use BLAST+ ('blastplus') locally to query both swissprot and nr, but this approach is unsatisfactory for a few reasons:
- It is very slow (up to a few days for nr)
- I would also like to query nt (I did not succeed, it took much too long)
- With a faster method, I would consider blasting a number of sequences a few orders of magnitude higher
I was looking into the Usearch set of tools, but the ublast method cannot do the equivalent of a blastx, that is, searching translated nucleotide queries against a protein database.
What method would you suggest?
Cheers!
Have you tried translating each cDNA sequence into protein and then using only the longest ORF for the BLAST search? This alone should speed things up about 3 times.
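In case it helps, here is a minimal Python sketch of the longest-ORF step. It rests on a few assumptions that are mine and not from the question: the standard genetic code, all six reading frames are scanned, and an "ORF" means the longest stretch between stop codons (no start codon required, since cDNA reads are often partial). The function names are only illustrative.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq):
    """Longest stop-free protein segment over all six reading frames."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Splitting on '*' yields the stop-to-stop segments of this frame.
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best
```

You would then write one protein per cDNA to a FASTA file and search it with blastp instead of blastx. Keep in mind that on short or error-prone reads the longest ORF is not always the true coding frame, so some hits may be missed compared to a full blastx.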