I have a file containing millions of protein sequences in FASTA format from more than 2000 species. I'm looking for an efficient way (faster than BLAST) to retrieve a protein's ID for a given amino-acid sequence. I know that blastdbcmd can pull an individual sequence record out of a BLAST database based on a given sequence identifier, but it can't be queried by sequence.
Do you know any tools that skip the "alignment building step" and allow for fast retrieval of a FASTA record based on its sequence?
Probably faster with
"^QWERTY$"
but if the aim is to do this for multiple sequences, there are still much faster alternatives, which probably require indexing.
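To illustrate the indexing idea, here is a minimal Python sketch, not a tested production tool. It assumes exact, full-length matches are what you want, that the record ID is the first whitespace-delimited token of the header line, and that the index fits in memory. The function names (`read_fasta`, `build_sequence_index`, `lookup`) and the demo records are made up for the example.

```python
from collections import defaultdict
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited header token.
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            chunks = []
        elif record_id is not None:
            # Sequence lines are joined, so wrapped records are handled.
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def build_sequence_index(handle):
    """Map each distinct sequence to the list of IDs that carry it."""
    index = defaultdict(list)
    for record_id, sequence in read_fasta(handle):
        index[sequence.upper()].append(record_id)
    return index


def lookup(index, sequence):
    """Return all IDs whose sequence equals the query exactly (case-insensitive)."""
    return index.get(sequence.upper(), [])


if __name__ == "__main__":
    # Hypothetical demo data; replace with open("your_file.fasta").
    demo = io.StringIO(
        ">sp1_protA some description\nQWER\nTY\n"
        ">sp2_protB\nMKV\n"
        ">sp3_protC\nQWERTY\n"
    )
    index = build_sequence_index(demo)
    print(lookup(index, "QWERTY"))  # ['sp1_protA', 'sp3_protC']
```

The index is built in one pass over the file, after which each query is a single dictionary lookup. Unlike a line-anchored pattern, this also finds sequences that are wrapped over several lines, and it returns every ID when the same protein occurs in more than one species. With millions of records the dictionary may need several GB of RAM; storing a fixed-size hash of each sequence as the key instead of the sequence itself would be one way to reduce that.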