I have a FASTA file of about 200,000 RNA sequences and a server holding a local copy of Rfam. For each sequence I want to know which type of RNA it is most closely related to, and ultimately to retrieve statistics on the proportion of sequences that are most probably rRNAs, tRNAs, real sRNAs, etc.
I'm sure this kind of thing has been done plenty of times before, but I can't find much information on it myself. I was going to use BLAST for the job, retrieving the top-scoring result from each output report using Perl. My question is: is BLAST really capable of handling the local alignment of many very short (16-50 nt) sequences in a reasonable amount of time?
I'll let others answer your direct BLAST question, but as a personal suggestion I would consider using MAFFT, which in fact is capable of doing what you describe.
That looks like a multiple alignment tool. The terminology is confusing here, my apologies. To clarify, I want to align each sequence in my RNA collection one after the other, find out the sequence they are most related to in the Rfam database, and extract information on that RNA's family to use in my statistics. Once I have a candidate family for each individual RNA, I can add up the number of RNAs in my collection that are part of each family and report that as percentages.
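The tallying step described above (one best hit per sequence, then counts per family reported as percentages) can be sketched in a few lines of Python. This is only a minimal sketch under some assumptions: the search was run with BLAST's 12-column tabular output (`-outfmt 6`, where the last column is the bit score), and a lookup from database sequence ID to family name is available. The function names, the `family_of` dictionary, and the example IDs are hypothetical and made up for illustration; they are not Rfam identifiers.

```python
from collections import Counter


def best_hits(tabular_lines):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bit score per query.

    Assumes the default 12-column BLAST tabular layout, where column 1 is the
    query ID, column 2 the subject ID and column 12 the bit score.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def family_percentages(best, family_of, total_queries):
    """Percentage of all query sequences assigned to each family.

    `family_of` is a hypothetical dict mapping subject ID -> family name.
    Queries with no hit at all are reported under "no_hit".
    """
    counts = Counter(
        family_of.get(subject, "unassigned") for subject, _ in best.values()
    )
    missing = total_queries - len(best)
    if missing > 0:
        counts["no_hit"] = missing
    return {family: 100.0 * n / total_queries for family, n in counts.items()}


# Made-up example: four query sequences, one of which has no hit.
EXAMPLE = [
    "read1\tsubjA\t100.0\t20\t0\t0\t1\t20\t5\t24\t1e-5\t40.1",
    "read1\tsubjB\t95.0\t20\t1\t0\t1\t20\t5\t24\t1e-3\t32.0",
    "read2\tsubjB\t100.0\t18\t0\t0\t1\t18\t1\t18\t1e-4\t36.2",
    "read3\tsubjC\t100.0\t22\t0\t0\t1\t22\t3\t24\t1e-6\t44.0",
]
FAMILY_OF = {"subjA": "tRNA", "subjB": "rRNA", "subjC": "tRNA"}

hits = best_hits(EXAMPLE)
percentages = family_percentages(hits, FAMILY_OF, total_queries=4)
print(percentages)
```

Passing the total number of input sequences separately means that sequences without any hit still count towards the denominator, which matters if the percentages are meant to describe the whole collection rather than only the sequences that matched something.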
Some more important questions: is this RNA from a single organism? Do you know which organism(s) are in the sample? Do you have a reference genome? If the answers are yes, then BLAST is the wrong tool (and so is MAFFT; you don't want to run a multiple sequence alignment on 200,000 sequences!).