I have Illumina single-end data (1 x 50 bp) from a small RNA library. I am interested in identifying known and novel miRNAs in my plant, for which no reference genome is available. First, I want to count the reads aligning to non-coding RNAs other than miRNAs, such as tRNA, rRNA, snoRNA, snRNA, siRNA, etc. After removing those reads, I would like to align the remaining reads against miRBase using blastn or bowtie to identify known miRNAs. The problem is that I want to use Rfam to detect reads from the different RNA classes, so I concatenated the FASTA files of 2686 families (2,487,655 sequences), but the Rfam v13 sequence headers do not state which class each sequence represents. So my question is: how can I count and extract the reads mapping to the different classes of RNA using Rfam?
Hello @toralmanvar
I don't know about the Rfam issue, but be careful about choosing blastn in your case. BLAST is a heuristic that was originally developed for quick database searches. It is not guaranteed to find all occurrences of a short sequence like a miRNA; it will miss ~40% of possible hits when dealing with sequences of 20 bp. It also produces local alignments rather than end-to-end global matches, so I would not recommend it. But here are a few suggestions
Also, read this paper
Actually, I want to classify the ncRNAs as in the table shown.
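One possible way to get per-class counts like that is sketched below in Python. It is only a rough sketch under several assumptions that are not from this thread: that each sequence header is prefixed with its family accession at the time the per-family FASTA files are concatenated, that the aligner output has been reduced to (read ID, subject ID) pairs, and that a family-to-class lookup table has been built separately from the Rfam family annotations. The function names `tag_header` and `count_reads_by_class` and the toy mapping are hypothetical.

```python
from collections import Counter, defaultdict


def tag_header(header, family_acc):
    """Prefix a FASTA header with its family accession (assumed naming scheme).

    '>X.1/1-70 desc' with 'RF00005' becomes '>RF00005|X.1/1-70 desc'.
    """
    return ">" + family_acc + "|" + header.lstrip(">")


def count_reads_by_class(hits, family_to_class):
    """Count reads per RNA class and collect the read IDs of each class.

    hits: iterable of (read_id, subject_id) pairs, where subject_id starts
          with the family accession followed by '|' (see tag_header).
    family_to_class: dict mapping family accession to a class label;
          this table is an assumption and must be supplied by the user.
    Each read is counted once, using its first reported hit.
    """
    read_class = {}
    for read_id, subject_id in hits:
        if read_id in read_class:
            continue  # keep only the first hit for each read
        family_acc = subject_id.split("|", 1)[0]
        read_class[read_id] = family_to_class.get(family_acc, "unclassified")

    counts = Counter(read_class.values())
    reads_by_class = defaultdict(list)
    for read_id, cls in read_class.items():
        reads_by_class[cls].append(read_id)
    return counts, dict(reads_by_class)


if __name__ == "__main__":
    # Toy data for illustration only
    toy_map = {"RF00005": "tRNA", "RF00001": "rRNA"}
    toy_hits = [
        ("read1", "RF00005|seqA"),
        ("read1", "RF00001|seqB"),
        ("read2", "RF00001|seqB"),
    ]
    counts, reads = count_reads_by_class(toy_hits, toy_map)
    for cls, n in sorted(counts.items()):
        print(cls, n, reads[cls])
```

The read IDs collected per class can then be used to remove those reads before the miRBase step. Taking the first hit per read is a simplification; choosing the best-scoring hit instead may be more appropriate for your data.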