I have a dataset of small RNA-seq reads whose 3' adapter sequence has been trimmed, so that the remaining sequence represents exactly the bases present in the biological fragment that was sequenced. Many of these sequences represent mature miRNAs, and I want to count how many times each known miRNA sequence occurs in the dataset. Obviously I can perform an exact string match and get a good estimate, but that ignores the possibility of sequencing errors and imprecise end cleavage. So I think I want to align all my reads against every known human mature miRNA sequence, allowing for gaps and mismatches, and pick the best match for each read (if any match is good enough). Does anyone know of a fast and efficient way to do this, preferably using R?
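To make the idea concrete, here is a minimal brute-force sketch of the "best match within a tolerance" step, written in Python rather than R purely for illustration. The reference names and sequences are made-up toy data (not real miRNAs), and `max_dist=2` is an arbitrary assumption; a real run would load the mature sequences from a database such as miRBase.

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]

def best_match(read, references, max_dist=2):
    """Return the name of the single closest reference, or None if the
    closest one is too far away or tied with another reference."""
    scored = sorted((edit_distance(read, seq), name)
                    for name, seq in references.items())
    best_dist, best_name = scored[0]
    if best_dist > max_dist:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # ambiguous assignment, skip it
    return best_name

def count_mirnas(reads, references, max_dist=2):
    """Collapse identical reads first so each distinct sequence is aligned once."""
    counts = Counter()
    for read, n in Counter(reads).items():
        name = best_match(read, references, max_dist)
        if name is not None:
            counts[name] += n
    return counts

# Toy references (hypothetical names and sequences).
references = {
    "toy-miR-A": "TGATGGATTGAAGTTGATGGTA",
    "toy-miR-B": "CACCTCACCGTCCACTCCAACC",
}
reads = [
    "TGATGGATTGAAGTTGATGGTA",  # exact toy-miR-A
    "TGATGGATTGAAGTTGATGGT",   # toy-miR-A, one base shorter at the 3' end
    "CACCTCACCGTCCACTCCAACC",  # exact toy-miR-B
    "CACCTCACCATCCACTCCAACC",  # toy-miR-B with one mismatch
    "ACGTACGTACGTACGTACGTAC",  # unrelated read
]
counts = count_mirnas(reads, references)
```

This compares every distinct read against every reference, so it only scales to modest inputs; collapsing duplicate reads before aligning is what keeps it tolerable, since small RNA libraries are usually dominated by a few abundant sequences.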
I wonder about aligning the reads to the precursor miRNA sequences. How would you determine whether a given miRNA is the 3p or the 5p form?
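One possible way to answer the 3p/5p question when working with precursors is to look at where the read lands on the hairpin. The sketch below is only a heuristic under stated assumptions: it uses an exact substring search for simplicity, the hairpin is made-up toy data, and calling the arm from the read midpoint relative to the hairpin midpoint is an illustrative rule, not an established annotation method.

```python
def call_arm(read, hairpin):
    """Return '5p' or '3p' depending on which half of the hairpin the read
    midpoint falls in, or None if the read is not found in the hairpin."""
    start = hairpin.find(read)  # exact match only, for simplicity
    if start == -1:
        return None
    midpoint = start + len(read) / 2
    return "5p" if midpoint < len(hairpin) / 2 else "3p"

# Toy hairpin (hypothetical): flank + 5' arm + loop + 3' arm + flank.
five_arm = "TGATGGATTGAAGTTGATGGTA"
three_arm = "TACCATCAACTTCAATACATCA"
hairpin = "GG" + five_arm + "CTCTGAAATC" + three_arm + "CC"

arm_of_five = call_arm(five_arm, hairpin)
arm_of_three = call_arm(three_arm, hairpin)
arm_of_other = call_arm("ACGTACGTACGTACGTACGTAC", hairpin)
```

Annotated mature coordinates on each precursor, where available, would be a more reliable basis for the call than the midpoint rule used here.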