Hello,
In our bulk mRNA-Seq data, about 600 of our roughly 21,000 detected genes are miRNAs. All of them fall within the expression range of the non-miRNA genes, and about 20 fall within the upper half of gene expression in the dataset.
I was surprised by this because I thought most miRNAs would be removed via polyA selection. Also, we did NOT use a kit for small RNA capture and sequencing.
I'm wondering whether these reads are aligning to pre-miRNAs that are longer than 75 base pairs. I'd like to take my list of miRNAs in R and join it with a database that records how long each miRNA is before and after processing, to sanity-check my theory.
Does such a miRNA database exist? I'm trying to use mirbase.db, but I'm unsure how to use it and whether it has the information I'm looking for: http://bioconductor.org/packages/release/data/annotation/html/mirbase.db.html
Thanks!
About poly-A, see the paper "Human microRNAs are processed from capped, polyadenylated transcripts that can also function as mRNAs".
This is a great point. Thank you.
I suggest you try RNAcentral. Filter according to your needs (organism, type of RNA).
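If it helps, here is a minimal sketch of the length check described in the question, assuming you have exported the precursor (hairpin) sequences from whichever database you settle on as a FASTA file. It is written in Python only for illustration; the same join could be done in R with merge(). The function names (read_fasta_lengths, annotate) and the toy records are hypothetical, the sequences are made up, and the 75 cutoff is simply the figure mentioned in the question. Note that this measures hairpin length only, not the length of the primary transcript.

```python
from io import StringIO


def read_fasta_lengths(handle):
    """Return {first word of each FASTA header: total sequence length}."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the record ID.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            # Sequences may be wrapped over several lines, so accumulate.
            lengths[name] += len(line)
    return lengths


def annotate(mirna_ids, hairpin_lengths, min_length=75):
    """Pair each miRNA ID with its hairpin length (None if the ID is not
    in the FASTA) and a flag for whether it reaches min_length."""
    rows = []
    for mirna_id in mirna_ids:
        n = hairpin_lengths.get(mirna_id)
        rows.append((mirna_id, n, n is not None and n >= min_length))
    return rows


# Made-up records standing in for a real precursor FASTA (assumption:
# the IDs in your expression table match the FASTA header IDs).
example_fasta = StringIO(
    ">toy-mir-1 hypothetical hairpin\n"
    + "ACGU" * 20 + "\n"
    + ">toy-mir-2 hypothetical hairpin\n"
    + "ACGU" * 10 + "\n"
    + "ACGU" * 5 + "\n"
)

lengths = read_fasta_lengths(example_fasta)
table = annotate(["toy-mir-1", "toy-mir-2", "toy-mir-3"], lengths)
for row in table:
    print(row)
```

With real data you would replace the StringIO object with open("your_hairpins.fa") (a placeholder file name). In practice the identifiers in an expression table (often gene symbols or Ensembl IDs) will probably not match the database's miRNA names directly, so an ID-mapping step would be needed before the lookup.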
I'm a little confused... was your prep designed to catch miRNAs or not?
It was not designed to catch miRNAs, hence my confusion. We used a KAPA stranded mRNA-Seq kit: https://www.kapabiosystems.com/product-applications/products/next-generation-sequencing-2/rna-library-preparation-2/kapa-stranded-mrna-seq-kits/
You wrote "bulk miRNA-Seq" but I don't think you meant that. I think you need to scrutinize the library protocol, or talk to whoever prepped it, because I strongly suspect that this library prep is supposed to filter away small fragments, so you can't treat the things you think are short fragments as legitimate.
Thanks for catching that typo, fixed above. I'm wondering the same thing: whether pri-miRNAs could even be captured in an mRNA-Seq prep.