I am performing differential expression analysis of miRNAs in sperm from genetically modified vs. control mice, using miRDeep2. Sperm data may contain fragmented remnant RNA and even bacterial RNA, which results in a low mapping percentage. Moreover, small RNA libraries capture other types of small RNA besides miRNAs, and piRNAs and tRNA fragments seem to be present in my data, so only a very small percentage of reads is assigned to miRNAs, which does not reflect the sequenced library size. In this case, should I use the total sequenced library size for normalization, given that fewer than 10% of reads are assigned?
Neither. Build a count matrix containing only the RNA type you are interested in, and treat the column sums as the library sizes. Feed this into a tool with a robust normalization method, such as edgeR (TMM) or DESeq2 (median of ratios). Correcting by raw library size alone is prone to errors from compositional effects. Explained here:
The percentage of assigned reads ultimately does not matter. The question is whether you have enough raw counts. A low percentage may simply mean that you need a much larger absolute number of reads to get sufficient raw counts for the RNAs you are interested in.
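To make the approach above concrete, here is a minimal pure-Python sketch of subsetting a count matrix to miRNAs, taking column sums as library sizes, and computing median-of-ratios size factors. It is a hand-rolled illustration of the idea, not the actual edgeR or DESeq2 code; the feature names, the `feature_class` annotation, and the toy counts are all made up for the example.

```python
import math
from statistics import median


def subset_counts(counts, feature_class, wanted="miRNA"):
    """Keep only the rows whose annotated RNA class equals `wanted`."""
    return {f: row for f, row in counts.items() if feature_class.get(f) == wanted}


def library_sizes(counts):
    """Column sums of the (already subset) count matrix."""
    n_samples = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_samples)]


def median_of_ratios_size_factors(counts):
    """Per-sample median of count / geometric mean across samples.

    Only features with non-zero counts in every sample are used.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue  # geometric mean is undefined with zeros
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no feature has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (hypothetical): rows are features, columns are four samples.
counts = {
    "mir-a": [100, 200, 50, 100],
    "mir-b": [10, 20, 5, 10],
    "mir-c": [30, 60, 15, 120],      # strongly up in the last sample
    "piR-x": [5000, 100, 4000, 300],  # dominates total library size
}
feature_class = {"mir-a": "miRNA", "mir-b": "miRNA", "mir-c": "miRNA", "piR-x": "piRNA"}

mirna = subset_counts(counts, feature_class)
lib = library_sizes(mirna)                  # [140, 280, 70, 230]
sf = median_of_ratios_size_factors(mirna)   # approximately [1, 2, 0.5, 1]
```

In this toy case the last sample's column sum (230) is inflated by a single changing miRNA, while its size factor stays close to 1. That is the kind of compositional effect a plain library-size correction would not handle.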
I disagree somewhat with this approach. For small RNA, it is reasonable to expect huge global differences between tissues, and potentially from individual gene KOs (it is unknown what OP's "treatment" is). The ideal situation is to have a synthetic small-RNA spike-in; however, this seems relatively rare in practice.
They could try subsetting to miRNA specifically, but they should compare against piRNA, endo-siRNA, etc. to determine whether the changes are stable across normalization "sets" and/or whether the expected biology explains the observations.
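A rough way to run the stability check suggested above is to compute the same fold change under several library-size definitions and see whether the conclusion holds. The sketch below assumes two groups labelled `"ctrl"` and `"ko"`; the set names and the toy numbers are hypothetical, and the pseudocount of 0.5 is an arbitrary choice.

```python
import math


def log2_fold_change(feature_counts, lib_sizes, groups, pseudo=0.5):
    """log2(mean CPM in 'ko' / mean CPM in 'ctrl') for a single feature."""
    cpm = [(c + pseudo) / n * 1e6 for c, n in zip(feature_counts, lib_sizes)]

    def group_mean(label):
        vals = [v for v, g in zip(cpm, groups) if g == label]
        return sum(vals) / len(vals)

    return math.log2(group_mean("ko") / group_mean("ctrl"))


def compare_normalization_sets(feature_counts, lib_sizes_by_set, groups):
    """Fold change of one feature under each library-size definition."""
    return {
        name: log2_fold_change(feature_counts, sizes, groups)
        for name, sizes in lib_sizes_by_set.items()
    }


# Toy data (hypothetical): one miRNA with identical raw counts in all samples.
groups = ["ctrl", "ctrl", "ko", "ko"]
mir_counts = [100, 100, 100, 100]
lib_sizes_by_set = {
    "mirna_only": [1000, 1000, 1000, 1000],
    # Total small RNA doubles in the KO, e.g. through a global piRNA shift.
    "all_small_rna": [10000, 10000, 20000, 20000],
}

lfc = compare_normalization_sets(mir_counts, lib_sizes_by_set, groups)
for name, value in lfc.items():
    print(f"{name}: log2FC = {value:.2f}")
```

Here the miRNA looks unchanged when normalized within miRNAs, but halved when normalized against all small RNA. Which reading is right depends on the biology, which is why a spike-in would be the cleaner reference.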
Was the library prep done specifically for miRNA (or small RNA in general), or is this a standard RNA-seq dataset in which you are trying to find small RNAs?
Yes, the library was constructed for small RNA by size-selecting fragments of 18-30 bp, and it was sequenced as SE50.
Did the kit use a specific miRNA adapter that was ligated? Were the 10% of alignments from reads that contained this adapter? Or is the 10% a fraction of all reads that contained the adapter (meaning the remaining 90% did not align, which would indicate some issue with the library prep)?
I'm not sure I understand your question. The 3' adapters were removed in the first step of the workflow using the miRDeep2 mapper, which can clip adapters and discard reads below a minimum length.
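One way to address the adapter question above is to count, on the raw reads before any clipping, how many contain the start of the 3' adapter. With 18-30 bp inserts and 50 bp reads, most genuine small-RNA reads would be expected to run into the adapter. The sketch below is plain Python and independent of miRDeep2; the adapter sequence is a placeholder to be replaced with the one from the kit, and the in-memory FASTQ records are invented for illustration.

```python
def fastq_sequences(lines):
    """Yield the sequence line (2nd of every 4) from an iterable of FASTQ lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(reads, adapter, min_match=8):
    """Fraction of reads containing the first `min_match` bases of the adapter."""
    seed = adapter[:min_match]
    total = 0
    with_adapter = 0
    for seq in reads:
        total += 1
        if seed in seq:
            with_adapter += 1
    return with_adapter / total if total else 0.0


# Placeholder adapter (assumption): substitute the 3' adapter of your kit.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"

# Invented FASTQ records; in practice, pass an open file handle instead.
fastq_lines = [
    "@read1", "TGAGGTAGTAGGTTGTATAGTT" + ADAPTER + "AACTCCA", "+", "I" * 50,
    "@read2", "TAGCTTATCAGACTGATGTTGA" + ADAPTER + "AACTCCA", "+", "I" * 50,
    "@read3", "TGGCTCAGTTCAGCAGGAACAG" + ADAPTER + "AACTCCA", "+", "I" * 50,
    "@read4", "ACGTTTGACCAGGTTCAAGCTTGGCATCGATCGGATCCATTGACGTACGA", "+", "I" * 50,
]

fraction = adapter_fraction(fastq_sequences(fastq_lines), ADAPTER)
print(f"reads containing adapter: {fraction:.0%}")
```

If most raw reads carry the adapter but few map to miRNAs, the low percentage more likely reflects library composition (piRNAs, fragments, contaminants) than a failed ligation. A low adapter fraction would point back to the library prep.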