Hi all. I need to check miRNA differential expression between two Illumina GAII run lanes. For this, I summed all the reads that aligned to each miRNA, and I now have two tables that list each miRNA and the number of reads that aligned to it.
Obviously, some normalization is needed when comparing these two tables. My question is: normalization to what? Should I use the total number of reads (mapped and unmapped) produced from each lane? Or the number of mapped reads? Or the number of uniquely mapped reads?
I'm leaning towards the first option (normalizing the hit counts by the total number of reads produced in each lane).
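To make the options concrete, here is a minimal Python sketch of reads-per-million scaling in which the denominator is just a parameter, so the same function covers total, mapped, or uniquely mapped reads. The function name `rpm`, the miRNA names, and all the numbers are made up for illustration and do not come from my actual lanes.

```python
def rpm(counts, library_size):
    """Scale raw per-miRNA read counts to reads per million (RPM).

    library_size is whichever denominator you pick: total reads,
    mapped reads, or uniquely mapped reads for that lane.
    """
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {mirna: n * 1_000_000 / library_size for mirna, n in counts.items()}


# Hypothetical values for one lane, for illustration only.
lane_counts = {"miR-21": 5000, "let-7a": 1200}
total_reads = 10_000_000   # assumed: every read the lane produced
mapped_reads = 4_000_000   # assumed: reads that aligned anywhere

rpm_total = rpm(lane_counts, total_reads)
rpm_mapped = rpm(lane_counts, mapped_reads)
```

Within a single lane, the choice of denominator rescales every miRNA by the same factor, so it only matters when the two lanes differ in their mapped fraction.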
This article I found also supports using RPM (reads per million), but it mentions that RNA extraction methods affect the normalization results and may lower their validity.
Exactly. That's what others told me about bias. The bias issue always seems to come up, but I haven't really seen a good suggestion for how to deal with it.
I agree that RPM is more appropriate for short miRNAs than RPKM, whose definition originally refers to 'kilobase of exon model'. Gene-length normalization/scaling is probably not required for these short sequences.
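As a rough illustration of the difference, the sketch below computes both measures for a single feature using the standard definitions (RPM = count x 10^6 / library size; RPKM additionally divides by feature length in kilobases). The function names and the example numbers, including the 22-nt length, are assumptions chosen for illustration.

```python
import math


def rpm_value(count, library_size):
    """Reads per million: count scaled by library size only."""
    return count * 1e6 / library_size


def rpkm_value(count, library_size, length_bp):
    """Reads per kilobase per million: also divides by length in kb."""
    return count * 1e9 / (library_size * length_bp)


# Hypothetical mature miRNA of 22 nt with 220 reads in a 1M-read library.
count, library_size, length_bp = 220, 1_000_000, 22

r = rpm_value(count, library_size)
k = rpkm_value(count, library_size, length_bp)

# RPKM is RPM multiplied by 1000 / length, so for features of nearly
# identical length it is close to a constant rescaling of RPM.
ratio = k / r
```

Since mature miRNAs fall in a narrow length range of roughly 20 to 24 nt, the length term changes the values by nearly the same factor for every miRNA, which is consistent with leaving it out.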