I have a conceptual question that I was hoping someone could answer.
Can I say that microRNA A is expressed x-fold higher than microRNA B directly from the TCGA miRseq data? Can I do this after normalizing the data? Does it matter whether I use RSEM or RPKM values? It seems to me that it should be legitimate in either case, since microRNAs are all approximately the same length, but maybe I am overlooking something.
For example, I am following a paper published in Nature Communications entitled "Identification of a pan-cancer oncogenic microRNA superfamily anchored by a central core seed motif". The authors download the data and collapse the isoform reads into a single read count per microRNA. They say they used the reads per million microRNAs mapped, which expresses each microRNA's read count as a fraction of the total microRNA population. The authors then apply upper quartile normalization, which they say is important because a subset of microRNAs (miR-143 in particular) contributes so heavily to the total read count. In the text, the authors appear to use the resulting values to compare microRNAs directly with one another.
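A minimal sketch of the two scaling steps described above, assuming a single sample and made-up read counts. This is not the authors' code: the function names, the miRNA labels `miR-A`, `miR-B`, `miR-C`, and the choice to take the upper quartile over nonzero counts are all illustrative assumptions.

```python
from statistics import quantiles


def reads_per_million(counts):
    """Scale raw counts so that they sum to one million within the sample."""
    total = sum(counts.values())
    return {mirna: 1e6 * c / total for mirna, c in counts.items()}


def upper_quartile_normalize(counts):
    """Divide each count by the 75th percentile of the sample's nonzero counts.

    Assumption: the quartile is taken over nonzero counts only; the paper's
    exact procedure may differ.
    """
    nonzero = [c for c in counts.values() if c > 0]
    uq = quantiles(nonzero, n=4, method="inclusive")[2]
    return {mirna: c / uq for mirna, c in counts.items()}


# Hypothetical collapsed read counts for one sample; miR-143 dominates the library.
sample = {
    "miR-143": 900_000,
    "miR-21": 60_000,
    "miR-C": 30_000,
    "miR-A": 8_000,
    "miR-B": 2_000,
}

rpm = reads_per_million(sample)
uq_scaled = upper_quartile_normalize(sample)

# Ratio of miR-A to miR-B within the same sample under each scaling.
ratio_rpm = rpm["miR-A"] / rpm["miR-B"]
ratio_uq = uq_scaled["miR-A"] / uq_scaled["miR-B"]
print(ratio_rpm, ratio_uq)
```

Both steps divide every microRNA in a sample by the same per-sample factor, so the ratio between two microRNAs within one sample is the same under either scaling (up to floating-point rounding). The choice of scaling matters when values are compared or averaged across samples.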
I definitely want the collapsed isoforms, and I think it makes sense to do the normalization. However, I would like to say that a particular microRNA is expressed x-fold higher than another. Can I do this from the collapsed and normalized data?
If this has already been answered, I apologize. I could not find it. Thanks.
Thanks PyPerl for answering!
Are you suggesting that I first normalize the data (e.g. by RPKM or upper quartile normalization) and then use an R/Bioconductor package to calculate the fold change from the normalized data? Would I be using the package to compare microRNA A against microRNA B, rather than doing differential expression analysis between conditions (e.g. cancer versus normal) for a single microRNA? I have done the latter, but I had not thought about doing the former. Thanks again!
Get the counts by aligning the query sequences to the reference sequences, then normalize them to RPKM or with any other method of your choice. Next, run the data through an R package for differential expression analysis. You can then compare candidate miRNAs by looking at their fold change and P-value and/or q-value. This is the preferred approach for differential expression analysis between two conditions. You can search online for packages that compute the counts and normalize the data for differential expression.
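As a rough illustration of the fold-change part of this workflow, here is a sketch that computes a log2 fold change for one miRNA between two conditions from already-normalized values. The function name, the pseudocount, and the numbers are hypothetical. It does not compute a P-value or q-value; a dedicated differential expression package would handle that along with dispersion modeling.

```python
from math import log2
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean normalized expression, group_a over group_b.

    The pseudocount (an assumption here) avoids division by zero when a
    miRNA is not detected in one condition.
    """
    return log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


# Hypothetical normalized values for a single miRNA in each condition.
tumor = [127.0, 255.0, 383.0]
normal = [31.0, 63.0, 95.0]

lfc = log2_fold_change(tumor, normal)
print(f"log2 fold change (tumor vs normal): {lfc:.2f}")
```

A positive value means higher average expression in the first group. With only a handful of samples per condition, the fold change alone says nothing about significance, which is why the P-value or q-value from a proper package is still needed.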