Hi,
I have RSEM data for a tumor type and I need to compare it to gene COUNTS data. Is there a methodology to compare RSEM vs. counts?
Thanks
Why do you need to compare the two results? Some differential expression tools work directly on gene counts, but counts are expected to differ from normalized expression levels (for example, at the same expression level, longer genes should have relatively larger counts).
Also, the annotations can affect the normalized expression. For example, having reads align to multiple transcripts could be an issue, depending upon how you processed your data. Here is a link to a blog post to illustrate this:
http://cdwscience.blogspot.com/2014/04/differential-expression-without.html
However, the secondary message from that blog post is that the popular methods (e.g. cufflinks, RSEM) for quantifying gene-level expression are pretty robust. So, the RSEM mRNA quantification should be fine (and if you wanted to compare it to something, you should compare it to other mRNA quantification methods, not raw counts). My personal preference is to just work with the RPKM/FPKM/TPM normalized expression values, and not worry about the raw counts.
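To make the gene-length point concrete, here is a minimal sketch using the standard RPKM definition. The gene names, lengths, counts, and library size are made up for illustration; they are not from either dataset in this thread.

```python
def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)

# Hypothetical toy genes: same expression per base, different lengths.
genes = {
    "short_gene": {"length_bp": 1000, "count": 100},
    "long_gene": {"length_bp": 4000, "count": 400},
}
total_mapped_reads = 1_000_000  # assumed library size for this example

rpkm_values = {
    name: rpkm(g["count"], g["length_bp"], total_mapped_reads)
    for name, g in genes.items()
}

for name, g in genes.items():
    print(name, "count =", g["count"], "RPKM =", rpkm_values[name])
```

The raw counts differ four-fold only because one gene is four times longer; after length normalization the two genes get the same value. That is why raw counts and length-normalized values are not directly interchangeable.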
Just use featureCounts or htseq-count to get the per-gene counts, load things into R, sort so they're in the same order, and then compare.
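The "same order, then compare" step could look roughly like this. It is sketched in plain Python rather than R, and it assumes you already have two per-gene tables. The dictionaries, gene IDs, and values below are hypothetical placeholders, and Pearson correlation on log2(x + 1) is just one reasonable way to compare.

```python
import math

def align_by_gene(a, b):
    """Keep only genes present in both tables, in the same sorted order."""
    shared = sorted(set(a) & set(b))
    return shared, [a[g] for g in shared], [b[g] for g in shared]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy tables keyed by gene ID (not real data).
rsem_values = {"GENE_A": 12.0, "GENE_B": 150.0, "GENE_C": 0.5, "GENE_D": 30.0}
raw_counts = {"GENE_B": 2100, "GENE_A": 160, "GENE_C": 4, "GENE_E": 75}

shared_genes, x, y = align_by_gene(rsem_values, raw_counts)

# Compare on a log scale so a few highly expressed genes do not dominate.
r = pearson([math.log2(v + 1) for v in x], [math.log2(v + 1) for v in y])
print(shared_genes, round(r, 3))
```

Genes that appear in only one table (GENE_D and GENE_E here) are dropped before the comparison, which is the main thing that goes wrong if the two tables are simply pasted side by side.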
Hi Devon,
I am sorry, I don't know what you mean. I do not have raw data to work with. I only have RSEM pre-normalized data for a tumor type, and only gene counts data (and RPKM) from another dataset for normal brain. I am supposed to compare gene expression for a few genes between these two datasets.
My question was how I can compare RSEM vs. counts. Now I am adding: how can I compare RSEM vs. RPKM, if it is not possible to correlate RSEM vs. counts directly?
I don't know how featureCounts is supposed to help me.
Hi Charles,
Please look at my reply to Devon as well. On one hand I have a pre-normalized RSEM dataset for a tumor type (dataset1), and on the other I have counts and the corresponding RPKM values for another set of samples (dataset2).
The problem at hand is to compare the expression of a couple of genes between these two datasets, which use different normalizations. I thought that since RSEM doesn't take gene length into account, it would be more relevant to compare it to the counts data, not RPKM. But I don't know the methodology.
To be clear, RSEM is an algorithm, not a unit. In fact, I'm pretty sure that RSEM provides RPKM/FPKM as the normalized expression values (which are corrected for gene length) along with other metrics (such as estimated counts).
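As a side note, RSEM's gene-level output reports both TPM and FPKM, and the two differ only by a per-sample rescaling. A minimal sketch of that rescaling, assuming you have RPKM/FPKM values for the full gene set of one sample (the function name and toy values are hypothetical):

```python
def fpkm_to_tpm(fpkm_by_gene):
    """Rescale one sample's FPKM/RPKM values so that they sum to one million."""
    total = sum(fpkm_by_gene.values())
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return {gene: value / total * 1e6 for gene, value in fpkm_by_gene.items()}

# Hypothetical toy sample with only three genes (not real data).
sample_fpkm = {"GENE_A": 5.0, "GENE_B": 20.0, "GENE_C": 75.0}
sample_tpm = fpkm_to_tpm(sample_fpkm)
print(sample_tpm)
```

The conversion is only meaningful when the sum runs over all genes in the sample, so it cannot be applied to a table that contains just the handful of genes of interest.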
Independent of the mRNA quantification method / metric, there will probably be batch effects between the two datasets (especially if there are differences in the sample preparation). If you have members of the same group (say, tumor versus normal) in both datasets, you can correct for batch effects with something like a 2-way ANOVA. Otherwise, the interpretation will be tricky, no matter what.
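To illustrate the batch-effect point, here is a simplified sketch for a single gene. It is not a full 2-way ANOVA: it only computes the tumor-versus-normal difference within each dataset and averages those differences, which matches the group effect of an additive two-factor model when the design is balanced. The function name, labels, and log2 values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def batch_adjusted_difference(samples, group_a="tumor", group_b="normal"):
    """samples: iterable of (batch, group, log2_expression) for one gene.

    Returns the mean over batches of (mean group_a - mean group_b),
    computed separately within each batch.
    """
    cells = defaultdict(list)
    for batch, group, value in samples:
        cells[(batch, group)].append(value)

    diffs = []
    for batch in sorted({b for b, _ in cells}):
        a = cells.get((batch, group_a))
        b = cells.get((batch, group_b))
        if not a or not b:
            # Group and batch are confounded: no within-batch comparison exists.
            raise ValueError(f"batch {batch!r} does not contain both groups")
        diffs.append(mean(a) - mean(b))
    return mean(diffs)

# Hypothetical balanced toy data: dataset2 is shifted up by 2 log2 units overall.
toy = [
    ("dataset1", "tumor", 8.0), ("dataset1", "tumor", 8.4),
    ("dataset1", "normal", 6.0), ("dataset1", "normal", 6.4),
    ("dataset2", "tumor", 10.0), ("dataset2", "tumor", 10.4),
    ("dataset2", "normal", 8.0), ("dataset2", "normal", 8.4),
]
difference = batch_adjusted_difference(toy)
print(round(difference, 3))
```

If one dataset contains only tumor samples and the other only normal samples, the function raises an error instead of returning a number. That is the "tricky interpretation" case: the dataset difference and the biological difference cannot be separated.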