Question

comparing RSEM and counts data

0

10.5 years ago

Angel ▴ 220

Hi,

I have RSEM data for a tumor type and I need to compare it to gene COUNTS data. Is there a methodology for comparing RSEM vs. counts?

Thanks

RNA-Seq Counts RSEM • 11k views

ADD COMMENT • link updated 3.1 years ago by Ram 44k • written 10.5 years ago by Angel ▴ 220

Ram · Answer 1 · 2014-06-13

1

10.5 years ago

Charles Warden 8.3k

Why do you need to compare the two results? Some differential expression tools work directly on gene counts, but raw counts are expected to differ from normalized expression levels (for example, longer genes should have relatively larger counts).

Also, the annotations can affect the normalized expression. For example, having reads align to multiple transcripts could be an issue, depending upon how you processed your data. Here is a link to a blog post to illustrate this:

http://cdwscience.blogspot.com/2014/04/differential-expression-without.html

However, the secondary message from that blog post is that the popular methods (e.g. Cufflinks, RSEM) for quantifying gene-level expression are pretty robust. So, the RSEM mRNA quantification should be fine (and if you wanted to compare it to something, you should compare it to other mRNA quantification methods, not raw counts). My personal preference is just to work with the RPKM/FPKM/TPM normalized expression values, and not worry about the raw counts.
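To make the gene-length point concrete, here is a minimal sketch of how RPKM and TPM can be derived from raw counts. The gene names, counts, and lengths are made up for illustration, and real tools such as RSEM use effective lengths and expected counts, so treat this as the textbook formulas rather than any tool's exact procedure.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of gene length per million mapped reads."""
    total_reads = sum(counts.values())
    return {g: c * 1e9 / (lengths_bp[g] * total_reads) for g, c in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {g: c / lengths_bp[g] for g, c in counts.items()}
    scale = sum(rate.values())
    return {g: r * 1e6 / scale for g, r in rate.items()}


# Hypothetical toy data: counts grow in proportion to gene length.
counts = {"geneA": 100, "geneB": 200, "geneC": 700}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 7000}

rpkm_values = rpkm(counts, lengths_bp)
tpm_values = tpm(counts, lengths_bp)
```

In this toy case the three genes have very different counts but identical RPKM and TPM values, because the count differences are explained entirely by length.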

ADD COMMENT • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by Charles Warden 8.3k

0

Hi Charles,

Please look at my reply to Devon as well. I have a pre-normalized RSEM dataset for a tumor type (dataset1), and I have counts and the corresponding RPKM values for another set of samples (dataset2).

The problem at hand is to compare the expression of a couple of genes across these two datasets, which use different normalizations. I thought that, since RSEM doesn't take gene length into account, it would be more relevant to compare it to the counts data rather than RPKM. But I don't know the methodology.

ADD REPLY • link updated 10.5 years ago by Devon Ryan 104k • written 10.5 years ago by Angel ▴ 220

0

To be clear, RSEM is an algorithm, not a unit. In fact, I'm pretty sure that RSEM is providing RPKM as the normalized expression values (which are corrected for gene length) along with other metrics (such as raw counts).

Independent of the mRNA quantification method / metric, there will probably be batch effects between the two datasets (especially if there are differences in the sample preparation). If you have members of the same group (say, tumor versus normal) in both datasets, you can correct for batch effects with something like a 2-way ANOVA. Otherwise, the interpretation will be tricky, no matter what.
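As a rough illustration of the batch-effect idea, the sketch below estimates an additive dataset shift for a single gene on the log2 scale, using samples from the same groups in both datasets. It is a simplification and not a full 2-way ANOVA with significance testing; for a balanced design the estimate matches the batch term of an additive two-factor fit. All names and values are hypothetical.

```python
from statistics import mean


def batch_shift(samples, reference="dataset1", other="dataset2"):
    """Average within-group difference (other - reference) in log2 expression.

    samples: list of (batch, group, log2_expression) tuples for one gene.
    Every group must be present in both batches.
    """
    groups = sorted({group for _, group, _ in samples})
    diffs = []
    for group in groups:
        ref = [y for b, g, y in samples if b == reference and g == group]
        oth = [y for b, g, y in samples if b == other and g == group]
        if not ref or not oth:
            raise ValueError(f"group {group!r} is missing from one batch")
        diffs.append(mean(oth) - mean(ref))
    return mean(diffs)


def remove_shift(samples, shift, other="dataset2"):
    """Subtract the estimated shift from the samples of the other batch."""
    return [(b, g, y - shift if b == other else y) for b, g, y in samples]


# Hypothetical log2 expression values for one gene.
samples = [
    ("dataset1", "tumor", 8.0), ("dataset1", "tumor", 8.2),
    ("dataset1", "normal", 5.0), ("dataset1", "normal", 5.2),
    ("dataset2", "tumor", 9.1), ("dataset2", "tumor", 8.9),
    ("dataset2", "normal", 6.0), ("dataset2", "normal", 6.2),
]

shift = batch_shift(samples)
corrected = remove_shift(samples, shift)
```

If one dataset contains only tumor samples and the other only normal samples, the function raises an error, which mirrors the point above: without shared groups the batch effect cannot be separated from the biological difference.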

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by Charles Warden 8.3k

Ram · Answer 2 · 2014-06-13

0

10.5 years ago

Devon Ryan 104k

Just use featureCounts or htseq-count to get the per-gene counts, load things into R, sort so they're in the same order, and then compare.
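The "same order, then compare" step can be sketched as below, in Python rather than R. The two tables are assumed to be already loaded as gene-to-value dictionaries, and the rank (Spearman) correlation is my own choice for comparing values on different scales, not something prescribed above. Gene names and numbers are made up.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def compare_tables(table1, table2):
    """Keep shared genes in one fixed order and return (genes, Spearman rho)."""
    shared = sorted(set(table1) & set(table2))
    x = [table1[g] for g in shared]
    y = [table2[g] for g in shared]
    return shared, pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: one RSEM-derived table, one count table.
rsem_values = {"geneA": 1200.0, "geneB": 300.0, "geneC": 50.0, "geneD": 800.0}
count_values = {"geneB": 150, "geneC": 20, "geneA": 900, "geneE": 5000}

shared_genes, rho = compare_tables(rsem_values, count_values)
```

Only genes present in both tables are compared, so `geneD` and `geneE` are dropped here.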

ADD COMMENT • link 10.5 years ago by Devon Ryan 104k

0

Hi Devon,

I am sorry, I don't know what you mean. I do not have raw data to work with. I only have pre-normalized RSEM data for a tumor type, and gene counts (and RPKM) from another dataset that covers normal brain. I am supposed to compare gene expression for a few genes between these two datasets.

My question was how I can compare RSEM vs. counts. Now I am adding: how can I compare RSEM vs. RPKM, if it is not possible to correlate RSEM vs. counts directly?

I don't know how featureCounts is supposed to help me.

ADD REPLY • link updated 4.9 years ago by Ram 44k • written 10.5 years ago by Angel ▴ 220

0

Is this data you downloaded from TCGA? If so, you should have mentioned that you don't have the raw data, just a table of RSEM counts.

ADD REPLY • link 10.5 years ago by Devon Ryan 104k