Question

In Deep Sequencing Experiments: What Is Differential Expression?

8

14.2 years ago

Doctoroots ▴ 810

Hi all. My question, simply put: let's say I want to perform differential expression (DE) analysis on deep sequencing data for 2 samples (RNA/miRNA/transcript-Seq). What is the meaning of "differential expression"?

Do I want to see if gene X's absolute expression is significantly different between samples? Or do I want to see if gene X's relative expression (the gene's amount relative to the other genes in the sample) is significantly different?

When discussing this question with my lab's biologists, they all agree that they are interested in the gene's absolute expression change, not the relative one. But when discussing this with other bioinformaticians, they tell me that absolute expression cannot be inferred from deep sequencing data, even after normalization.

I found this paper comparing different statistical methods for DE with qPCR. Since qPCR is a method used to evaluate the difference in absolute expression levels, my conclusion was that we want to normalize our deep sequencing data so that it correlates as closely as possible with the absolute expression difference, not the relative one.

This might feel like an obvious question, but I must say that when I tried to find a definite answer I was amazed that I couldn't.

So to sum up: what do you mean when you say differential expression? And how do you prefer to normalize your data in order to correctly represent this type of differential expression?

gene rna next-gen sequencing data • 9.7k views

ADD COMMENT • link updated 14.1 years ago by Marina Manrique ★ 1.3k • written 14.2 years ago by Doctoroots ▴ 810

score 2 · Answer 1 · 2011-02-28

2

14.2 years ago

Stefano Berri 4.4k

now since qPCR is a method that is used to evaluate the absolute expression levels[...]

qPCR gives you relative expression levels. It is first relative to the "housekeeping" gene (there is no such thing, but keep that quiet around your fellow biologists) and then relative across treatments/samples (typically you ask "is geneX expressed more in this or in that condition?"). You could compare the expression of geneX to a given amount (actual number of copies) of a plasmid and come to a conclusion like "my geneX has 12.5 times as many copies as my plasmid", but this information is meaningless on its own because it depends on the amount of RNA/cDNA that went into the reaction. Basically, you don't know how many cells you are looking at.
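To make the "relative twice over" point concrete, here is a minimal sketch of the commonly used 2^(-ddCt) calculation. It is not something given in this thread: the function name and the Ct values are made up for illustration, and the formula assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_a, ct_ref_a, ct_target_b, ct_ref_b):
    """Fold change of the target gene in condition B relative to condition A.

    Uses the 2^(-ddCt) approach, which assumes ~100% amplification efficiency.
    All Ct values passed in below are hypothetical.
    """
    # Step 1: target relative to the reference ("housekeeping") gene, per condition
    delta_ct_a = ct_target_a - ct_ref_a
    delta_ct_b = ct_target_b - ct_ref_b
    # Step 2: condition B relative to condition A
    delta_delta_ct = delta_ct_b - delta_ct_a
    return 2.0 ** (-delta_delta_ct)


# Hypothetical measurements
fold = relative_expression(ct_target_a=25.0, ct_ref_a=18.0,
                           ct_target_b=23.0, ct_ref_b=18.0)
print(fold)  # 4.0

# Loading less material in condition B delays both Ct values by the same
# number of cycles, and the result does not change.
fold_less_input = relative_expression(25.0, 18.0, 24.0, 19.0)
print(fold_less_input)  # 4.0
```

The second call shows why the amount of starting material drops out: only differences between Ct values enter the result.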

You definitely want to compare relative gene expression. This is because it is usually not known how many cells the RNA/cDNA is coming from and, even if you knew, intermediate steps (isolation, reverse transcription, etc.) swing ratios around. So what happens in the lab is that they try to load similar amounts of total cDNA and then compare to a "housekeeping" gene or, if you have many genes, to the median expression (pretty much as is done for microarrays and RNA-seq).

You start with one assumption: if you are looking at sample A and sample B, you ASSUME the TOTAL amount of RNA these two samples produce is the same. This is a reasonable assumption. I have argued many times with people to see if we can get around it, but we never succeeded.

Think about it.

It is normal for biologists to think in absolute numbers, but if you think about it, you can't get the absolute number, and you should be able to convince them of that. And it is important that you succeed in convincing them.

ADD COMMENT • link 14.2 years ago by Stefano Berri 4.4k

0

Hi Stefano, perhaps I wasn't clear in my question. I know the absolute number can't be determined, but my question was: does differential expression mean the difference in gene X's absolute (unknown) counts between samples, or the difference in gene X's relative amount (relative to the other genes)?

ADD REPLY • link 14.2 years ago by Doctoroots ▴ 810

0

Hi. Still relative. If you find that geneX is upregulated in condition A, it means that the proportion of molecules from geneX compared to the rest of the transcriptome is higher in condition A than in condition B. If the assumption that total RNA production is the same in condition A and condition B is true, then the "absolute" expression is also higher.

ADD REPLY • link 14.2 years ago by Stefano Berri 4.4k

0

So what you're saying is that we test our null hypothesis of no differential expression only under the assumption of similar total RNA production? Does this mean that when dealing with cases where there is a decreased amount of total RNA in one of the samples, we cannot perform DE?

ADD REPLY • link 14.2 years ago by Doctoroots ▴ 810

0

No, I am saying that you normalize in such a way that the original total RNA amounts are treated as the same. You find that geneX represents 0.01% of total RNA in condition A but 0.02% in condition B, and then check whether the log2(ratio) is different from zero. But this has some implications: for instance, if one gene goes up, others must go down. Because you can't know how much RNA each cell was producing, you cannot know the absolute values. But if you are comparing two similar things it is fine. If you compare liver vs skin, it might be problematic...
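A rough sketch of the calculation described above, using made-up read counts for a three-gene "transcriptome". The gene names, counts, and helper functions are all hypothetical; a real analysis would also need replicates and a statistical test, which this toy example leaves out.

```python
import math


def proportions(counts):
    """Convert raw counts into each gene's share of the sample total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def log2_ratio(counts_a, counts_b, gene):
    """log2 of (share in condition B / share in condition A) for one gene."""
    share_a = proportions(counts_a)[gene]
    share_b = proportions(counts_b)[gene]
    return math.log2(share_b / share_a)


# Made-up counts: only geneX changes in absolute terms between conditions.
cond_a = {"geneX": 100, "geneY": 400, "geneZ": 500}
cond_b = {"geneX": 400, "geneY": 400, "geneZ": 500}

for gene in cond_a:
    print(gene, round(log2_ratio(cond_a, cond_b, gene), 3))
# geneX comes out positive; geneY and geneZ come out negative even though
# their absolute counts are identical in both conditions.
```

This is the "if one gene goes up, others must go down" effect: the shares always sum to 1, so an increase in one gene's share is paid for by the others.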

ADD REPLY • link 14.2 years ago by Stefano Berri 4.4k

0

First, thank you for your answers. Second, just to make sure I understand: in an experiment where I have 2 samples, and one of them has an overall lower level of transcripts but the relative amount of each transcript remains the same, I can't detect this by differential expression analysis of the 2 samples? (qPCR can be informative in this situation)

ADD REPLY • link 14.2 years ago by Doctoroots ▴ 810

0

That's right. But I doubt qPCR would tell you. If you compare to a housekeeping gene it wouldn't show; if you compare with an "absolute" reference (a plasmid), you still have to decide how much starting solution to use.
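The scenario in this exchange can be illustrated with a small sketch, again with invented counts and a hypothetical helper: a sample in which every transcript is present at half the level yields exactly the same proportions, so a proportion-based comparison sees no difference.

```python
def proportions(counts):
    """Convert raw counts into each gene's share of the sample total."""
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


# Made-up counts: sample_2 has every transcript at half the level of sample_1.
sample_1 = {"geneX": 200, "geneY": 800, "geneZ": 1000}
sample_2 = {gene: n // 2 for gene, n in sample_1.items()}

# The totals differ, but each gene's share of the total is identical.
print(sum(sample_1.values()), sum(sample_2.values()))  # 2000 1000
print(proportions(sample_1) == proportions(sample_2))  # True
```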

ADD REPLY • link 14.2 years ago by Stefano Berri 4.4k

score 2 · Answer 2 · 2011-02-28

Hi

I agree that RNA-Seq does not allow you to assess the absolute level of expression (even though some people used this argument during the early establishment of this technology).

As Stefano mentioned, you would need to know the exact number of cells and also the cell volumes in order to determine the real absolute level of expression. Also, measuring this level of expression introduces some technical noise, especially for lowly transcribed genes.

In experiments where you compare the same sample under different conditions, the processing is not that difficult since you can assume a similar biological background. Normalization can then be done by scaling the different samples to the same median value, for example (as was sometimes done for microarrays). Scaling samples to a common median should not be affected by a small number of genes changing in expression but, depending on your data, the normalization issue should be considered with more caution (distribution of RPKMs, read coverage, ...).
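As an illustration of the median scaling mentioned above, here is a minimal sketch. The function name, the choice of the common target (the median of the per-sample medians), and the numbers are assumptions for the example, and it presumes every sample has a non-zero median.

```python
from statistics import median


def scale_to_common_median(samples):
    """Rescale each sample so that all samples share the same median.

    samples: dict mapping sample name -> list of expression values.
    The common target is taken as the median of the per-sample medians
    (an arbitrary choice for this sketch). Assumes non-zero medians.
    """
    medians = {name: median(values) for name, values in samples.items()}
    target = median(medians.values())
    return {
        name: [v * target / medians[name] for v in values]
        for name, values in samples.items()
    }


# Made-up values; sample A contains one strongly changing gene (1000).
samples = {
    "A": [10, 20, 30, 40, 1000],
    "B": [20, 40, 60, 80, 90],
}
scaled = scale_to_common_median(samples)
print({name: median(values) for name, values in scaled.items()})
```

The outlier in sample A does not move that sample's median, which is the robustness to a small number of changing genes described above.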

The definition of differential expression may vary. Some use advanced statistical methods to define a set of genes significantly changing in expression between two conditions, while others simply base their approach on thresholds. I'm no expert in this field since I have not used those methods much; other members might be more helpful there...

score 1 · Answer 3 · 2011-03-01

1

14.2 years ago

Marina Manrique ★ 1.3k

Hi,

I would say that most differential expression analyses using RNA-seq use normalized gene expression levels (normally RPKM) to compare gene expression between samples. We normally use the edgeR Bioconductor package for this kind of analysis; I'd recommend reading the case study "8 Case Study: RNA-seq data" in the edgeR user guide.

Hope it helps :)

ADD COMMENT • link 14.2 years ago by Marina Manrique ★ 1.3k

0

Hi Marina, can you specify what you mean by "compare gene expression"? Do you refer to its relative amount among other genes or its absolute amount? In order to correctly normalize the read count, we first need to decide what it is we want to test.

ADD REPLY • link 14.2 years ago by Doctoroots ▴ 810

0

I mean that absolute gene expression (the number of reads mapping to a gene) is not normally used to see whether gene 'A' (for example) is differentially expressed between 2 samples. Normally you need to normalize the absolute gene expression by the total number of reads in the sample and by the gene length (that's what RPKM does: reads per kilobase per million mapped reads). I didn't mean calculating a relative expression level among other genes in the sample. I'm not sure whether in these kinds of experiments the expression of a gene is compared with the expression of other genes in the same sample, as is done with array data...
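For reference, a minimal sketch of the RPKM calculation described above (reads per kilobase of gene length per million mapped reads). The function name and all numbers are made up for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    # reads / (length in kb * total reads in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Made-up example: the same gene sequenced at two different depths.
shallow = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
deep = rpkm(read_count=1000, gene_length_bp=2000, total_mapped_reads=20_000_000)
print(shallow, deep)  # 25.0 25.0
```

Because the read count is divided by the total number of mapped reads, the value is still a share of the sequenced library rather than a count of molecules per cell.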

ADD REPLY • link 14.2 years ago by Marina Manrique ★ 1.3k

0

Hi Marina, I know of RPKM and other normalization methods. What I fail to comprehend is what all these normalization methods aim to do: to find whether the gene's absolute amount is different (like qPCR), or the relative one?

ADD REPLY • link 14.2 years ago by Doctoroots ▴ 810

0

I think those normalization methods allow you to compare the expression of a certain gene regardless of the sequencing depth of the sample or the length of the gene. If you take only the absolute level (the number of reads mapping to the gene), longer genes would seem to be more expressed than shorter ones. Besides, genes in samples that have more reads would look as if their expression level is higher. I don't know if I'm being clear... To sum up, I think they allow you to compare absolute expression levels.

ADD REPLY • link 14.2 years ago by Marina Manrique ★ 1.3k