Hi all. My question, simply put: let's say I want to perform differential expression (DE) analysis on deep sequencing data for 2 samples (RNA/miRNA/transcript-Seq). What is the meaning of "differential expression"?
Do I want to see if gene X's absolute expression is significantly different between samples? Or do I want to see if gene X's relative expression (the gene's amount relative to the other genes in the sample) is significantly different?
When discussing this question with my lab's biologists, they all agree that they are interested in the gene's absolute expression change, and not the relative one. But when discussing this with other bioinformaticians, they tell me that absolute expression cannot be inferred from deep sequencing data, even after normalization.
I found this paper comparing different statistical methods for DE with qPCR. Since qPCR is a method used to evaluate the difference in absolute expression levels, my conclusion was that we want to normalize our deep sequencing data so that it correlates as closely as possible with the absolute expression difference, and not the relative one.
This might feel like an obvious question, but I must say that when I tried to find a definite answer I was amazed that I couldn't.
So to sum up: what do you mean when you say differential expression? And how do you prefer to normalize your data in order to correctly present this type of differential expression?
Hi Stefano, perhaps I wasn't clear in my question. I know the absolute number can't be determined, but my question was: does differential expression mean the difference in gene X's absolute (unknown) counts between samples, or does it mean the difference in gene X's relative amount (relative to the other genes)?
Hi. Still relative. If you find that geneX is upregulated in condition A, it means that the proportion of molecules from geneX compared to the rest of the transcriptome is higher in condition A than in condition B. If the assumption that total RNA production is the same in condition A and condition B holds, then the "absolute" expression is also higher.
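To make the link between the relative and the "absolute" change concrete, here is a minimal Python sketch. The molecule numbers and the function names (proportion, absolute_fold_change) are made up for illustration. It only encodes the arithmetic that absolute amount = proportion x total RNA, so the absolute fold change is the relative fold change multiplied by the ratio of total RNA, which sequencing alone does not give you.

```python
def proportion(gene_molecules, total_molecules):
    """Fraction of the transcriptome contributed by one gene."""
    return gene_molecules / total_molecules


def absolute_fold_change(relative_fc, total_a, total_b):
    """Absolute fold change (A over B) implied by a relative fold change,
    given the total RNA amount in each condition."""
    return relative_fc * (total_a / total_b)


# Hypothetical numbers: geneX is 0.02% of the transcriptome in A, 0.01% in B.
p_a = proportion(200, 1_000_000)
p_b = proportion(100, 1_000_000)
relative_fc = p_a / p_b  # about 2: this is the quantity sequencing can measure

# Same total RNA in both conditions: the absolute change matches the relative one.
same_totals = absolute_fold_change(relative_fc, total_a=1.0, total_b=1.0)

# Condition A makes half as much total RNA as B: no absolute change at all.
half_total = absolute_fold_change(relative_fc, total_a=0.5, total_b=1.0)

print(relative_fc, same_totals, half_total)
```

So the same relative result is compatible with different absolute changes, depending on a total RNA ratio that has to be assumed or measured some other way.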
So what you're saying is that we test our null hypothesis of no differential expression only under the assumption of similar total RNA production? Does this mean that when dealing with cases where there is a decreased amount of total RNA in one of the samples, we cannot perform DE?
No, I am saying that you normalize in such a way that the total amount of original RNA is taken to be the same. You find that geneX represents 0.01% of total RNA in condition A but 0.02% in condition B, and then check whether the log2(ratio) is different from zero. But this has some implications: for instance, if one gene goes up, others must go down. Because you can't know how much RNA each cell was producing, you cannot know the absolute values. But if you are comparing two similar things it is fine. If you compare liver vs skin, it might be problematic...
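The "if one gene goes up, others must go down" point can be shown with a toy calculation. This is only a sketch with invented molecule numbers and hypothetical helper names (proportions, log2_ratios), not any package's normalization method: each gene is turned into a fraction of the total, and the log2 ratio of those fractions (B over A) is compared to zero.

```python
import math


def proportions(molecules):
    """Convert per-gene molecule numbers into fractions of the total."""
    total = sum(molecules.values())
    return {gene: n / total for gene, n in molecules.items()}


def log2_ratios(cond_a, cond_b):
    """Per-gene log2(proportion in B / proportion in A)."""
    p_a, p_b = proportions(cond_a), proportions(cond_b)
    return {gene: math.log2(p_b[gene] / p_a[gene]) for gene in cond_a}


# Hypothetical absolute molecule numbers. Only geneX changes between
# conditions; geneY and geneZ are produced at exactly the same level.
cond_a = {"geneX": 100, "geneY": 450, "geneZ": 450}
cond_b = {"geneX": 1100, "geneY": 450, "geneZ": 450}

ratios = log2_ratios(cond_a, cond_b)
for gene, value in ratios.items():
    print(gene, round(value, 3))
```

geneX comes out positive, while geneY and geneZ come out negative even though their absolute numbers did not change, simply because the total doubled. With two similar samples where only a few genes move a little, this effect is small, which is why the comparison is fine in that case.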
First, thank you for your answers. Second, just to make sure I understand: in an experiment where I have 2 samples, and one of them has an overall lower level of transcripts but the relative amount of each transcript remains the same, I can't detect this by differential expression analysis of the 2 samples? (qPCR could be informative in this situation.)
That's right. But I doubt qPCR would tell you. If you compare to a housekeeping gene it wouldn't show; if you compare with an "absolute" reference (a plasmid), you still have to decide how much starting solution to use.
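The scenario from the question (every transcript lower in one sample, proportions unchanged) can be sketched in a few lines of Python. The molecule numbers, the sequencing depth and the function name expected_reads are assumptions for illustration. The model is deliberately crude: the expected reads for a gene are just depth x (gene molecules / total molecules), with no sampling noise.

```python
def expected_reads(molecules, depth):
    """Expected read count per gene when a library is sequenced to a fixed
    depth: each gene gets reads in proportion to its share of the total."""
    total = sum(molecules.values())
    return {gene: depth * n / total for gene, n in molecules.items()}


# Hypothetical absolute molecule numbers per cell.
sample_1 = {"geneX": 2000, "geneY": 6000, "geneZ": 12000}
# Sample 2 has half as much of every transcript.
sample_2 = {gene: n // 2 for gene, n in sample_1.items()}

depth = 1_000_000  # both libraries sequenced to the same (assumed) depth
reads_1 = expected_reads(sample_1, depth)
reads_2 = expected_reads(sample_2, depth)

print(reads_1)
print(reads_2)
```

Both samples give the same expected reads for every gene, so the global halving leaves nothing for a DE test to pick up.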