Differential Expression and absolute expression

11 weeks ago

jain72744 ▴ 10

I am working on an RNA-seq transcriptomic dataset. I have two ways to identify the important genes:

1. Differential expression analysis using DESeq2
2. Comparing absolute expression levels in tumor versus normal samples

In the second case, I aim to find genes whose expression in tumor samples exceeds the maximum expression seen in normal samples, i.e. genes that lie outside the normal range in more than 80% of tumor samples.
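For concreteness, the second approach can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code. The function name `outside_normal_range` and the toy matrices are hypothetical. It reads "maximum threshold" as the per-gene maximum across normal samples, and it assumes the input is already library-size normalized, because raw counts are not comparable between samples.

```python
import numpy as np

def outside_normal_range(normal, tumor, min_fraction=0.8):
    """Hypothetical helper: flag genes above their normal-sample maximum.

    normal: array of shape (n_genes, n_normal_samples)
    tumor:  array of shape (n_genes, n_tumor_samples)
    Assumes both are normalized counts (e.g. size-factor normalized).
    """
    normal = np.asarray(normal, dtype=float)
    tumor = np.asarray(tumor, dtype=float)
    # Per-gene upper bound of the "normal range".
    normal_max = normal.max(axis=1, keepdims=True)
    # Fraction of tumor samples in which each gene exceeds that bound.
    frac_above = (tumor > normal_max).mean(axis=1)
    # "More than 80% of samples" -> strictly greater than min_fraction.
    return frac_above > min_fraction, frac_above

# Made-up toy data: 2 genes, 3 normal and 5 tumor samples.
normal = [[10, 12, 11], [100, 90, 110]]
tumor = [[20, 25, 30, 35, 40], [95, 105, 120, 80, 100]]
flags, frac = outside_normal_range(normal, tumor)
print(flags, frac)
```

Note that the rule uses no estimate of variance and has no error-rate control. Its outcome also depends on how many normal samples happen to be available, because a sample maximum tends to grow as more samples are added.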

The two approaches give different results. Which one is better to use? Let's discuss.

Remember that RNA-seq data are compositional, and you can't treat the count data as absolute expression values.
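Here is a toy illustration of the compositional point, using made-up numbers and no sampling noise. Sequencing yields a fixed number of reads per library, so counts reflect each gene's share of the library rather than its absolute abundance.

```python
import numpy as np

# Hypothetical "true" abundances for 4 genes.
# Gene 0 truly rises 10x in the tumor; genes 1-3 are truly unchanged.
true_normal = np.array([100.0, 300.0, 300.0, 300.0])
true_tumor = np.array([1000.0, 300.0, 300.0, 300.0])

depth = 1_000_000  # reads per library (assumed equal for both samples)

# Reads are allocated in proportion to each gene's share of the library.
counts_normal = depth * true_normal / true_normal.sum()
counts_tumor = depth * true_tumor / true_tumor.sum()

# Observed tumor/normal count ratio per gene.
ratio = counts_tumor / counts_normal
print(np.round(ratio, 3))
```

Genes 1-3 appear roughly halved even though nothing changed for them, and gene 0 appears about 5x rather than 10x up. DESeq2's median-of-ratios size factors address this by assuming that most genes are not differentially expressed.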

Reply by andres.firrincieli 3.9k · 11 weeks ago

score 1 · Answer 1 · 2025-02-18

11 weeks ago

Mensur Dlakic ★ 29k

I don't think there is much to discuss. Suppose a gene in your tumor samples is upregulated 20- to 50-fold, but its absolute expression is still below the maximum expression reached by other genes in the normal samples. Your second approach would exclude such genes, and I don't think many people would support that.
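Here is the objection with made-up numbers. The threshold is read, as in the answer, as the maximum over all genes in the normal samples. The value 5000 is an assumed stand-in for a highly expressed gene.

```python
# Hypothetical normalized counts for one lowly expressed gene.
normal_gene_a = [8, 10, 12]
tumor_gene_a = [250, 300, 400]  # strongly induced in tumor samples

# Assumed maximum expression of any gene in the normal samples.
global_normal_max = 5000

mean_normal = sum(normal_gene_a) / len(normal_gene_a)
mean_tumor = sum(tumor_gene_a) / len(tumor_gene_a)
fold_change = mean_tumor / mean_normal

# Threshold rule: every tumor sample must exceed the normal maximum.
passes_threshold = all(x > global_normal_max for x in tumor_gene_a)
print(f"fold change ~ {fold_change:.1f}x, passes threshold: {passes_threshold}")
```

The gene is about 30-fold up, but the absolute-threshold rule never selects it.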

The thing is: I take one gene, look at its expression in normal and tumor samples, and find that in 80% of tumor samples it lies outside the range of the normal samples. Doesn't that indicate the gene is differentially expressed in many tumor samples, so there must be some reason for it? I do this for every gene. Why is differential expression better, when this approach directly gives me the genes that in vitro studies report as relevant? In my data, those genes do not pass the criteria of log2 fold change > 2 and p-value < 0.05.

Reply by jain72744 ▴ 10 · 11 weeks ago

Nobody can stop you from using ad hoc, arbitrary definitions of differentially expressed genes like this. Still, I would always recommend sticking with established analysis strategies where they exist, and here one does: differential expression analysis. You do not need a fold-change filter. After all, an LFC of 2 is a fold change of 4 (it's a log2 scale, remember). That is very high, especially if the study is not well powered. The usual baseline criterion is FDR < 0.05, and you could even relax this if there is a solid reason, such as an underpowered study where you want to use the results for hypothesis generation. Just don't invent highly non-standard analysis methods; they only get you into trouble. Ad hoc methods are untested and lack proper validation. They do not control the false-positive rate, and they ignore that a gene's variance is not constant across expression levels. Essentially, you throw away decades of biostatistics research and oversimplify. Don't.
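This sketch shows two pieces of arithmetic from this reply: converting an LFC back to a linear fold change, and the Benjamini-Hochberg adjustment that produces FDR values. DESeq2's `results()` reports BH-adjusted p-values in the `padj` column by default. The Python function below is an independent reimplementation for illustration, not DESeq2 code, and the p-values are made up.

```python
import numpy as np

def lfc_to_fold_change(lfc):
    """Convert log2 fold change(s) to linear fold change(s)."""
    return 2.0 ** np.asarray(lfc, dtype=float)

def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), same procedure as R's p.adjust(method="BH")."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

print(lfc_to_fold_change([1, 2, 0.5]))  # 2x, 4x, ~1.41x

pvals = [0.001, 0.008, 0.039, 0.041, 0.2]  # made-up raw p-values
padj = benjamini_hochberg(pvals)
print(padj)
```

Four of these made-up p-values are below 0.05, but only two survive FDR < 0.05. That is the false-positive control an ad hoc threshold rule does not provide.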