Question

Absolute vs. differential gene expression - which analysis is more informative?

4 months ago

bioinfo1990 ▴ 10

Hi,

I have a fundamental question regarding gene expression analysis.

I’m working with a cell line that has undergone many generations, accumulating mutations and changes over time, making it very different from the original cells. I have RNA-Seq data for both the evolved cells and the original ones.

When analyzing the expression of specific genes, I see two possible approaches:

Absolute Expression Analysis – Normalizing gene expression (e.g., using RPKM or another method) and examining absolute expression levels in my current cells. This allows me to see which genes are expressed (e.g., above 2) and identify, for example, the top 100 most highly expressed genes.

Differential Expression Analysis (DEG) – Comparing my current cells to the original ones to determine fold changes in gene expression. For example, I could check whether a specific gene is expressed X times higher in my current cells compared to the original cells.

The issue I see with both methods is as follows:

Absolute Expression Analysis: This approach gives me a list of expressed genes, but what can I actually conclude from that? A gene with an expression level of 3 might have a significant biological effect, while another gene with an expression level of 4 might not. Also, is a given expression level sufficient to produce enough protein to impact its pathway in the cell? It feels like this method mainly answers whether a gene is expressed and whether it’s among the highest expressed in the cell.

DEG Analysis: A gene with a fold change of 3x compared to the original cells likely has some impact, but I don’t really care about the original cells. I haven’t worked with them for a year, and their past state isn’t relevant to my current research.

Often, I analyze the expression of a specific set of genes, such as those involved in the TCA cycle, or focus on a single gene of interest. Given these concerns, what would be the best approach to analyze gene expression in my current cells? How can I determine which genes are functionally relevant beyond just looking at absolute expression or fold change compared to the original cells?

Thank you so much.

Differential-gene-expression Gene-expression RNA-Seq • 971 views

ADD COMMENT • link updated 4 months ago by LauferVA 4.8k • written 4 months ago by bioinfo1990 ▴ 10

It's worth noting that what you call "Absolute Expression" here is not absolute expression. The statistic you get (TPM, RPKM, etc.) is still relative. It's just relative to the total amount of RNA in the sample.

People often ask "What does TPM correspond to in terms of number of mRNAs per cell?", and the truth is that this is impossible to answer. A TPM of 1 tells you that of every million transcripts, 1 will be from that gene. But to know how many transcripts from that gene there are in a cell, you'd need to know how many transcript molecules there are in the cell in total, from all genes. That is generally not known, and it varies from cell type to cell type, condition to condition, and even cell to cell within a population.
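To make that concrete, here is a minimal sketch in plain Python. The gene lengths and read counts are made-up toy numbers, and `tpm` is just an illustrative helper, not a function from any package. Two hypothetical cells have the same mRNA proportions, but one holds ten times as many transcripts, and the TPM values come out identical:

```python
def tpm(counts, lengths_kb):
    """Transcripts per million from raw read counts and gene lengths (kb)."""
    # Reads per kilobase: corrects for longer genes producing more fragments.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    # Rescale so that the values of one sample always sum to one million.
    return [r / total * 1e6 for r in rates]

# Hypothetical toy data: three genes, lengths in kilobases.
lengths_kb = [1.0, 2.0, 4.0]
cell_a = [100, 400, 800]       # cell with little mRNA in total
cell_b = [1000, 4000, 8000]    # same proportions, ten times the mRNA

tpm_a = tpm(cell_a, lengths_kb)
tpm_b = tpm(cell_b, lengths_kb)

for a, b in zip(tpm_a, tpm_b):
    print(round(a), round(b))
```

The per-cell transcript numbers differ tenfold, yet nothing in the TPM values reveals it, which is why TPM cannot be converted to molecules per cell without extra information.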

ADD REPLY • link 4 months ago by i.sudbery 22k

I don't think you will find a single person on this forum who will advocate for absolute expression analysis. In fact, almost the same question was asked recently and you may want to read that thread.

ADD REPLY • link 4 months ago by Mensur Dlakic ★ 29k

score 1 · Answer 1 · 2025-03-24

One tends to use absolute expression metrics for quality control and to guarantee statistical power - i.e., to ensure that only genes with sufficient expression are considered. One tends to apply differential expression analysis to detect significant regulatory shifts (even if the baseline is, as in this case, historical).

Why? Return to the assay itself. Exactly what is RNA sequencing? How, exactly, is it performed? Understand this and we will understand why subtle variations in a step of the RNA‑Seq workflow mean significant differences in absolute expression measurements:

1. Sample Handling and RNA Extraction: Even minor inconsistencies in how samples are collected, stored, or processed can reduce RNA quality and yield, setting the stage for skewed expression values.

2. Reverse Transcription Efficiency: Small differences in the efficiency of converting RNA to cDNA can lead to inconsistent representation of transcripts, affecting downstream quantification.

3. Library Preparation (Fragmentation, Adapter Ligation, PCR Amplification): Variability during fragmentation or adapter ligation may unevenly capture transcripts, while slight biases during PCR amplification can disproportionately boost certain fragments, all of which can distort the true abundance of transcripts.

4. Sequencing Platform and Batch Effects: Minor variations in sequencing runs or machine performance can further amplify these initial discrepancies, leading to large differences in the reported absolute expression levels.

Differential expression analysis is popular because it is one of many techniques needed to get usable information about two conditions in a setting of such high variability, and it is not even the most important one ...

Again, the idea is that while the variability that can emerge from these steps is a challenge, if you:

A) Rigorously design an experiment controlled for everything but the condition

B) Process both samples the same

C) Use QC procedures to "deal with" differences in RNA integrity, library complexity, read quality, and so forth.

D) Additionally perform sophisticated within- and between-sample normalization procedures (like those found in DESeq or edgeR)

E) Deal with or remove batch effects, remove outliers, and control for covariates during statistical testing

F) Analyze Differential Expression judiciously

you may be able to draw reasonable conclusions about gene expression differences between two conditions. But you aren't going to get there through absolute metrics.
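As a rough illustration of step D, here is a simplified sketch of the median-of-ratios idea behind DESeq-style size factors, in plain Python. It is not the package's implementation, `size_factors` is a hypothetical helper, and the counts are invented. Sample 2 is sequenced at twice the depth of sample 1, and only the fourth gene truly changes:

```python
import math
from statistics import median

def size_factors(counts):
    """counts: one row per gene, each row a list of raw counts per sample."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue  # geometric mean would be zero, so skip this gene
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median is robust to the minority of genes that truly change.
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical toy counts: [sample 1, sample 2] for five genes.
counts = [[10, 20], [50, 100], [200, 400], [30, 600], [0, 5]]
sf = size_factors(counts)

# Divide each count by its sample's size factor.
normalized = [[c / s for c, s in zip(gene, sf)] for gene in counts]
fold_change_gene4 = normalized[3][1] / normalized[3][0]
print(sf, fold_change_gene4)
```

After normalization the first three genes look unchanged, and the raw 20x difference in the fourth gene shrinks to the 10x that is not explained by sequencing depth. Real pipelines layer dispersion estimation and statistical testing on top of this.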

In summary:

The sensitivity of the sequencing process itself to subtle conditions essentially precludes the use of absolute expression metrics for anything but basic QC.
Some also use them to cut down on multiple testing penalties by removing beforehand any genes whose counts are too low to offer hope of a reliable statistical comparison.
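The low-count filter mentioned above can be sketched like this. It is a minimal example in plain Python; the thresholds (`min_cpm=1.0`, `min_samples=2`), the helper names, and the counts are all assumptions chosen for illustration, not recommended defaults from any tool:

```python
def cpm(sample):
    """Counts per million for one sample given as {gene: raw_count}."""
    total = sum(sample.values())
    return {gene: count / total * 1e6 for gene, count in sample.items()}

def expressed_genes(samples, min_cpm=1.0, min_samples=2):
    """Keep genes reaching min_cpm in at least min_samples samples."""
    per_sample = [cpm(s) for s in samples]
    kept = []
    for gene in sorted(samples[0]):
        n_pass = sum(1 for s in per_sample if s.get(gene, 0.0) >= min_cpm)
        if n_pass >= min_samples:
            kept.append(gene)
    return kept

# Hypothetical toy data: two samples of about two million reads each.
samples = [
    {"GENE_A": 1_200_000, "GENE_B": 799_999, "GENE_C": 1},
    {"GENE_A": 1_000_000, "GENE_B": 1_000_000, "GENE_C": 0},
]
kept = expressed_genes(samples)
print(kept)
```

GENE_C is dropped before any testing, so it no longer counts toward the multiple testing correction. The filter is applied without looking at which condition each sample belongs to.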

Even then, keep in mind RNA-seq results are incomplete, tend not to correlate with other metrics of interest reliably (e.g., RNA-seq doesn't predict protein quantity well), and are easily thrown off by subtle differences in assay or bioinformatic processing.