Absolute vs. differential gene expression: which analysis is more informative?
bioinfo1990, 1 day ago

Hi,

I have a fundamental question regarding gene expression analysis.

I'm working with a cell line that has been passaged for many generations, accumulating mutations and other changes over time, so it is now very different from the original cells. I have RNA-Seq data for both the evolved cells and the original ones.

When analyzing the expression of specific genes, I see two possible approaches:

Absolute Expression Analysis – Normalizing gene expression (e.g., using RPKM or another method) and examining absolute expression levels in my current cells. This allows me to see which genes are expressed (e.g., RPKM above 2) and identify, for example, the top 100 most highly expressed genes.

Differential Expression Analysis (DEG) – Comparing my current cells to the original ones to determine fold changes in gene expression. For example, I could check whether a specific gene is expressed X times higher in my current cells compared to the original cells.
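For concreteness, the first approach might look like the minimal Python sketch below. It assumes a table of raw read counts and transcript lengths; the gene names and numbers are made up, and the cutoff of 2 is the arbitrary threshold mentioned above, not a recommended value.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total_millions = sum(counts.values()) / 1e6
    return {g: counts[g] / (lengths_bp[g] / 1e3) / total_millions for g in counts}


def tpm(counts, lengths_bp):
    """Transcripts per million: divide by length first, then rescale so values sum to 1e6."""
    per_kb = {g: counts[g] / (lengths_bp[g] / 1e3) for g in counts}
    scale = sum(per_kb.values()) / 1e6
    return {g: v / scale for g, v in per_kb.items()}


def top_n(values, n):
    """Names of the n genes with the highest values."""
    return sorted(values, key=values.get, reverse=True)[:n]


# Hypothetical toy data: raw read counts and transcript lengths (bp).
counts = {"geneA": 500, "geneB": 1000, "geneC": 50, "geneD": 0}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 500, "geneD": 2000}

expr = tpm(counts, lengths)
expressed = [g for g, v in expr.items() if v > 2]  # arbitrary cutoff from the question
print(expressed)       # ['geneA', 'geneB', 'geneC']
print(top_n(expr, 2))  # ['geneA', 'geneB']
```

TPM values always sum to one million within a sample, whereas RPKM sums vary from sample to sample; in both cases the numbers are shares of the sequenced RNA rather than molecule counts.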

The issue I see with both methods is as follows:

Absolute Expression Analysis: This approach gives me a list of expressed genes, but what can I actually conclude from that? A gene with an expression level of 3 might have a significant biological effect, while another gene with an expression level of 4 might not. Also, is a given expression level sufficient to produce enough protein to impact its pathway in the cell? It feels like this method mainly answers whether a gene is expressed and whether it’s among the highest expressed in the cell.

DEG Analysis: A gene with a fold change of 3x compared to the original cells likely has some impact, but I don’t really care about the original cells. I haven’t worked with them for a year, and their past state isn’t relevant to my current research.
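The fold-change arithmetic behind the second approach is sketched below, assuming normalized expression values for the same genes in both cell populations (hypothetical numbers). A pseudocount avoids dividing by zero for genes absent in one sample. This shows only the arithmetic: tools such as DESeq2 or edgeR also model replicate variability and report p-values, which a bare ratio cannot.

```python
import math


def log2_fold_change(evolved, original, pseudocount=1.0):
    """log2((evolved + pc) / (original + pc)) per gene; positive means higher in evolved cells."""
    return {
        g: math.log2((evolved[g] + pseudocount) / (original[g] + pseudocount))
        for g in evolved
    }


# Hypothetical normalized expression values for the same genes in both populations.
evolved = {"geneA": 299.0, "geneB": 49.0, "geneC": 0.0}
original = {"geneA": 99.0, "geneB": 99.0, "geneC": 0.0}

lfc = log2_fold_change(evolved, original)
for gene, value in lfc.items():
    # geneA comes out as roughly a 3x increase, geneB as a 2x decrease.
    print(f"{gene}: log2FC = {value:+.2f} (fold change = {2 ** value:.2f}x)")
```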

Often, I analyze the expression of a specific set of genes, such as those involved in the TCA cycle, or focus on a single gene of interest. Given these concerns, what would be the best approach to analyze gene expression in my current cells? How can I determine which genes are functionally relevant beyond just looking at absolute expression or fold change compared to the original cells?

Thank you so much.

It's worth noting that what you call "absolute expression" here is not absolute expression. The statistic you get (TPM, RPKM, etc.) is still relative; it's just relative to the total amount of RNA in the sample.

People often ask "What does TPM correspond to in terms of number of mRNAs per cell?", and the truth is that this is impossible to answer. A TPM of 1 tells you that of every million transcripts, 1 will be from that gene. But to know how many transcripts from that gene there are in a cell, you'd need to know how many transcript molecules the cell contains in total, from all genes, and that is generally not known and varies from cell type to cell type, condition to condition, and even cell to cell within a population.
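A toy calculation makes the point concrete. It assumes two hypothetical cells with identical relative composition but a ten-fold difference in total mRNA, and (for simplicity) equal transcript lengths, so TPM reduces to a gene's share of all transcripts times one million. Both cells give identical TPM values; converting back to molecules per cell works here only because the toy example supplies the total, which real RNA-Seq data does not.

```python
def tpm_equal_lengths(molecules):
    """TPM when all transcripts have the same length: share of all transcripts x 1e6."""
    total = sum(molecules.values())
    return {g: n / total * 1e6 for g, n in molecules.items()}


def molecules_per_cell(tpm_value, total_transcripts_in_cell):
    """Only computable if the total transcript count per cell is known (it usually is not)."""
    return tpm_value * total_transcripts_in_cell / 1e6


# Hypothetical cells: same composition, ten-fold difference in total mRNA.
small_cell = {"geneA": 10, "geneB": 90}
large_cell = {"geneA": 100, "geneB": 900}

print(tpm_equal_lengths(small_cell))  # {'geneA': 100000.0, 'geneB': 900000.0}
print(tpm_equal_lengths(large_cell))  # identical TPM values
print(molecules_per_cell(100000.0, 100), molecules_per_cell(100000.0, 1000))  # 10.0 100.0
```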

I don't think you will find a single person on this forum who will advocate for absolute expression analysis. In fact, almost the same question was asked recently and you may want to read that thread.

LauferVA, 1 day ago

One tends to use absolute expression metrics for quality control and to ensure adequate statistical power, i.e., to make sure that only genes with sufficient expression are considered. One tends to apply differential expression analysis to detect significant regulatory shifts (even if the baseline is, as in this case, historical).


Why? Return to the assay itself. What exactly is RNA sequencing, and how exactly is it performed? Understand this and you will see why subtle variations in any step of the RNA-Seq workflow can produce significant differences in absolute expression measurements:

1. Sample Handling and RNA Extraction: Even minor inconsistencies in how samples are collected, stored, or processed can reduce RNA quality and yield, setting the stage for skewed expression values.

2. Reverse Transcription Efficiency: Small differences in the efficiency of converting RNA to cDNA can lead to inconsistent representation of transcripts, affecting downstream quantification.

3. Library Preparation (Fragmentation, Adapter Ligation, PCR Amplification): Variability during fragmentation or adapter ligation may unevenly capture transcripts, while slight biases during PCR amplification can disproportionately boost certain fragments, all of which can distort the true abundance of transcripts.

4. Sequencing Platform and Batch Effects: Minor variations in sequencing runs or machine performance can further amplify these initial discrepancies, leading to large differences in the reported absolute expression levels.


Differential expression analysis is best understood as one of many techniques needed to extract usable information about two conditions in such a highly variable setting, and not even the most important one.

Again, the idea is that while the variability that can emerge from these steps is a challenge, if you:

A) Rigorously design an experiment controlled for everything but the condition

B) Process both samples the same

C) Use QC procedures to "deal with" differences in RNA integrity, library complexity, read quality, and so forth.

D) Additionally, perform sophisticated within- and between-sample normalization (like the procedures found in DESeq or edgeR)

E) Deal with or remove batch effects, remove outliers, and control for covariates during statistical testing

F) Analyze differential expression judiciously

you may be able to draw reasonable conclusions about gene expression differences between two conditions. But you aren't going to get there through absolute metrics.
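Step D mentions the between-sample normalization done by tools like DESeq and edgeR. The sketch below illustrates, in plain Python on hypothetical counts, the median-of-ratios idea that DESeq2 uses for its size factors; it is not the package's own implementation (edgeR's default, TMM, is a different method). The correction is estimated from many genes at once, assuming most genes do not change between samples, rather than trusting each sample's raw totals.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping gene -> list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    # Genes with a zero in any sample have a geometric mean of zero and are skipped.
    usable = {g: c for g, c in counts.items() if all(x > 0 for x in c)}
    # The per-gene log geometric mean across samples acts as a pseudo-reference sample.
    log_ref = {g: sum(math.log(x) for x in c) / n_samples for g, c in usable.items()}
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(c[j]) - log_ref[g] for g, c in usable.items()]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {g: [x / s for x, s in zip(c, factors)] for g, c in counts.items()}


# Hypothetical counts: sample 2 was simply sequenced twice as deeply.
counts = {
    "geneA": [10, 20],
    "geneB": [100, 200],
    "geneC": [50, 100],
    "geneD": [0, 5],  # excluded from size-factor estimation
}
sf = size_factors(counts)
norm = normalize(counts, sf)
print([round(s, 3) for s in sf])             # [0.707, 1.414]
print([round(v, 2) for v in norm["geneA"]])  # [14.14, 14.14]
```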


In summary:

  • the sensitivity of the sequencing process itself to subtle conditions essentially precludes the use of absolute expression metrics for anything but basic QC.
  • some also use them to reduce the multiple-testing penalty by removing, beforehand, any genes whose counts are too low to offer hope of a reliable statistical comparison.

Even then, keep in mind that RNA-seq results are incomplete, often do not correlate reliably with other metrics of interest (e.g., RNA-seq doesn't predict protein abundance well), and are easily thrown off by subtle differences in assay or bioinformatic processing.
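The pre-filtering mentioned in the summary can be sketched as a simple counts-per-million rule. The thresholds (at least 1 CPM in at least 2 samples) and the counts are hypothetical; edgeR's `filterByExpr` applies a more elaborate rule of the same kind. The last line uses a Bonferroni cutoff purely as the simplest illustration of why fewer tests means a milder correction.

```python
def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with >= min_cpm counts per million in at least min_samples samples.

    counts: dict mapping gene -> list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    kept = {}
    for gene, c in counts.items():
        cpm = [x / lib * 1e6 for x, lib in zip(c, lib_sizes)]
        if sum(v >= min_cpm for v in cpm) >= min_samples:
            kept[gene] = c
    return kept


# Hypothetical counts for three samples; geneE is a very highly expressed gene
# that brings library sizes to roughly one million reads.
counts = {
    "geneA": [1000, 1200, 900],
    "geneB": [0, 1, 0],
    "geneC": [500, 400, 450],
    "geneD": [2, 0, 0],
    "geneE": [1_000_000, 1_000_000, 1_000_000],
}
kept = filter_low_counts(counts)
print(sorted(kept))  # ['geneA', 'geneC', 'geneE']
# Fewer tests means a less severe correction, e.g. a Bonferroni cutoff of 0.05 / m:
print(0.05 / len(counts), 0.05 / len(kept))
```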

Thank you for the detailed response! These are indeed important points to keep in mind.

I'm still wondering what I can do with my data. My main goal is to better understand the cell line I currently have and explore ways to optimize it. For example, if I notice that certain genes in the TCA cycle are not being expressed, perhaps I should consider adding something to enhance their expression.

Of course, one approach would be to design experiments where I actively try to modulate the TCA cycle and then sequence the cells under different conditions to compare the outcomes. However, my lab doesn't have the capacity to sequence frequently.

Instead, what we’ve been doing is sequencing the cells and then going back to the data whenever we suspect something and want to investigate further—checking if certain pathways or genes are expressed. (Though, of course, gene expression doesn’t necessarily indicate how much protein is being produced.)

In my case, should I assume that my historical reference cells represent an ideal baseline where everything functioned optimally? Should I always perform differential gene expression (DGE) analysis against them? If a gene set is expressed at a higher level compared to the reference cells, does that indicate sufficient expression in my current cells? And if it’s lower, does that suggest the genes are not being sufficiently activated, meaning I might need to intervene?

Naturally, cellular activity is far more complex than this, but I’m looking for clues that could guide experimental decisions in the lab.

In my opinion, these questions cannot be answered meaningfully without knowing more about your scientific goals. Take, for example, your question about TCA genes.

You've observed that certain TCA genes are not being expressed and are asking whether you should do anything about this, but the reader knows neither your cell line nor your goals. The scenario you have described is entirely expected for most human cells; that much I can tell you. Take OGDH (http://biogps.org/#goto=genereport&id=4967): it should be low to undetectable in most tissues, but if it's undetectable in heart, you've got a problem. Other TCA genes? Different story: http://biogps.org/#goto=genereport&id=1431; http://biogps.org/#goto=genereport&id=2271.

Even more importantly, without knowing literally anything about what you are studying, it is not appropriate for anyone to speculate on whether you should add something to enhance their expression. The answer to that entirely depends on what you are trying to study/model/etc.
