Hi,
I have a fundamental question regarding gene expression analysis.
I’m working with a cell line that has undergone many generations, accumulating mutations and changes over time, making it very different from the original cells. I have RNA-Seq data for both the evolved cells and the original ones.
When analyzing the expression of specific genes, I see two possible approaches:
Absolute Expression Analysis – Normalizing gene expression (e.g., using RPKM or another method) and examining absolute expression levels in my current cells. This allows me to see which genes are expressed (e.g., above 2) and identify, for example, the top 100 most highly expressed genes.
Differential Expression Analysis (DEG) – Comparing my current cells to the original ones to determine fold changes in gene expression. For example, I could check whether a specific gene is expressed X times higher in my current cells compared to the original cells.
The issue I see with both methods is as follows:
Absolute Expression Analysis: This approach gives me a list of expressed genes, but what can I actually conclude from that? A gene with an expression level of 3 might have a significant biological effect, while another gene with an expression level of 4 might not. Also, is a given expression level sufficient to produce enough protein to impact its pathway in the cell? It feels like this method mainly answers whether a gene is expressed and whether it’s among the highest expressed in the cell.
DEG Analysis: A gene with a fold change of 3x compared to the original cells likely has some impact, but I don’t really care about the original cells. I haven’t worked with them for a year, and their past state isn’t relevant to my current research.
Often, I analyze the expression of a specific set of genes, such as those involved in the TCA cycle, or focus on a single gene of interest. Given these concerns, what would be the best approach to analyze gene expression in my current cells? How can I determine which genes are functionally relevant beyond just looking at absolute expression or fold change compared to the original cells?
Thank you so much.
It's worth noting that what you call "Absolute Expression" here is not absolute expression. The statistic you get (TPM, RPKM, etc.) is still relative. It's just relative to the total amount of RNA in the sample.
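To make the "relative" point concrete, here is a minimal sketch of a TPM calculation on made-up numbers. The read counts and gene lengths are invented for illustration, and `tpm` is just a hypothetical helper, not any particular tool's implementation. Gene 1 has identical counts in both toy samples, yet its TPM drops in sample B simply because gene 3 takes up a larger share of the total.

```python
def tpm(counts, lengths_kb):
    """Compute TPM from raw read counts and gene lengths (in kilobases)."""
    # Length-normalise first: reads per kilobase for each gene.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    # Scale so that the values in one sample always sum to one million.
    return [r / total * 1e6 for r in rates]


# Toy data (assumed values, not from any real experiment).
lengths_kb = [1.0, 2.0, 4.0]
sample_a = [100, 200, 400]
sample_b = [100, 200, 4000]  # only gene 3 differs from sample A

tpm_a = tpm(sample_a, lengths_kb)
tpm_b = tpm(sample_b, lengths_kb)

for name, a, b in zip(["gene1", "gene2", "gene3"], tpm_a, tpm_b):
    print(f"{name}: TPM in A = {a:.0f}, TPM in B = {b:.0f}")
```

Because the values are forced to sum to a million within each sample, a change in one highly expressed gene shifts the TPM of every other gene, even if nothing about those genes changed.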
People often ask "What does a TPM correspond to in terms of number of mRNAs per cell?", and the truth is that this is impossible to answer. A TPM of 1 tells you that of every million transcripts, 1 will be from that gene. But to know how many transcripts from that gene there are in a cell, you'd need to know how many transcript molecules there are in the cell in total, from all genes. That is generally not known, and it varies from cell type to cell type, condition to condition, and even cell to cell within a population.
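As a rough sketch of that arithmetic: converting a TPM into copies per cell means multiplying by the total number of transcripts in the cell, which is exactly the number you don't have. The totals below are purely hypothetical placeholders, and `copies_per_cell` is an illustrative helper, not an established method.

```python
def copies_per_cell(tpm_value, total_transcripts_per_cell):
    """Convert TPM to transcript copies per cell, GIVEN a known total.

    The total number of transcripts per cell is an assumption here;
    in a real experiment it is generally unknown.
    """
    return tpm_value / 1e6 * total_transcripts_per_cell


# Hypothetical totals, chosen only to show how much the answer moves.
for total in (100_000, 500_000, 1_000_000):
    copies = copies_per_cell(1.0, total)
    print(f"total = {total:>9,} transcripts/cell -> TPM 1 = {copies:.2f} copies/cell")
```

The same TPM of 1 gives a tenfold range of copies per cell across these assumed totals, which is why the value can't be read as an absolute amount.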
I don't think you will find a single person on this forum who will advocate for absolute expression analysis. In fact, almost the same question was asked recently and you may want to read that thread.