Is higher logFC always good?
2.1 years ago
kng ▴ 40

After differential expression analysis, I was looking at my top list. I first sorted this list by absolute logFC and then again by adj-P-value. The top 100 genes in these two lists are quite different. Why does that happen? Isn't higher logFC always the most differentially expressed gene?

limma-voom RNASeq logFC gene-expression DE-analysis • 1.6k views
2.1 years ago
LChart 4.7k

Isn't higher logFC always the most differentially expressed gene?

The logFC you're observing is an estimate of the true log fold change. It is much like collecting the heights of 5 males and 5 females: the average height difference is an estimate of the true (population) male/female height difference. There is uncertainty in this estimate, which depends on how variable the samples themselves are and on the total number of samples (the "standard error"). The p-value relates to how many standard errors the estimate of the difference is from 0.

In the case of gene expression, there are many genes with low expression levels and high expression variance. These are analogous to small sample sizes: it's very easy to "observe" a large estimated logFC within your sample, but still have it be only 50%, 25%, or even 10% of the standard error, resulting in a large p-value.
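
To make the point concrete, here is a minimal sketch with made-up log2 expression values for two hypothetical genes. It uses a plain Welch t-statistic (difference divided by its standard error) as a stand-in; limma itself uses a moderated t-statistic, so the numbers would differ, but the idea is the same.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (difference in means, standard error, t-statistic) for two groups."""
    diff = statistics.mean(group_b) - statistics.mean(group_a)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff, se, diff / se

# Hypothetical log2 expression values, 3 replicates per condition (not real data).
noisy_ctrl, noisy_trt = [0.0, 4.0, 2.0], [3.0, 9.0, 3.0]      # low-count, highly variable gene
stable_ctrl, stable_trt = [8.0, 8.1, 7.9], [9.0, 9.1, 8.9]    # well-expressed, stable gene

noisy = welch_t(noisy_ctrl, noisy_trt)
stable = welch_t(stable_ctrl, stable_trt)

print("noisy gene:  logFC=%.2f  SE=%.2f  t=%.2f" % noisy)
print("stable gene: logFC=%.2f  SE=%.2f  t=%.2f" % stable)
```

The noisy gene has the larger logFC, but its estimate sits far fewer standard errors from 0 than the stable gene's, so it would get the larger p-value.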

What can be done is to threshold genes by p-value and then rank the resulting genes by logFC. Another possibility is to rank the genes by the lower bound of a confidence interval (e.g., absolute logFC estimate minus 1 standard error).
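
Both ranking strategies can be sketched in a few lines. The table below is entirely hypothetical (gene names, logFC, standard errors and adjusted p-values are invented for illustration), and the 0.05 cutoff and the "1 standard error" width are assumptions you would tune yourself.

```python
# Hypothetical results: (gene, logFC, standard error, adjusted p-value).
results = [
    ("geneA",  4.0, 3.5,  0.40),
    ("geneB",  1.5, 0.2,  0.001),
    ("geneC", -2.5, 0.5,  0.004),
    ("geneD",  3.0, 2.8,  0.30),
    ("geneE",  0.6, 0.15, 0.002),
]

def filter_then_rank(rows, alpha=0.05):
    """Keep genes with adjusted p-value below alpha, then sort by |logFC|, largest first."""
    kept = [r for r in rows if r[3] < alpha]
    return sorted(kept, key=lambda r: abs(r[1]), reverse=True)

def rank_by_lower_bound(rows, n_se=1.0):
    """Sort by |logFC| minus n_se standard errors, largest first."""
    return sorted(rows, key=lambda r: abs(r[1]) - n_se * r[2], reverse=True)

by_logfc = sorted(results, key=lambda r: abs(r[1]), reverse=True)
print("by |logFC| only:     ", [r[0] for r in by_logfc])
print("filter, then |logFC|:", [r[0] for r in filter_then_rank(results)])
print("by lower bound:      ", [r[0] for r in rank_by_lower_bound(results)])
```

Sorting by |logFC| alone puts the noisy geneA first, while both alternatives push it down or drop it.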

2.1 years ago

There are genes coding for proteins with an extremely low copy number (one or a few copies in each cell) that nevertheless play important roles. So, a high fold change is not the only feature to consider.

2.1 years ago
Ram 44k

log2FC measures the magnitude of change (how different) and adj-P-val measures how confident the algorithm is in its statement on how different the expression is. If a reliable friend and a stranger both offered to help with a situation, whom would you trust? Not all your reliable friends might offer to help and a bunch of strangers might - that's why going with just one list is not useful - you need someone both capable/willing (high log2FC) and reliable (low adj-p-val). Similarly, you need genes that are sufficiently differentially expressed and that the algorithm is confident enough to label DE.


Thank you Ram! Is that also the case with p-value and adjusted p-value? I know adj is the corrected p-value, but aren't those two lists supposed to be nearly the same?


adjPval takes precedence over raw p-value as it accounts for multiple testing. In my experience the lists should be similar but they may not be - it depends on the experiment design (replicates etc.). Go with adjPval and log2FC - the useful regions of a volcano plot. You may choose to include pVal if that gets you somewhere biologically, but be sure to mention that you made the filters more lenient and found something worth lowering your standards for.
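
As a side note on how the two lists relate: if the adjustment is Benjamini-Hochberg (the default `adjust.method` in limma's `topTable`, but check what your own run used), the adjusted values never reverse the order of the raw p-values. What changes is how many genes pass a given cutoff. A minimal sketch with made-up p-values:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes (not real data).
raw = [0.001, 0.045, 0.035, 0.20, 0.008]
adj = benjamini_hochberg(raw)

n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(a < 0.05 for a in adj)
print("adjusted:", [round(a, 4) for a in adj])
print("passing 0.05 on raw p:", n_raw, "| on adjusted p:", n_adj)
```

With these numbers, fewer genes survive the cutoff after adjustment even though the ranking itself is unchanged.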
