Question

Decision on Upregulated/Downregulated Genes in DE list- P-value and Log fold change

7 weeks ago

odi ▴ 10

I have performed differential expression testing using FindMarkers in Seurat in R. I want to find out which genes are upregulated in the mutant vs. wild type, and vice versa.

The first dilemma I am having is which log fold change to use as my cutoff. Initially, the plan was to use a log2 fold change greater than 1 or less than -1, so that I am looking for genes with at least a two-fold change (2^1 = 2). But my PI would prefer that we pick a gene of interest and set the cutoff at its value for the downregulated list, while the upregulated list would still use LFC > 1.

Is this a valid take? I am worried that the inconsistency in the choices will have people questioning my research.

The second dilemma I am having is the p-value. I am used to choosing a p-value of less than 0.05 as the basis for statistical significance, as other researchers do. However, my PI is complaining that there are too many genes, so for the downregulated list he wants to use the adjusted p-value and for the upregulated list the unadjusted p-value. Again, is this valid? Wouldn't the inconsistency in choices invite questions? What is the difference between the p-value and the adjusted p-value, and which is best to use?

Pvalue logfoldchange • 703 views

ADD COMMENT • link updated 6 weeks ago by Mensur Dlakic ★ 28k • written 7 weeks ago by odi ▴ 10

score 2 · Answer 1 · 2024-10-03

7 weeks ago

i.sudbery 20k

In terms of the log fold change, this is always a judgement call about biology. I wouldn't say there are any right or wrong answers, and you can set the threshold where you think is reasonable. If you have a good, biologically based reason for setting the up-regulation and down-regulation thresholds differently, then that is fine.

In terms of the p-value, using the nominal, unadjusted p-value is always wrong. Never use unadjusted p-values; always use the adjusted ones. The adjusted p-value accounts for the fact that you are carrying out many tests (one per gene). With a 5% p-value threshold, each gene that is not truly differentially expressed still has a 5% chance of being called differentially expressed. If you test 20,000 genes and none of them truly differ, you would still expect about 1,000 false calls (5% of 20,000). The adjusted p-value accounts for this, so instead of being wrong in 5% of all tests, you will be wrong in about 5% of the cases you call as differentially expressed (i.e. it estimates the "False Discovery Rate").
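To make the arithmetic above concrete, here is a minimal sketch in plain Python (not the thread's R code) of the Benjamini-Hochberg procedure, the usual way to obtain FDR-adjusted p-values. The five p-values are invented for illustration. One caveat worth checking against your own output: to my knowledge the `p_val_adj` column returned by Seurat's FindMarkers is Bonferroni-corrected, which is stricter than an FDR adjustment, so the numbers below will not match it.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Expected number of false calls when no gene truly differs.
n_genes = 20_000
alpha = 0.05
expected_false_calls = n_genes * alpha  # about 1,000

# Toy example: invented p-values, not real data.
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
padj = benjamini_hochberg(pvals)

n_nominal = sum(p < alpha for p in pvals)   # calls using raw p-values
n_adjusted = sum(p < alpha for p in padj)   # calls using adjusted p-values
print(n_nominal, n_adjusted)
```

In this toy set, fewer genes pass the 0.05 cutoff after adjustment than before, which is the intended effect: the adjustment is the principled way to shorten a list that is "too long", and it should be applied to the up- and downregulated lists alike.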

ADD COMMENT • link 7 weeks ago by i.sudbery 20k

I thank you for taking your time to provide your expertise.

ADD REPLY • link 6 weeks ago by odi ▴ 10

> In terms of the log fold change, this is always a judgement call about biology. I wouldn't say there are any right or wrong answers, and you can set the threshold where you think is reasonable. If you have a good, biologically based reason for setting the up-regulation and down-regulation thresholds differently, then that is fine.

I am going to provide a somewhat different point of view, which can be summarized by my response to this paragraph: within reason. Nobody will argue if you pick abs(logFC) >= 1. You will get an argument if you pick abs(logFC) < 0.5, no matter what your biological reasoning is.
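As an illustration of applying one consistent rule to both directions, here is a minimal Python sketch. It assumes the marker table has been exported as rows with the FindMarkers column names `avg_log2FC` and `p_val_adj`, and that positive values mean higher expression in the mutant (which depends on which group was passed as the first identity). The gene names, values, and the helper `split_de_genes` are hypothetical.

```python
def split_de_genes(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split rows into up- and downregulated gene lists using one shared rule."""
    up, down = [], []
    for row in rows:
        # The same adjusted p-value cutoff is applied to both directions.
        if row["p_val_adj"] >= padj_cutoff:
            continue
        if row["avg_log2FC"] >= lfc_cutoff:
            up.append(row["gene"])
        elif row["avg_log2FC"] <= -lfc_cutoff:
            down.append(row["gene"])
    return up, down


# Invented example rows, not real results.
markers = [
    {"gene": "GeneA", "avg_log2FC": 1.8, "p_val_adj": 0.001},
    {"gene": "GeneB", "avg_log2FC": -1.3, "p_val_adj": 0.02},
    {"gene": "GeneC", "avg_log2FC": 0.7, "p_val_adj": 0.0004},  # fails the fold-change cutoff
    {"gene": "GeneD", "avg_log2FC": -2.1, "p_val_adj": 0.30},   # fails the p-value cutoff
]

up_genes, down_genes = split_de_genes(markers)
print(up_genes, down_genes)
```

Keeping both cutoffs as explicit parameters makes the choice easy to report in the methods section, and easy to justify if a reviewer asks why a particular value was used.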

ADD REPLY • link 6 weeks ago by Mensur Dlakic ★ 28k

Requiring abs(logFC) < 0.5 is actually very useful, particularly when coupled with an interval null hypothesis to look for genes that do not change (altHypothesis="lessAbs" in DESeq2).

Or do you mean that you shouldn't have, e.g., abs(logFC) >= 0.1? I agree that there are some people who would argue, but I'd argue this can be appropriate in some cases, particularly when you have a lot of samples. With enough samples, the null hypothesis logFC == 0 is always wrong, and will always be rejected for some sufficiently large n. Requiring abs(logFC) >= 0.1 is effectively saying that my null hypothesis is that logFC is approximately 0, rather than exactly 0. It can also be directly useful if you are, for example, studying ultrasensitive bistable switches.
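The sample-size argument can be sketched numerically. The Python snippet below is a simplified z-test, not DESeq2's implementation: it assumes a fixed observed log fold change of 0.05, a known per-sample standard deviation of 1, and equal group sizes, all of which are invented for illustration. It compares the usual point null (logFC == 0) with a one-sided test of the interval null abs(logFC) <= 0.1.

```python
import math


def standard_error(sd, n_per_group):
    """Standard error of a difference in means with equal group sizes."""
    return sd * math.sqrt(2.0 / n_per_group)


def point_null_p(observed_lfc, se):
    """Two-sided p-value for H0: logFC == 0 (normal approximation)."""
    z = abs(observed_lfc) / se
    return math.erfc(z / math.sqrt(2))


def interval_null_p(observed_lfc, se, tau):
    """One-sided p-value for H0: abs(logFC) <= tau (simplified sketch)."""
    z = (abs(observed_lfc) - tau) / se
    return 0.5 * math.erfc(z / math.sqrt(2))


observed_lfc = 0.05  # assumed tiny effect
sd = 1.0             # assumed known standard deviation
tau = 0.1            # threshold for "approximately zero"

results = []
for n in (10, 100, 1000, 10000):
    se = standard_error(sd, n)
    results.append((n, point_null_p(observed_lfc, se),
                    interval_null_p(observed_lfc, se, tau)))

for n, p_point, p_interval in results:
    print(f"n={n:>6}  point-null p={p_point:.3g}  interval-null p={p_interval:.3g}")
```

Under these assumptions the point-null p-value shrinks steadily as n grows, so the tiny effect eventually becomes "significant", while the interval-null p-value moves toward 1 because the effect stays inside the band treated as no change.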

ADD REPLY • link 6 weeks ago by i.sudbery 20k

My main point was that instead of "you can set the threshold where you think is reasonable", it should be "you can set the threshold where you think is reasonable AND one that reviewers will accept".

There are good reasons to select thresholds smaller than abs(logFC)=1, but among those I wouldn't count: 1) I don't have enough differentially expressed genes, so I will lower the threshold; 2) I found an interesting protein at logFC=0.7 that I am convinced should be differentially expressed, so now I will pretend that 0.7 was a good threshold all along.

ADD REPLY • link 6 weeks ago by Mensur Dlakic ★ 28k