Question

Choosing an FC Threshold

9 months ago

Netanel • 0

Hello,

I ran a transcriptomics experiment and analyzed the data with edgeR. Because I have few samples and mostly small effect sizes between the control and treatment groups, I decided to select relevant differentially expressed genes (DEGs) based solely on fold change (FC). I have already run behavioral experiments and RT-PCR on some of the genes chosen this way and obtained good results, which supports the analysis's ability to detect genes affected by the treatment. Now I want to choose an FC threshold that best cuts off possible noise. I am considering computing a 95% confidence interval on the FC of all my genes (around 16,000 after filtering out low-read genes) and selecting as DEGs only the genes whose fold change falls outside that interval.

For example, I would divide the data into increasing or decreasing FC categories and calculate the CI for each category. For this example, let's assume my FC CI is 2 ± 0.2. Therefore, for upregulated genes, I would only consider genes whose FC is higher than 2.2.

Alternatively, I would appreciate any other objective ways to determine the FC threshold.

Thank you!

statistics rna-seq • 441 views

updated 9 months ago by Gordon Smyth ★ 7.7k • written 9 months ago by Netanel • 0

score 2 · Answer 1 · 2024-03-27

The approach you are taking doesn't make statistical sense. There is no valid CI for the fold-changes because they are neither independent nor identically distributed, and they cannot be treated as a sample from a common fold-change. Working with unlogged absolute fold-changes, as you seem to be doing, is especially strange.

Ranking by fold-change is a non-statistical method that will give false positives and will prioritize low count genes of little biological interest. The approach doesn't have any statistical basis so there is no objective principle by which to choose the threshold.
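
The low-count bias can be seen in a small simulation. The following is only a rough sketch under stated assumptions: counts are drawn as pure Poisson noise (real RNA-seq data are overdispersed, which would make the effect stronger), there is no true treatment effect for any gene, and every name and parameter here (`simulate_null_logfc`, `summarize`, 2000 genes, 3 samples per group, means between 2 and 500) is hypothetical and chosen for illustration. It is not edgeR and does not replace a proper analysis.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Adequate for the modest means used in this sketch (lam <= 500).
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_null_logfc(n_genes=2000, n_per_group=3, seed=1):
    """Simulate genes with NO true differential expression.

    Returns a list of (true_mean_count, observed_log2_fold_change).
    """
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        # True mean count, spread log-uniformly between 2 and 500.
        mean = math.exp(rng.uniform(math.log(2), math.log(500)))
        ctrl = [poisson(rng, mean) for _ in range(n_per_group)]
        trt = [poisson(rng, mean) for _ in range(n_per_group)]
        # A pseudo-count of 0.5 avoids taking the log of zero.
        ctrl_avg = sum(ctrl) / n_per_group + 0.5
        trt_avg = sum(trt) / n_per_group + 0.5
        genes.append((mean, math.log2(trt_avg / ctrl_avg)))
    return genes


def summarize(genes, top_n=100, lfc_cutoff=1.0):
    """Compare the genes ranked highest by |log2 FC| with all genes."""
    ranked = sorted(genes, key=lambda g: abs(g[1]), reverse=True)
    top = ranked[:top_n]
    return {
        "median_mean_all": statistics.median([m for m, _ in genes]),
        "median_mean_top": statistics.median([m for m, _ in top]),
        # Every gene counted here is a false positive by construction.
        "n_beyond_cutoff": sum(1 for _, lfc in genes if abs(lfc) > lfc_cutoff),
    }


if __name__ == "__main__":
    for key, value in summarize(simulate_null_logfc()).items():
        print(f"{key}: {value:.1f}")
```

In this setup the genes at the top of the fold-change ranking have much lower true expression than the typical gene, and some genes pass a 2-fold cutoff even though nothing is differentially expressed. A fixed FC threshold therefore selects sampling noise in low-count genes first.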

It would be better to go back to edgeR for an analysis that is statistically defensible and which will rank genes in a more meaningful way taking account of both fold-change and count sizes.