Question

High adj. p-value/FDR despite large FC and high expression levels

0

Entering edit mode

23 days ago

nik.kraemer ▴ 10

Dear all,

I am trying to analyze the expression levels of viral genes (let's call them A, B, and C) in two different HEK293 cell lines (cell line 1 and 2) after transfection. After extracting the read counts with htseq-count and generating a count matrix, I used DESeq2. However, although read counts are high and consistent among replicates, DESeq does not identify as being differentially regulated. Of course, that is absolutely possible, but I want to check if I am missing something here. DESeq2 indicates a log2FC of ~2 (which seems correct), with an adj. p-value of ~0.19.

Based on this post at Bioconductor, which kind of went into the same direction, I also tried edgeR robust. This has led to the same result essentially (so at least results are robust hehe).

Current version of DESeq2 is 1.42.1, that of edgeR is 4.0.16

Any insights are much appreciated! I also would be happy to share further info, if necessary.

Thank you very much

These are normalized counts for the mentioned genes.

edgeR DESeq2 RNA-Seq • 347 views

ADD COMMENT • link updated 23 days ago by yura.grabovska ▴ 700 • written 23 days ago by nik.kraemer ▴ 10

0

Entering edit mode

To add to this, this is my DESeq2 script I am following:

dds <- DESeqDataSetFromMatrix(countData = allcts,
                          colData = anno,
                          design = ~ cell + treatment + cell:treatment)
smallestGroupSize <- 3
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep,]

res <- results(dds,
             pAdjustMethod = "BH",
             alpha = 0.05,
             cooksCutoff = FALSE) 

#cooksCutoff set to false to not accidentally cut off viral genes with high variability to untreated samples

This is the output, filtered for the respective genes:

 X baseMean log2FoldChange    lfcSE     stat     pvalue      padj

A 257712.9       2.042526 1.095646 1.864221 0.06229064 0.1187003

B 302607.8       2.029994 1.112316 1.825016 0.06799867 0.1276736

 C 354352.3       2.036673 1.140450 1.785850 0.07412349 0.1372061

ADD REPLY • link 23 days ago by nik.kraemer ▴ 10

0

Entering edit mode

My guess is that data are so variable that variance is super high. Can you show plotPCA? You can try to filter more, for example filterByExpr, because your filter here essentially just removes genes that are zero everywhere, also see vignette. That reduces multiple testing burden.

ADD REPLY • link 23 days ago by ATpoint 87k

score 0 · Answer 1 · 2025-01-29

Your lfc is relatively high but your lfcSE is also quite high which means that the statistic used to calculate your p-values is quite low.

stat = log2FoldChange / lfcSE

Your p-value is derived from the stat by comparing with a normal distribution table and seeing how likely it is to obtain a value as extreme as yours.

Let's take gene A as an example. The probability for a stat value of around 1.86 is 0.9686 based on a normal distribution table, so with that:

1 - 0.9686 = 0.0314

p-value = 2 × 0.0314 = 0.0628

This is more or less equivalent with the p-value you receive accounting for rounding errors.