Question

Strangely low p-values and adjusted p-values (FDR) in DESeq2 and edgeR


3.6 years ago

bsh • 0

My data come from an experiment in which a specific gene was overexpressed. The samples are divided into 3 groups according to the overexpression time, and each group has 3 samples (9 samples in total).

I ran a differential gene expression (DGE) analysis comparing the control group with one case group using DESeq2.

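library(DESeq2)
args <- commandArgs(trailingOnly=TRUE)  # assumed: args[1] is the path to the count table
cts <- read.table(args[1], sep="\t", row.names=1, header=TRUE)
cts <- as.matrix(cts)
head(cts)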
                  Control X1_C X2_C Case_9H X1_Case_9 X2_Case_9
ENSG00000223972.5       0    1    0          0         0         1
ENSG00000227232.5      62   47   75         57        42        68
ENSG00000278267.1       2    4    5          5         2        11
ENSG00000243485.5       0    0    0          0         0         0
ENSG00000284332.1       0    0    0          0         0         0
ENSG00000237613.2       0    0    0          0         0         0

coldata <- read.table(sampleType, row.names=1, header=T)
head(coldata)
          condition       type
Control1         0H paired-end
Control2         0H paired-end
Control          0H paried-end
Case_9h_1        9H paired-end
Case_9h_2        9H paired-end
Case_9h          9H paired-end


coldata <- coldata[,c("condition", "type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)

dds <- DESeqDataSetFromMatrix(countData=cts,
                              colData=coldata,
                              design=~condition)

dds <- DESeq(dds)

res <- results(dds) ## extract the results table
head(results(dds, tidy=TRUE)) ## look at the first rows of the results table

res <- res[order(res$padj),]
head(res)

write.table(res, 'DESeq2_Result.txt', sep='\t', quote=FALSE)
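One check that may be worth adding to a script like this: according to the DESeq2 documentation, the columns of the count matrix and the rows of colData must be in the same order, and DESeq2 does not try to guess the pairing. The sketch below uses only base R. The sample labels are copied from the head() outputs above; the toy matrix and the names cts_toy and coldata_toy are hypothetical.

```r
# Sample labels as printed by head(cts) and head(coldata) in the post.
count_cols   <- c("Control", "X1_C", "X2_C", "Case_9H", "X1_Case_9", "X2_Case_9")
coldata_rows <- c("Control1", "Control2", "Control", "Case_9h_1", "Case_9h_2", "Case_9h")

# TRUE only if both the labels and their order agree.
names_match <- identical(count_cols, coldata_rows)
# TRUE if the same labels are present, regardless of order.
same_set <- setequal(count_cols, coldata_rows)
print(c(names_match = names_match, same_set = same_set))

# Hypothetical toy case: same labels, different order. Reorder the count
# columns to follow the rows of the sample table before building the dataset.
cts_toy <- matrix(c(10L, 20L, 30L), nrow = 1,
                  dimnames = list("gene1", c("B", "A", "C")))
coldata_toy <- data.frame(condition = factor(c("0H", "9H", "9H")),
                          row.names = c("A", "B", "C"))
cts_toy <- cts_toy[, rownames(coldata_toy), drop = FALSE]
stopifnot(identical(colnames(cts_toy), rownames(coldata_toy)))
```

With the labels as printed in the post, both checks come out FALSE, so it may be worth confirming that each count column is paired with the intended row of the sample table.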

However, the p-values and adjusted p-values in the results are strangely low.

baseMean        log2FoldChange  lfcSE   stat    pvalue  padj
ENSG00000       18120.6524757428        4.873942088671  0.137165033459337       35.5334152279844        1.4>
ENSG00000      1417.19703253115        1.97749054397417        0.198076839379039       9.98345162500319   >
ENSG00000       1829.10309350802        2.37575360942566        0.26467604834409        8.97608085162691   >
ENSG000000      1999.49090282137        1.97881497597188        0.221819203256418       8.92084610764927   >
ENSG00000      475.332694913143        2.16930000064788        0.256296524497821       8.4640242582233 2.5>
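As a rough illustration of how these columns relate under the DESeq2 Wald test: stat is log2FoldChange divided by lfcSE, and the p-value is the two-sided normal tail probability of stat. The base R sketch below uses the first two rows of the table above; the variable names are my own, and it assumes the default Wald test was used.

```r
# Values copied from the first two rows of the DESeq2 table.
log2fc        <- c(4.873942088671, 1.97749054397417)
lfc_se        <- c(0.137165033459337, 0.198076839379039)
stat_reported <- c(35.5334152279844, 9.98345162500319)

# Wald statistic: estimated log2 fold change over its standard error.
stat <- log2fc / lfc_se
print(cbind(stat, stat_reported))

# Two-sided p-value from the standard normal distribution.
pvalue <- 2 * pnorm(abs(stat), lower.tail = FALSE)
print(signif(pvalue, 3))
```

Seen this way, a very small p-value simply reflects a large fold change estimated with a small standard error, which is plausible for a gene that was deliberately overexpressed.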

I also ran a DGE analysis on the control group and one case group with edgeR. The edgeR results likewise gave high p-values and adjusted p-values (by "high" I mean highly significant, i.e. very small values).

genes   baseMean    log2FoldChange  lfcSE   stat    pvalue  padj    
ENSG00000   18120.65248 4.873942089 0.137165033 35.53341523 1.50E-276   2.53E-272
ENSG00000   1417.197033 1.977490544 0.198076839 9.983451626 1.80E-23    1.52E-19
ENSG00000   1829.103094 2.375753609 0.264676048 8.976080853 2.81E-19    1.58E-15
ENSG000000  1999.490903 1.978814976 0.221819203 8.920846109 4.63E-19    1.95E-15
ENSG00000   475.3326949 2.169300001 0.256296524 8.464024259 2.58E-17    8.71E-14
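The padj column can be related to the pvalue column through the Benjamini-Hochberg procedure, which is available in base R as p.adjust(). In the sketch below, the p-values are copied from the table above. The number of tested genes is not stated in the post, so n_tested = 16867 is an assumption back-calculated from padj / pvalue in the first row.

```r
# p-values copied from the five rows of the table above (the five smallest).
pvalue <- c(1.50e-276, 1.80e-23, 2.81e-19, 4.63e-19, 2.58e-17)

# ASSUMPTION: number of tested genes, inferred from padj / pvalue in row 1.
n_tested <- 16867

# Benjamini-Hochberg adjustment, treating these as the smallest of n_tested p-values.
padj <- p.adjust(pvalue, method = "BH", n = n_tested)
print(signif(padj, 3))

# The multiplier is n_tested / rank: n_tested for the smallest p-value,
# n_tested / 2 for the second smallest, and so on.
print(signif(padj / pvalue, 4))
```

Under that assumption the adjusted values come out close to the padj column shown, which suggests the adjustment itself is behaving as expected and the small padj values follow directly from the small raw p-values.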

From searching, I learned that very significant p-values can arise in experiments with a small number of samples or with knock-out designs.

Is this normal? If it is not, I would like to know how to fix it. I would also like to understand why it happens.

RNAseq DGE edgeR DESeq2 pvalue • 2.0k views

ADD COMMENT • link updated 2.8 years ago by vanottee • 0 • written 3.6 years ago by bsh • 0


Have you tried any QC methods? For example, a PCA plot to see if there are any transcripts which are outliers?
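A minimal sketch of such a PCA check in base R, on simulated counts rather than the poster's data. All names and numbers here are made up for illustration. With a DESeq2 object one would more typically transform with vst() and call plotPCA(), but the idea is the same: look for samples that sit far from the rest of their group.

```r
set.seed(1)

# Simulated counts: 200 genes, 3 control and 3 case samples (hypothetical).
n_genes   <- 200
samples   <- c(paste0("ctrl_", 1:3), paste0("case_", 1:3))
base_mean <- rexp(n_genes, rate = 1 / 100)
lambda    <- matrix(base_mean, nrow = n_genes, ncol = length(samples),
                    dimnames = list(NULL, samples))
# Give the case samples a 4-fold increase in the first 20 genes.
lambda[1:20, 4:6] <- lambda[1:20, 4:6] * 4
counts <- matrix(rpois(length(lambda), lambda), nrow = n_genes,
                 dimnames = dimnames(lambda))

# PCA on log-transformed counts, with samples as rows.
log_counts    <- log2(counts + 1)
pca           <- prcomp(t(log_counts), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Coordinates of each sample on the first two principal components.
print(round(pca$x[, 1:2], 2))
print(round(var_explained[1:2], 3))
```

If replicates of one condition scatter widely, or one sample groups with the wrong condition, that is worth resolving before interpreting the p-values.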

ADD REPLY • link 3.6 years ago by K.patel5 ▴ 150


Your use of "high p-value" to mean a much smaller number is confusing. I can't even see what the p-values are in the DESeq2 output; you've cut off the numbers.

ADD REPLY • link 3.6 years ago by swbarnes2 15k


Hi bsh - did you ever come to a conclusion with this?

I have been using STAR alignment --> StringTie assembly --> DESeq2 DGE analysis, and also get abnormally low adjusted p-values (some listed simply as 0). We've presented the data as a volcano plot and have repeatedly been asked how the adjusted p-values could really be so low, so it stands out to others as well. Our design is simply 3 controls vs 3 mutants, so a relatively straightforward analysis.
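On the values listed as exactly 0: one likely explanation is a floating-point limit rather than a literal zero, since a double cannot represent numbers much smaller than about 1e-308 (and nothing below roughly 5e-324). The base R sketch below shows the effect for a hypothetical Wald statistic of 40, together with one way to keep such points finite on a volcano plot; capping at .Machine$double.xmin is my own choice for illustration.

```r
# A hypothetical, very large Wald statistic.
z <- 40

# The two-sided p-value underflows to exactly 0 in double precision.
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
print(p)

# Working on the log scale keeps the value finite.
log10_p <- (pnorm(abs(z), lower.tail = FALSE, log.p = TRUE) + log(2)) / log(10)
print(log10_p)

# For plotting, zeros can be capped at the smallest normalized double
# so that -log10(p) stays finite.
p_capped <- pmax(p, .Machine$double.xmin)
print(-log10(p_capped))
```

A reported 0 can therefore be read as "smaller than the machine can represent" and described that way in a figure legend.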

ADD REPLY • link 2.8 years ago by vanottee • 0