My data come from an experiment in which a specific gene was overexpressed. The samples are divided into 3 groups according to the overexpression time, and each group has 3 samples (9 samples in total).
I ran a DGE analysis comparing the control group against one case group with DESeq2.
cts <- read.table(args[1], sep="\t", row.names=1, header=T)
cts <- as.matrix(cts)
head(cts)
                  Control X1_C X2_C Case_9H X1_Case_9 X2_Case_9
ENSG00000223972.5       0    1    0       0         0         1
ENSG00000227232.5      62   47   75      57        42        68
ENSG00000278267.1       2    4    5       5         2        11
ENSG00000243485.5       0    0    0       0         0         0
ENSG00000284332.1       0    0    0       0         0         0
ENSG00000237613.2       0    0    0       0         0         0
coldata <- read.table(sampleType, row.names=1, header=T)
head(coldata)
          condition       type
Control1         0H paired-end
Control2         0H paired-end
Control          0H paried-end
Case_9h_1        9H paired-end
Case_9h_2        9H paired-end
Case_9h          9H paired-end
coldata <- coldata[,c("condition", "type")]
coldata$condition <- factor(coldata$condition)
coldata$type <- factor(coldata$type)
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)
dds <- DESeq(dds)
res <- results(dds) ## summary
head(results(dds, tidy=TRUE)) ## let's look at the results table
res <- res[order(res$padj),]
head(res)
write.table(res, 'DESeq2_Result.txt', sep='\t', quote=FALSE)
However, the p-values and adjusted p-values in the result are strangely low.
baseMean log2FoldChange lfcSE stat pvalue padj
ENSG00000 18120.6524757428 4.873942088671 0.137165033459337 35.5334152279844 1.4>
ENSG00000 1417.19703253115 1.97749054397417 0.198076839379039 9.98345162500319 >
ENSG00000 1829.10309350802 2.37575360942566 0.26467604834409 8.97608085162691 >
ENSG000000 1999.49090282137 1.97881497597188 0.221819203256418 8.92084610764927 >
ENSG00000 475.332694913143 2.16930000064788 0.256296524497821 8.4640242582233 2.5>
I also ran a DGE analysis on the control group and one case group with edgeR. In the edgeR results, high p-values and adjusted p-values were obtained.
genes       baseMean     log2FoldChange  lfcSE        stat         pvalue     padj
ENSG00000   18120.65248  4.873942089     0.137165033  35.53341523  1.50E-276  2.53E-272
ENSG00000   1417.197033  1.977490544     0.198076839  9.983451626  1.80E-23   1.52E-19
ENSG00000   1829.103094  2.375753609     0.264676048  8.976080853  2.81E-19   1.58E-15
ENSG000000  1999.490903  1.978814976     0.221819203  8.920846109  4.63E-19   1.95E-15
ENSG00000   475.3326949  2.169300001     0.256296524  8.464024259  2.58E-17   8.71E-14
From searching, I read that very significant p-values can come out of experimental data such as a small number of samples or a knock-out ....
Is this normal? If it is abnormal, I would like to know how to fix it. I would also like to understand why it happens.
Have you tried any QC methods? For example, a PCA plot to see if there are any transcripts which are outliers?
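A PCA of the samples is a quick way to check whether the replicates cluster by condition. DESeq2 provides `vst()` and `plotPCA()` for this; the sketch below uses only base R so that it runs without extra packages. The count matrix is simulated (made-up numbers, not the poster's data), and `cts_toy` and `condition` are illustrative names.

```r
# Toy count matrix (simulated, NOT the poster's data): 200 genes x 6 samples.
set.seed(1)
cts_toy <- matrix(rnbinom(200 * 6, mu = 100, size = 10), nrow = 200,
                  dimnames = list(paste0("gene", 1:200),
                                  c("C1", "C2", "C3", "T1", "T2", "T3")))
condition <- factor(rep(c("0H", "9H"), each = 3))

# Log-transform to reduce the mean-variance trend, then drop constant genes.
log_cts <- log2(cts_toy + 1)
log_cts <- log_cts[apply(log_cts, 1, var) > 0, , drop = FALSE]

# prcomp() expects samples in rows, so transpose the matrix.
pca <- prcomp(t(log_cts), center = TRUE, scale. = FALSE)
pct_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

# Plot the first two components, coloured by condition and labelled by sample.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(condition), pch = 19,
     xlab = paste0("PC1 (", pct_var[1], "%)"),
     ylab = paste0("PC2 (", pct_var[2], "%)"))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```

With real data, replace `cts_toy` with the actual count matrix; a sample that sits far from its own group on PC1 or PC2 is worth a closer look.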
Your use of "high p-value" to mean a number much smaller is confusing. I can't even see what the p-values are in the DESeq2 output, because you've cut off the numbers.
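The truncated p-values can be recovered from the `stat` column. A minimal sketch, assuming the default Wald test was used (in which case `stat` is `log2FoldChange / lfcSE` and the p-value is the two-sided tail of a standard normal); the numbers are copied from the first two rows of the posted DESeq2 table.

```r
# Values copied from the first two rows of the DESeq2 table in the question.
lfc   <- c(4.873942088671, 1.97749054397417)
lfcSE <- c(0.137165033459337, 0.198076839379039)

# Wald statistic: effect size divided by its standard error.
stat <- lfc / lfcSE

# Two-sided p-value from the standard normal distribution.
pvalue <- 2 * pnorm(-abs(stat))

print(data.frame(lfc, lfcSE, stat, pvalue))
```

The recomputed values should be close to the 1.50E-276 and 1.80E-23 listed in the second table: a large fold change divided by a small standard error gives a large statistic, and hence a very small p-value.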
Hi bsh - did you ever come to a conclusion with this?
I have been using STAR alignment --> StringTie assembly --> DESeq2 DGE analysis, and I also get abnormally low adjusted p-values (some listed simply as 0). We've presented the data as a volcano plot and have repeatedly been asked how the adjusted p-values could really be so low, so it stands out to others as well. Our design is simply 3 controls vs 3 mutants, so it is a relatively straightforward analysis.
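On the values listed simply as 0: R stores p-values as double-precision numbers, so a p-value smaller than the smallest representable positive double (around 1e-308, see `.Machine$double.xmin`) underflows toward 0. A small sketch of the effect, assuming Wald-type statistics; the two `stat` values are hypothetical, and working on the log scale is only one possible way to keep such genes finite on a volcano plot.

```r
# Hypothetical Wald statistics (not taken from any real result table).
stat <- c(35.5, 40)

# Ordinary p-values: the second one is too small for a double and becomes 0.
pvalue <- 2 * pnorm(-abs(stat))
print(pvalue)

# Smallest normalized positive double on this machine.
print(.Machine$double.xmin)

# Computing on the log scale avoids the underflow:
# log(p) = log(2) + log(pnorm(-|stat|)), then convert to -log10(p).
neg_log10_p <- -(log(2) + pnorm(-abs(stat), log.p = TRUE)) / log(10)
print(neg_log10_p)
```

A p-value of exactly 0 gives `-log10(p) = Inf`, which is why such genes either vanish from a volcano plot or pile up at the top, depending on how the plot handles infinite values.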