Hi, all!
While running a differential expression analysis with DESeq2, I got some odd results: some genes with mostly zero counts were detected as significant DEGs with very low p-values, as follows:
log2FoldChange pvalue padj Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Control1 Control2 Control3 Control4 Control5
Actc1 25.34 7.34E-72 4.45E-68 0.00 0.00 0.00 0.00 0.00 796.48 0.00 0.00 0.00 0.00 0.00 132.75 0.00
Sorry, it's not easy to see here, but only one of the 11 samples (Sample + Control) actually has a nonzero count, so I would not expect this gene to be detected as a significant DEG. I observed many genes like this being detected as significant DEGs.
The code I used to obtain these results is below:
conds <- c(rep("Sample", 6), rep("Control", 5))
res <- results(dds, contrast = c("conds", "Sample", "Control"))
Why do these genes have such low p-values and adjusted p-values that they are called as DEGs?
Can somebody explain this to me?
Thank you!
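A minimal sketch of how such a gene could be inspected, assuming simulated counts from DESeq2's `makeExampleDESeqDataSet()` in place of the real data and a hypothetical gene `spiky` that is nonzero in a single sample. It prints the raw and normalized counts, the results row, and the per-sample Cook's distances. According to the DESeq2 vignette, `results()` sets the p-value to NA for genes with a Cook's distance above the cutoff in samples with 3 or more replicates (the `cooksCutoff` argument), while `DESeq()` instead replaces outliers and refits for samples with 7 or more replicates, so the settings in use matter when a single-sample spike still gets a tiny p-value.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for the real matrix: 6 "Sample" + 5 "Control" columns
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 11))
# Hypothetical gene with a nonzero count in one sample only
spiky <- c(rep(0L, 5), 800L, rep(0L, 5))
cts <- rbind(cts, spiky = spiky)

# The grouping variable lives in colData and is used in the design
coldata <- data.frame(
  conds = factor(c(rep("Sample", 6), rep("Control", 5)),
                 levels = c("Control", "Sample")),
  row.names = colnames(cts)
)

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ conds)
dds <- DESeq(dds, quiet = TRUE)
res <- results(dds, contrast = c("conds", "Sample", "Control"))

# Raw and normalized counts of the gene (exact row name, no partial matching)
counts(dds)["spiky", ]
counts(dds, normalized = TRUE)["spiky", ]

# Its results row and its per-sample Cook's distances
res["spiky", ]
assays(dds)[["cooks"]]["spiky", ]
```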
Can you show the output of the following?
counts(dds[grep("Actc1"), rownames(dds),])
Thank you, ATpoint! But I got an error with your command.
I think I found a typo in the code.
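In the command above, the closing parenthesis of `grep()` appears to come too early, so `grep()` is called without a vector to search. A sketch of the presumably intended call, using simulated data from `makeExampleDESeqDataSet()` (row names `gene1`, `gene2`, ...) instead of the real object; `gene1` is only a stand-in for `Actc1`.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 100, m = 11)  # stand-in for the real dds
dds <- estimateSizeFactors(dds)                  # needed for normalized counts

gene <- "gene1"  # stand-in for "Actc1"

# Presumably intended form: grep(pattern, rownames(dds)) picks rows, all columns kept
counts(dds[grep(gene, rownames(dds)), ])

# grep() matches partially ("gene1" also hits "gene10", "gene11", ...),
# so an exact row name is safer for a single gene
counts(dds)[gene, ]
counts(dds, normalized = TRUE)[gene, ]
```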
Sorry for the typo. That is odd: only one sample here has counts, whereas your top-level example also shows counts for Control4, so it does not really match. Are you sure there is no parsing error, and that this p-value and fold change do not belong to a different gene?
The actual data have more samples in more complex conditions, so I might have made a mistake when copying and pasting the data.
Sorry!
For clarity, I'm posting the actual values from my DESeq2 data. I still see strangely low p-values.
Thank you!
Without the full code and a complete description of the experiment (at least the colData), I cannot comment any further.
OK. Thank you for taking the time!
I'll check the code and data again for mistakes.
After carefully checking the data, I found that I get different normalized count values depending on which samples are included.
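For context, DESeq2's default size factors use the median-of-ratios method: each sample's counts are compared with a per-gene geometric mean taken across all samples in the object, so adding or dropping samples changes that reference and with it every size factor and normalized count. The sketch below is a simplified base-R version of the idea (not DESeq2's own code, which handles more edge cases), run on a made-up 4-gene matrix with all four samples and with only two of them.

```r
# Simplified median-of-ratios size factors (illustration only)
size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # per-gene log geometric mean
  keep <- is.finite(log_geo_means)        # genes with any zero count are skipped
  apply(log_counts[keep, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[keep])))
}

# Made-up counts: 4 genes x 4 samples
toy <- matrix(c(10, 20, 10, 40,
                20, 40, 20, 20,
                30, 60, 90, 30,
                 0,  5,  5,  5),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("g", 1:4), c("A1", "A2", "B1", "B2")))

sf_all <- size_factors(toy)                   # reference built from all samples
sf_sub <- size_factors(toy[, c("A1", "A2")])  # reference built from A1 and A2 only

round(sf_all, 3)
round(sf_sub, 3)

# The same raw counts of A1 give different normalized counts in the two runs
toy[, "A1"] / sf_all["A1"]
toy[, "A1"] / sf_sub["A1"]
```

In this toy matrix A2 is exactly twice A1 for every gene, so the A2/A1 size-factor ratio stays at 2 in both runs; with real data the ratios between samples will generally shift slightly as well.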
I have 36 samples in 5 groups, as follows (for simplicity, I renamed the samples so they are easier to distinguish):
My goal is to find DEGs between WT_old1 and WT_young. The normalized count values of Actc1 differed between running DESeq2 on only the extracted WT_old1 and WT_young samples and running it on all samples together. I'm not sure whether you can see the data above, but the raw counts are the same while the normalized counts are different.
Interestingly, however, the fold-change values are the same (only the p-values differ).
Different normalized count values depending on the input samples are understandable, but why am I getting such strange p-values?
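One way to compare the two setups side by side is sketched below, assuming simulated counts and a hypothetical design: apart from `WT_old1` and `WT_young`, the group names and all group sizes are invented because the actual sample list is not shown. In the full model, size factors and the dispersion estimates used by the Wald test come from all 36 samples, while the subset model estimates them from the two groups alone, so p-values can differ even when the fold changes are close.

```r
library(DESeq2)

set.seed(1)
# Hypothetical layout: 36 samples in 5 groups. Only "WT_old1" and "WT_young"
# come from the post; the other group names and all group sizes are made up.
group <- factor(rep(c("WT_old1", "WT_young", "grpC", "grpD", "grpE"),
                    times = c(8, 7, 7, 7, 7)))
cts <- counts(makeExampleDESeqDataSet(n = 500, m = 36))
coldata <- data.frame(group = group, row.names = colnames(cts))
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ group)

# (a) one model fitted to all 36 samples
dds_all <- DESeq(dds, quiet = TRUE)
res_all <- results(dds_all, contrast = c("group", "WT_old1", "WT_young"))

# (b) a model fitted to the two groups of interest only; subset the unfitted
# object so that size factors and dispersions are estimated from scratch
keep <- dds$group %in% c("WT_old1", "WT_young")
dds_sub <- dds[, keep]
dds_sub$group <- droplevels(dds_sub$group)
dds_sub <- DESeq(dds_sub, quiet = TRUE)
res_sub <- results(dds_sub, contrast = c("group", "WT_old1", "WT_young"))

# Size factors of the shared samples, then per-gene dispersions, LFCs and p-values
head(cbind(sf_all = sizeFactors(dds_all)[keep], sf_sub = sizeFactors(dds_sub)))
head(cbind(disp_all = dispersions(dds_all), disp_sub = dispersions(dds_sub)))
head(cbind(lfc_all = res_all$log2FoldChange, lfc_sub = res_sub$log2FoldChange,
           p_all = res_all$pvalue, p_sub = res_sub$pvalue))
```

The DESeq2 vignette's FAQ covers whether to fit all groups together or only the pair of interest; its guidance depends on whether the groups have similar within-group variability.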