Hi all,
I am working with the RNA-seq data on humans (24patients-20controls). I used DESeq2 to find differentially expressed genes.
here is the code that I used:
dds <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable,
directory=folder,
design=~Plate+RIN+Sex+Age+condition+PC2+PC1) #I used the principal component of combination of cell markers
colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Control")
dds<- DESeq(dds)
res_result<- results(dds, contrast = c("condition", "patints","Control"))
resFix <- res_result[!is.na(res_result$padj),]
resPlot <- as.data.frame(resFix)
when I am looking at resPlot I have some really weird high log2FoldChange, It doesn't not look normal to me. are there any ideas why it happened or how it can be solved?
baseMean log2FoldChange lfcSE stat pvalue padj
66.90546888 26.99407224 5.461529744 4.942584497 7.71E-07 0.000331692
4.828502431 12.90644751 4.43998199 2.906869338 0.003650657 0.030229042
here is the MA plot:
Thanks in advance.
Because the size of the LFC only matters in proportion to its Standard Error.
In this case you have a LFC SE of 5.5, I'd want to be sure its not extremely variable because of very low counts or one or two extreme outlier samples. To get a better look, I'd do two things:
1) lfcShrink as ATpoint said - this will have the effect of controlling the SE.
2) Probably more important is to look at the actual count plots of the raw counts for this gene to see if you believe it, as described in the DESeq2 vignette.
One final point, sort of unrelated. Probably you should have condition as the last term in your model. Because you set the contrast specifically, you are OK, but as recommended here: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#theory the best practice is to put the test variable last for DESeq2. Other tests that use sums of squares in R also are sensitive to the order of model input.