extreme high log2FoldChange in DESeq2 result
2
1
Entering edit mode
17 months ago
Sara ▴ 30

Hi all,

I am working with the RNA-seq data on humans (24patients-20controls). I used DESeq2 to find differentially expressed genes.

here is the code that I used:

dds <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable,
                                  directory=folder,
                                  design=~Plate+RIN+Sex+Age+condition+PC2+PC1) #I used the principal component of combination of cell markers

colData(dds)$condition <- relevel(colData(dds)$condition, ref = "Control")

dds<- DESeq(dds) 

res_result<- results(dds, contrast = c("condition", "patints","Control"))

resFix <- res_result[!is.na(res_result$padj),]

resPlot <- as.data.frame(resFix)

when I am looking at resPlot I have some really weird high log2FoldChange, It doesn't not look normal to me. are there any ideas why it happened or how it can be solved?

baseMean    log2FoldChange  lfcSE          stat             pvalue          padj
66.90546888 26.99407224   5.461529744    4.942584497     7.71E-07   0.000331692

4.828502431 12.90644751    4.43998199   2.906869338  0.003650657    0.030229042

here is the MA plot:

enter image description here

Thanks in advance.

log2FoldChange DESeq2 RNA-seq DGEA • 1.2k views
ADD COMMENT
1
Entering edit mode

Because the size of the LFC only matters in proportion to its Standard Error.

In this case you have a LFC SE of 5.5, I'd want to be sure its not extremely variable because of very low counts or one or two extreme outlier samples. To get a better look, I'd do two things:

1) lfcShrink as ATpoint said - this will have the effect of controlling the SE.

2) Probably more important is to look at the actual count plots of the raw counts for this gene to see if you believe it, as described in the DESeq2 vignette.

One final point, sort of unrelated. Probably you should have condition as the last term in your model. Because you set the contrast specifically, you are OK, but as recommended here: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#theory the best practice is to put the test variable last for DESeq2. Other tests that use sums of squares in R also are sensitive to the order of model input.

Can I use DESeq2 to analyze paired samples?

Yes, you should use a multi-factor design which includes the sample information as a term in the design formula. This will account for differences between the samples while estimating the effect due to the condition. The condition of interest should go at the end of the design formula, e.g. ~ subject + condition.

ADD REPLY
3
Entering edit mode
17 months ago
ATpoint 85k

This has been asked many times before. It usually comes down to do prefiltering (for example filterByExpr from edgeR) and using lfcShrink (method ashr or apeglm) to get more reliable logFC estimates. Especially prefiltering I always do.

ADD COMMENT
3
Entering edit mode
17 months ago
Trivas ★ 1.8k

I'd probably simplify your design formula, but the default MA plot from DESeq2 might not have the same thresholds as you want for downstream analysis. Furthermore, definitely take advantage of the lfcShrink function.

ADD COMMENT

Login before adding your answer.

Traffic: 2156 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6