Upward Sloping P Value Distribution in ATAC-seq Analysis
0
0
Entering edit mode
5 months ago
jbern • 0

While analyzing an ATAC-seq data set with DEseq2, I got the below histogram of P values when comparing the WT and KO conditions. Since I didn't get a uniform distribution or one with a higher frequency of low p values, I think some assumption of how DEseq2 does the analysis is being violated. Whatever issue I have doesn't seem to be across all of my data, however, since I also have accessibility at multiple differentiation stages for each genotype, and between those stages, I get a well behaved distribution with many small p values. I certainly expect the differences in accessibility to be larger between different differentiation stages than between the WT and KO at the same stage, but the main goal of my analysis is to identify the differences between the WT and KO at these stages. Based on other resources I've found online, I've tried several things to resolve this. I repeated the analysis with other packages, including edgeR and limma/Voom, tried to correct the P values with fdrtool, and tried removing less accessible peaks. None of these solutions made a big difference though. Are there better ways of doing the analysis?

Thank you!

enter image description here

ATAC-seq DEseq2 edgeR limma • 532 views
ADD COMMENT
1
Entering edit mode

Instead of abstract pvalue distributions you should add standard QC plots and metrics. For example, FRiPs, PCA; MA-plots between conditions. Something to actually "look" at the data.

ADD REPLY
1
Entering edit mode

Most likely you have a confounder (e.g., a batch effect), since that's the most common cause of this sort of distribution. As suggested by ATpoint, PCA or similar QC plots will be informative here. More likely than not you just left something out of your model (e.g., stage) and this is the confounder.

ADD REPLY
0
Entering edit mode

Below are the PCA plots generated from running the following code:

vsd <- vst(dds, blind=T)

vsd2 <- vsd
mat <- assay(vsd2)
mm <- model.matrix(~condition, colData(vsd2))
mat <- limma::removeBatchEffect(mat, batch=vsd2$replicate, design=mm)
assay(vsd2) <- mat

# Plot PCA without accounting for batch effect
plotPCA(vsd, intgroup=c("condition"))

# Plot PCA accounting for batch effect
plotPCA(vsd2, intgroup=c("condition"))

The design I'm using is ~replicate + condition where condition contains both the genotype and stage as in the PCA plot legend. I've also tried a couple other designs with genotype and stage separate and/or accounting for an interaction between the two and it hasn't really changed the resulting p values.

enter image description here

ADD REPLY
0
Entering edit mode

You'll really want to remove the limma::removeBatchEffect() line.

ADD REPLY

Login before adding your answer.

Traffic: 1541 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6