Question

Upward Sloping P Value Distribution in ATAC-seq Analysis

0

Entering edit mode

5 months ago

jbern • 0

While analyzing an ATAC-seq data set with DEseq2, I got the below histogram of P values when comparing the WT and KO conditions. Since I didn't get a uniform distribution or one with a higher frequency of low p values, I think some assumption of how DEseq2 does the analysis is being violated. Whatever issue I have doesn't seem to be across all of my data, however, since I also have accessibility at multiple differentiation stages for each genotype, and between those stages, I get a well behaved distribution with many small p values. I certainly expect the differences in accessibility to be larger between different differentiation stages than between the WT and KO at the same stage, but the main goal of my analysis is to identify the differences between the WT and KO at these stages. Based on other resources I've found online, I've tried several things to resolve this. I repeated the analysis with other packages, including edgeR and limma/Voom, tried to correct the P values with fdrtool, and tried removing less accessible peaks. None of these solutions made a big difference though. Are there better ways of doing the analysis?

Thank you!

enter image description here

ATAC-seq DEseq2 edgeR limma • 496 views

ADD COMMENT • link updated 4 months ago by Devon Ryan 104k • written 5 months ago by jbern • 0

1

Entering edit mode

Instead of abstract pvalue distributions you should add standard QC plots and metrics. For example, FRiPs, PCA; MA-plots between conditions. Something to actually "look" at the data.

ADD REPLY • link 5 months ago by ATpoint 85k

1

Entering edit mode

Most likely you have a confounder (e.g., a batch effect), since that's the most common cause of this sort of distribution. As suggested by ATpoint, PCA or similar QC plots will be informative here. More likely than not you just left something out of your model (e.g., stage) and this is the confounder.

ADD REPLY • link 5 months ago by Devon Ryan 104k

0

Entering edit mode

Below are the PCA plots generated from running the following code:

vsd <- vst(dds, blind=T)

vsd2 <- vsd
mat <- assay(vsd2)
mm <- model.matrix(~condition, colData(vsd2))
mat <- limma::removeBatchEffect(mat, batch=vsd2$replicate, design=mm)
assay(vsd2) <- mat

# Plot PCA without accounting for batch effect
plotPCA(vsd, intgroup=c("condition"))

# Plot PCA accounting for batch effect
plotPCA(vsd2, intgroup=c("condition"))

The design I'm using is ~replicate + condition where condition contains both the genotype and stage as in the PCA plot legend. I've also tried a couple other designs with genotype and stage separate and/or accounting for an interaction between the two and it hasn't really changed the resulting p values.

enter image description here

ADD REPLY • link 5 months ago by jbern • 0

0

Entering edit mode

You'll really want to remove the limma::removeBatchEffect() line.

ADD REPLY • link 4 months ago by Devon Ryan 104k