I am running an ATAC-seq experiment (peaks called with MACS2) and combining it with a Pool-seq set of outlier and non-outlier SNPs to see how many of these SNPs fall within peak regions. We assess this by running intersectBed on 1) our ATAC-seq peak BED file and 2) our Pool-seq SNP BED file, which produces an output text file that we then analyze in RStudio.
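One way the overlap counts can be turned into a chi-squared test is a 2x2 table: outlier vs non-outlier SNPs, inside vs outside peaks. The following is a minimal Python sketch of that idea under stated assumptions, not the exact pipeline described above. The intervals are hypothetical toy data. The overlap check only approximates what an `intersectBed -u -a <snps.bed> -b <peaks.bed>` style query reports. The helper names (`overlaps_any`, `contingency_table`, `chi2_2x2`) are made up for this example.

```python
import math

# Hypothetical toy data: BED-style (chrom, start, end) half-open intervals.
peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
outliers = [("chr1", 150, 151), ("chr1", 550, 551), ("chr1", 120, 121), ("chr1", 900, 901)]
non_outliers = [("chr1", 300, 301), ("chr1", 700, 701), ("chr1", 800, 801), ("chr1", 180, 181)]


def overlaps_any(snp, peaks):
    """True if the SNP interval overlaps at least one peak on the same chromosome."""
    chrom, start, end = snp
    return any(c == chrom and start < p_end and p_start < end for c, p_start, p_end in peaks)


def contingency_table(outliers, non_outliers, peaks):
    """Rows: outlier / non-outlier SNPs; columns: inside / outside peaks."""
    table = []
    for snps in (outliers, non_outliers):
        inside = sum(overlaps_any(s, peaks) for s in snps)
        table.append([inside, len(snps) - inside])
    return table


def chi2_2x2(table):
    """Pearson chi-squared test of independence on a 2x2 table, without Yates' correction."""
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


table = contingency_table(outliers, non_outliers, peaks)
stat, p = chi2_2x2(table)
print(table)                              # [[3, 1], [1, 3]]
print(f"chi2 = {stat:.2f}, p = {p:.3f}")  # chi2 = 2.00, p = 0.157
```

R's `chisq.test()` applies Yates' continuity correction to 2x2 tables by default (`correct = TRUE`). For the same counts, its p-value will therefore be somewhat larger than this uncorrected version.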
We are analyzing three tissues. For each tissue separately, our chi-squared test gives a p-value well below 0.05. However, this is NOT the case for our combined-tissues BED file: the p-value for the "whole set" is around 0.35. I've checked and re-checked, and this result seems to be accurate.
Could anybody help me make sense of this? How can each subset of the data be statistically significant while the combined data set is not? What is going on here?
Additionally, I ran intersectBed on the individual tissue files to create a consensus (intersected) data file, and the test on this file was significant (p < 0.000001).
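The post distinguishes a combined-tissues file from a consensus file. The following minimal Python sketch shows how those two kinds of peak set can differ. It assumes, although the post does not say so, that the combined file is the union of the three tissues' peaks and that the consensus file keeps only regions called in all three tissues. The peak coordinates and helper names (`merge`, `intersect`, `covered_bp`) are hypothetical, and this is not how bedtools itself is implemented.

```python
from functools import reduce

# Hypothetical peak sets for three tissues, BED-style (chrom, start, end).
tissue_a = [("chr1", 100, 300), ("chr1", 1000, 1100)]
tissue_b = [("chr1", 200, 400)]
tissue_c = [("chr1", 250, 350), ("chr2", 50, 150)]


def merge(intervals):
    """Union: merge overlapping or book-ended intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a, b):
    """Consensus of two sets: regions covered by both (quadratic, fine for toy data)."""
    out = []
    for ca, sa, ea in merge(a):
        for cb, sb, eb in merge(b):
            if ca == cb and max(sa, sb) < min(ea, eb):
                out.append((ca, max(sa, sb), min(ea, eb)))
    return merge(out)


def covered_bp(intervals):
    """Total number of base pairs covered after merging."""
    return sum(end - start for _, start, end in merge(intervals))


union = merge(tissue_a + tissue_b + tissue_c)
consensus = reduce(intersect, [tissue_a, tissue_b, tissue_c])
print(union)      # [('chr1', 100, 400), ('chr1', 1000, 1100), ('chr2', 50, 150)]
print(consensus)  # [('chr1', 250, 300)]
print(covered_bp(union), covered_bp(consensus))  # 500 50
```

Under these assumptions, a SNP counts as "in a peak" in the union if any one tissue has a peak there. The consensus requires a peak in all three tissues. The two files can therefore cover very different parts of the genome and produce different contingency tables.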
Any insights would be very appreciated.