I am currently analysing ChIP-seq data for 4 different proteins in order to build up some idea of the correlations between them across the C. elegans genome. Essentially I want to see where each protein overlaps with the others.
So far I have called peaks on all of my data sets (which include biological and technical replicates). I am now browsing the data before I start comparing peak sets to find correlations (overlaps, intersections etc).
Some of my data is quite noisy, and in order to get the best out of it I have run MACS2 with a relatively lenient p-value threshold (5e-2) and then only kept peaks which are confirmed across technical and biological replicates, hoping to filter out noise and wrongly called peaks at this step. It seems to have worked empirically and I am seeing sensible results. However, this is my first solo bioinformatics project and I just wanted to check whether this is a sensible method.
Is anyone able to recommend a better method? Is my MACS2 cutoff too lenient? Can anyone point me to papers which detail methods for this sort of thing? I bow to the greater knowledge and wisdom of this community. Many thanks.
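To make the replicate filter described above concrete, here is a minimal Python sketch of the idea: keep a peak from one replicate only if it overlaps at least one peak in every other replicate. The function names, the tuple layout (chromosome, start, end) and the toy coordinates are all made up for illustration; this is not the actual pipeline, and in practice a tool such as bedtools would normally do this step on the MACS2 output files.

```python
from collections import defaultdict


def index_by_chrom(peaks):
    """Group (chrom, start, end) peaks into a dict keyed by chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    return by_chrom


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open (BED-style) intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def confirmed_peaks(reference, *replicates):
    """Keep reference peaks that overlap a peak in every other replicate.

    Hypothetical helper: a simple all-against-all scan per chromosome,
    which is fine for a sketch but slow for very large peak sets.
    """
    indexed = [index_by_chrom(rep) for rep in replicates]
    kept = []
    for chrom, start, end in reference:
        if all(
            any(overlaps(start, end, s, e) for s, e in idx.get(chrom, []))
            for idx in indexed
        ):
            kept.append((chrom, start, end))
    return kept


# Toy peak lists standing in for three replicates (invented coordinates).
rep1 = [("chrI", 100, 200), ("chrI", 500, 600), ("chrII", 50, 150)]
rep2 = [("chrI", 150, 250), ("chrII", 400, 500)]
rep3 = [("chrI", 180, 220), ("chrI", 550, 650)]

print(confirmed_peaks(rep1, rep2, rep3))
```

Note that this reports the coordinates of the reference replicate only; whether to keep those, the intersection, or the merged interval is a separate choice.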
Thanks for your comment. I am making my way through the papers you recommend.
Are there any studies on the advantages of using the p-value over the q-value? I think my methods will come under a fair amount of scrutiny and I'd love to have something solid to back them up.
From what I recall, the author of MACS, Tao Liu, recommended the q-value over the p-value. You can join the MACS mailing list and ask him directly about this. My guess is that the q-value is more empirical as it's based on the number of false positives in the input control, while the p-value is based on a model of the data which is probably too simple.
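To make the p-value versus q-value distinction a bit more concrete, here is a small Python sketch of the Benjamini-Hochberg adjustment, which is one standard way of turning a list of p-values into q-values. Whether this matches what MACS does internally is an assumption here, so check the MACS documentation or the mailing list before relying on it; the function name and the example p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    Each p-value is scaled by (number of tests / rank), then a running
    minimum from the largest rank downwards keeps the q-values monotone.
    """
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Invented p-values standing in for four candidate peaks.
example_pvalues = [0.01, 0.04, 0.03, 0.05]
example_qvalues = benjamini_hochberg(example_pvalues)
for p, q in zip(example_pvalues, example_qvalues):
    print(f"p = {p:.3f}  q = {q:.3f}")
```

The point of the sketch is that a q-value accounts for how many candidate peaks were tested, so a p-value cutoff of 5e-2 over many thousands of peaks lets through far more false positives than a q-value cutoff of the same size.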