Hello everyone,
I've just finished MACS2 narrow peak calling on ATAC-seq data. With a q-value cut-off of 0.05 I have around 200K peaks per sample. From my literature review, only the top 50,000 (some say 100,000) non-overlapping peaks are typically included in downstream analysis.
From the authors of the ATAC-seq protocol:
"Using the filtered peak set, peak summits were extended +/-250 bps. The top 50,000 non-overlapping 500bp summits, which we refer to as accessibility peaks were used for all downstream analysis."
Conceptually I get the reasoning: there is no need to have thousands of peaks falling in the same 500bp window, so the overlaps are removed.
However, no authors state how they rank the peaks to pick the top 50,000 or 100,000. Is it by -log10(qvalue) or by the number of reads within the 500bp window? Does it make a difference which one I use?
It would be easier to use -log10(qvalue), as it is right there in the same narrowPeak file as the positions. I do realize I can be stricter with the q-value, but I think that will not be enough to cut down to 100,000 peaks.
Thanks for your input
Kenneth
Maybe you should merge the overlapping peaks into one larger peak. Or you can ask the author of MACS2.
Yes Ben, I will be doing that. However, a ranking is still required to decide which peaks to keep.
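
For illustration, here is a minimal Python sketch of one way the ranking and overlap removal could be combined. It is not the protocol authors' method, which the quoted text does not specify. It assumes the standard 10-column narrowPeak layout (column 9 is -log10(qvalue), column 10 is the summit offset from the peak start), ranks by -log10(qvalue), and greedily keeps a summit-centred window only if it does not overlap a higher-ranked window already kept. The function names `parse_narrowpeak` and `top_non_overlapping` and the `flank` parameter are made up for this example.

```python
import bisect
from collections import defaultdict


def parse_narrowpeak(lines, flank=250):
    """Turn narrowPeak lines into (chrom, start, end, neg_log10_q) windows.

    Each window is the peak summit extended by `flank` bp on both sides
    (half-open coordinates, clipped at 0).
    """
    windows = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, peak_start = fields[0], int(fields[1])
        neg_log10_q = float(fields[8])           # column 9: -log10(qvalue)
        summit = peak_start + int(fields[9])     # column 10: summit offset
        windows.append((chrom, max(0, summit - flank), summit + flank, neg_log10_q))
    return windows


def top_non_overlapping(windows, n):
    """Greedily keep the n best-ranked windows that do not overlap each other."""
    kept = []
    kept_by_chrom = defaultdict(list)  # chrom -> sorted list of (start, end)
    # Assumption: rank by -log10(qvalue), highest first.
    for chrom, start, end, score in sorted(windows, key=lambda w: -w[3]):
        intervals = kept_by_chrom[chrom]
        i = bisect.bisect_left(intervals, (start, end))
        # Kept intervals are disjoint, so only the two neighbours can overlap.
        if i > 0 and intervals[i - 1][1] > start:
            continue
        if i < len(intervals) and intervals[i][0] < end:
            continue
        intervals.insert(i, (start, end))
        kept.append((chrom, start, end, score))
        if len(kept) == n:
            break
    return kept
```

With a real file this would be called as `top_non_overlapping(parse_narrowpeak(open(path)), 50000)`, where `path` is the narrowPeak file. Ranking by read count instead would need the reads in each 500bp window counted separately and that count used as the sort key; the sketch does not do that.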