Question

MACS2 for multiple conditions in ATACseq analysis

0

Entering edit mode

7 months ago

QX ▴ 70

Hi all,

I have many datasets from ATAC-seq for multiple condition (replicates of 2) setting and no control specifically, where I want to do pair-wise for every two of the conditions.

For macs2 setting, can I call peaks from all these file by setting the -t (treatment) for all the files and left the -c (control) as empty and use the peaks output to perform futher analysis; or I really need to setting macs2 for each pairwise?

Best,

mac2 • 643 views

ADD COMMENT • link 7 months ago by QX ▴ 70

2

Entering edit mode

If you are expecting only narrow peaks or broad peaks in all your samples, you can call macs2 on all your files at the same time. Then you can overlap the peaks sets using bedtools for example to get a consensus peak list and do your pair-wise comparison using featureCounts or any other "count to features" method you like.

If you have samples where you expect narrow peaks and others where you expect broad peaks, I would run two different macs2 instances with the associated parameters (narrow, broad).

ADD REPLY • link 7 months ago by Bastien Hervé 6.1k

0

Entering edit mode

for now I only go for the narrowPeaks. Can you specifize more on which bedtools command for this task and how can the tool choose the consensus peak? for e.g. in my data, there are some rows that are similar in range but have more peaks:

chr     start   end     length  abs_summit      pileup  -log10(pvalue)  fold_enrichment -log10(qvalue)  name
1       826531  828140  1610    826808  1730    1328.65 14.7141 1326.59 peak_all_peak_1a
1       826531  828140  1610    827539  4563    5338.88 38.7957 5336.28 peak_all_peak_1b
1       826531  828140  1610    827986  698     292.385 5.94175 290.704 peak_all_peak_1c
1       831302  831572  271     831439  329     57.8574 2.80512 56.4633 peak_all_peak_2
1       832037  832505  469     832340  290     41.2257 2.47361 39.8811 peak_all_peak_3
1       844126  846414  2289    844345  297     44.053  2.53311 42.6991 peak_all_peak_4a
1       844126  846414  2289    844868  886     448.757 7.53982 446.986 peak_all_peak_4b
1       844126  846414  2289    845167  359     72.0202 3.06013 70.5917 peak_all_peak_4c
1       844126  846414  2289    845361  443     117.176 3.77416 115.666 peak_all_peak_4d
1       844126  846414  2289    845830  3984    4433.47 33.8739 4430.96 peak_all_peak_4e
1       844126  846414  2289    846298  248     25.8501 2.11659 24.5669 peak_all_peak_4f
1       849999  850607  609     850094  191     9.97327 1.63207 8.79431 peak_all_peak_5a
1       849999  850607  609     850432  256     28.5584 2.18459 27.2627 peak_all_peak_5b
1       858014  858264  251     858132  312     50.3483 2.66061 48.9751 peak_all_peak_6

and if you have time, do know know the different between narrowPeaks and broadPeaks, or any materials on this?

ADD REPLY • link 7 months ago by QX ▴ 70

2

Entering edit mode

You can use bedtools merge to create what I called a "consensus" peaks set. The tool does not "choose" any peak, it will aggregate peaks till it finds a gap. The sketch from bedtool merge home page is self explanatory. I would suggest you to have a cutoff on the qvalue in order not to aggregate on peaks which are potential false positive, to avoid aggregating your whole chromosome into one peak. Here is a similar question to what you are asking.

The peaks you are calling using macs2 are related to the biology of your dataset. If you are expecting only rather small open chromatin regions like transcription factors or some histone marks like H3K27ac, then it will be more suitable to call narrow peaks. You can see it as a magnifying glass to detect small events. On the other hand, other histone marks have really big opening chromatin regions where it might be better to zoom out to be able to call them properly. Similar question