Question

ATAC-seq macs2 peak calling bampe mode extension shift

1

Entering edit mode

7.1 years ago

nanoide ▴ 120

Hi

So I have been calling peaks on ATAC-seq data using --nomodel and --shift -35 --extsize 75. I have been using-f BED on filtered and shifted (+4 -5 offset recommended) bed files coming from bam files (bedtools bamtobed). As my data is paired-end, I've read it would be a good practice using -f BAMPE. So I filter and shifted my bam files but I'm not sure about the parameters to use when using -f BAMPE.

According to the documentation --shift is going to be 0 because of using BAMPE. What about --nomodel and --extsize? They are not neccesarry either? I wonder if MACS2 is reading TLEN from the bam files and then an extension to be used as fragment size is no longer neccesary. And the bam should be sorted by name or macs2 can recognize mate pairs either way?

Hope somebody can comment on this because and I find myself a little bit confused interpreting this Thank you!

ATAC-seq ChIP-Seq macs peaks • 8.8k views

ADD COMMENT • link updated 7.1 years ago by Ricardo A. ▴ 90 • written 7.1 years ago by nanoide ▴ 120

score 1 · Answer 1 · 2018-06-09

1

Entering edit mode

7.1 years ago

igor 13k

According to the MACS developer:

If you followed original protocol for ATAC-Seq, you should get Paired-End reads. If so, I would suggest you just use "--format BAMPE" to let MACS2 pileup the whole fragments in general. But if you want to focus on looking for where the 'cutting sites' are, then '--nomodel --shift -100 --extsize 200' should work.

Source: https://github.com/taoliu/MACS/issues/145

You should also check this previous discussion where the issue is discussed at length: ATAC-seq peak calling with MACS

ADD COMMENT • link 7.1 years ago by igor 13k

0

Entering edit mode

Thank you for your answer.

So I get according to the developper the options recommended are just "--f BAMPE " or "--nomodel --f BAM --shift -100 --extsize 200" However, I have filtered my rads and I have nucleosome free reads, so I know the upper limit of my fragment size (150). Wouldn't be better in this case to use --shift -75 --extsize 150?

Thanks!

ADD REPLY • link 7.1 years ago by nanoide ▴ 120

score 1 · Answer 2 · 2018-06-09

1

Entering edit mode

7.1 years ago

Ricardo A. ▴ 90

Hi, josruirod,

The --nomodel flag will tell MACS2 to bypass the model building, which would calculate the shift/extsize parameters. Since you are asking it to use the fragment information from the paired reads, then it doesn't need to figure out a model as it already have the information from the BAM file itself. Remember that MACS2 was designed to look at ChIP-seq data, where we normally use single-end reads - there is no information on the fragment sizes readily available and this would have to be inferred them from the data.

I'm 90% sure you don't need to sort by names in order for BAMPE to work - it will use the TLEN from each fragment. You can do a quick test using by trying to call peaks on the first 10,000 lines of your BAM file using samtools view -h file.bam | head -10000 | samtools view -b > file_subsampled.bam.

I haven't extensively tested the BAMPE option, but I found similar (and possibly noisier) results compared to the running MACS2 without BAMPE. Where did you get these shift -35 / extend 75 parameters from? We've been doing a lot of ATAC-seq data generation and processing over the last years, and we've been using shift -100 --extsize 200 based on the ATAC-seq fragment size distribution. You can have a look at https://github.com/ParkerLab/ATACseq-Snakemake for details on how we process our data. Also, look at https://github.com/ParkerLab/ataqv for an interactive diagnostic tool for ATAC-seq quality control.

Hope this helps.

ADD COMMENT • link 7.1 years ago by Ricardo A. ▴ 90

0

Entering edit mode

Thank you for your comprehensive answer

I make sure to make some tests according to your suggestions. Regarding --shift -35 --extsize 75, we used as extsize half the fragment sizes (we filtered for a maximum size of 150 bp) I have also read about macs2 predictd in Advanced peak calling. So if I got it rigth, if you have the whole fragment size distribution shift -100 --extsize 200 would be better but in case you have filtered reads per size, it would be a good idea to decrease those numbers, right?

Thanks for your time!

ADD REPLY • link 7.1 years ago by nanoide ▴ 120

1

Entering edit mode

I think it makes sense to use smaller shift and extension in this case. Why are you only looking at small fragments? There is a lot of information on larger mono and dinucleosome-sized fragments.

ADD REPLY • link 7.1 years ago by Ricardo A. ▴ 90

0

Entering edit mode

It was mainly a resolution issue, according to the fragment size distribution not a good amount of fragments had higher sizes, ir order to assess nucleoomes. So we wanted to focus in nucleosome free fragments in order to predict open regions and Transcription Factors Binding Sites.

Thanks

ADD REPLY • link 7.1 years ago by nanoide ▴ 120