How does the MACS algorithm work on ATAC-seq data if there is no control sample and no model estimation?

7.2 years ago

salamandra ▴ 550

I understand that, unlike with ChIP-seq, MACS does not build the model for ATAC-seq. This means it does not estimate the distance 'd' between the tags and therefore does not shift the tags by d/2 in the 3' direction (which is why the --nomodel --shift 0 parameters are set for ATAC-seq).

But if 'd' is not estimated, how can MACS slide a window of width 2d across the genome to calculate peak enrichment?
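For reference, here is a minimal Python sketch of what a fixed d means in practice. It assumes, based on the MACS2 documentation as I understand it, that with `--nomodel` the fragment size that would otherwise be estimated as d is replaced by the user-supplied `--extsize` value, with `--shift` moving each tag's 5' end before extension. The helper names `fragment_interval` and `window_counts` are hypothetical, and counting fragment centers in 2d windows is a simplification of the sliding-window scan described in the original MACS paper, not MACS's actual code.

```python
from bisect import bisect_left


def fragment_interval(pos5, strand, shift, extsize):
    """Turn one tag into a fixed-length fragment [start, end), 0-based.

    pos5 is the read's 5' end; strand is '+' or '-'. The 5' end is moved
    `shift` bp in the read's 3' direction, then extended `extsize` bp
    (assumption: extsize stands in for d when no model is built).
    """
    if strand == "+":
        start = pos5 + shift
        return start, start + extsize
    # '-' strand reads point toward lower coordinates
    end = pos5 - shift + 1
    return end - extsize, end


def window_counts(centers, chrom_len, d, step):
    """Count fragment centers in windows [start, start + 2d) every `step` bp."""
    centers = sorted(centers)
    width = 2 * d
    counts = []
    for start in range(0, max(chrom_len - width, 0) + 1, step):
        n = bisect_left(centers, start + width) - bisect_left(centers, start)
        counts.append((start, n))
    return counts


# Toy example with made-up reads: (5' position, strand)
reads = [(100, "+"), (120, "+"), (390, "-"), (5000, "+")]
d = 200  # fixed fragment size, e.g. the value passed as --extsize
frags = [fragment_interval(p, s, shift=0, extsize=d) for p, s in reads]
centers = [(a + b) // 2 for a, b in frags]
best = max(window_counts(centers, chrom_len=6000, d=d, step=100), key=lambda t: t[1])
print(best)  # (0, 3): three fragment centers fall in the first 400 bp window
```

The point of the sketch is that nothing has to be learned from the data for the scan to run: once d is supplied by a parameter, the window width 2d is fully determined.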

Also, for ChIP-seq there is a control sample (input DNA or DNA pulled down with a nonspecific antibody) that is used to calculate λlocal and determine peak enrichment. In ATAC-seq there is no control pulled down with a nonspecific antibody. What serves as the control in ATAC-seq?

updated 6.2 years ago by jihed.chouaref • written 7.2 years ago by salamandra ▴ 550

Thanks for the discussion. I am struggling with the same question, and as a non-bioinformatician, I really appreciate it!

6.2 years ago by jihed.chouaref

7.2 years ago

BioinfGuru ★ 2.1k

Have you seen these posts? 1) biostars1 2) biostars2 3) MACSgithub

I have, and none of them seem to answer my questions, but as I'm no expert in bioinformatics, it's possible I'm just not understanding what they are saying.

7.2 years ago by salamandra ▴ 550

Accepted Answer · 2017-11-11

7.2 years ago

ATpoint 86k

MACS (for ChIP-seq or ATAC-seq) does not necessarily need a control. There is actually a section in the paper describing exactly the situation where no control is available. In this case, it estimates the local lambda from windows around each candidate peak in the sample itself. The paragraph starts with:

Therefore, instead of using a uniform λBG estimated from the whole genome, MACS uses a dynamic parameter, λlocal, defined for each candidate peak as (...)

In the absence of a control, one would naively estimate the background level of the experiment uniformly. That means if you have, say, 30 million reads and threw them randomly onto the genome, every base (given a certain fragment length of the library) would have a coverage of x. However, genome coverage is never uniform (you will see this most impressively once you analyze your first WGS sample; it is really a hilly landscape) because of differences in chromatin structure, PCR/GC bias, local copy-number alterations, etc.

So instead of a genome-wide λBG, MACS checks the vicinity of each peak center (windows of up to 10 kb) to estimate how prone this genomic region is to accumulating reads. In my understanding, because a typical ChIP-seq experiment produces sharp peaks, the surrounding region should be depleted of enriched signal. Therefore, notable read counts in the vicinity indicate a local bias. As a result, the peak enrichment needs to be penalized (corrected downward), because the region itself is prone to accumulating reads, irrespective of the protein target.
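The uniform-background arithmetic described above can be sketched in a few lines of Python. This is a hedged illustration, not MACS code: the 30 million reads come from the example in the answer, while the 400 bp window (2d with an assumed d of 200 bp) and the 2.7e9 bp effective genome size are assumptions chosen for illustration. λBG is the expected tag count in one window if tags were scattered uniformly, and a Poisson tail probability tells how surprising an observed count would be under that background.

```python
import math


def lambda_bg(n_tags, window_bp, genome_size):
    """Expected tag count in one window if tags were spread uniformly."""
    return n_tags * window_bp / genome_size


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1).

    Adequate for the small lambdas in this toy example; real tools use a
    numerically stabler tail computation.
    """
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term            # add P(X = i)
        term *= lam / (i + 1)  # becomes P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Assumed numbers: 30 million reads, 2d = 400 bp window, 2.7e9 bp effective genome
lam = lambda_bg(30_000_000, 400, 2.7e9)
print(f"lambda_BG = {lam:.2f} tags per window")  # about 4.44
print(poisson_sf(20, lam) < 1e-5)  # True: 20 tags would be a strong enrichment
```

A single genome-wide λ like this is exactly what the "hilly landscape" argument says is too optimistic, which is why MACS replaces it with a per-peak local estimate.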

Hi, I have some questions about your explanation. You said that when there is no control, MACS uses a dynamic parameter, as described in that paragraph. But according to the paper, the whole passage reads:

"For example, at the FoxA1 candidate peak locations, tag counts are well correlated between ChIP and control samples (Figure 1c,d). Many possible sources for these biases include local chromatin structure, DNA amplification and sequencing bias, and genome copy number variation. Therefore, instead of using a uniform λBG estimated from the whole genome, MACS uses a dynamic parameter, λlocal, defined for each candidate peak as:"

Does this mean that λlocal serves to remove the influence of local biases? Furthermore, the passage that follows contains another sentence:

"where λ1k, λ5k and λ10k are λ estimated from the 1 kb, 5 kb or 10 kb window centered at the peak location in the control sample, or the ChIP-Seq sample when a control sample is not available (in which case λ1k is not used)."

So I don't think λlocal is specifically meant for the situation where no control is available.

By the way, I don't quite understand what the control really is in ATAC-seq analysis, or whether a control is necessary at all. I wonder whether the data with nucleosome signal could serve as a control, since nucleosome-free regions would probably not be detected in those data (see Figure 3A in Buenrostro J. D., et al. Nature Methods, 2013, 10(12): 1213-1218).

7.1 years ago by ghostforever.shi ▴ 50

Does this mean that λlocal serves to remove the influence of local biases? Furthermore, the passage that follows contains another sentence:

I would say it tries to estimate and correct for the bias.

where λ1k, λ5k and λ10k are λ estimated from the 1 kb, 5 kb or 10 kb window centered at the peak location in the control sample, or the ChIP-Seq sample when a control sample is not available (in which case λ1k is not used).

Without a control, only the 5 kb and 10 kb windows (taken from the ChIP or ATAC sample itself) are used, together with the genome-wide λBG.
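A minimal Python sketch of the λlocal rule as discussed in this thread: the largest of λBG and the window estimates, with the 1 kb window dropped when there is no control. The exact term list of the paper's equation is elided in the quotes above, so this sketch uses only λBG and the windows the quoted sentence names. Further assumptions: treatment and control have equal sequencing depth (MACS rescales them; that step is omitted), the window counts are made-up numbers, and the helper name `lambda_local` is hypothetical.

```python
def lambda_local(lam_bg, counts_by_window, peak_window_bp, has_control):
    """Simplified lambda_local following the quoted passage.

    counts_by_window maps window size (bp) -> tag count in that window,
    centered on the candidate peak: taken from the control if there is one,
    otherwise from the ChIP/ATAC sample itself. Each count is rescaled to
    the peak-window length so that all lambdas are comparable.
    """
    sizes = [5_000, 10_000]
    if has_control:
        sizes.insert(0, 1_000)  # the 1 kb window is only used with a control
    estimates = [counts_by_window[s] * peak_window_bp / s for s in sizes]
    # The maximum can raise the background above lambda_BG, never lower it
    return max([lam_bg] + estimates)


counts = {1_000: 30, 5_000: 100, 10_000: 150}  # made-up counts around one peak
print(lambda_local(4.44, counts, 400, has_control=False))  # 8.0 (from the 5 kb window)
print(lambda_local(4.44, counts, 400, has_control=True))   # 12.0 (from the 1 kb window)
```

Because λlocal is a maximum, a region that collects many reads in its 5-10 kb neighborhood gets a higher background, which is the downward correction of peak enrichment described in the accepted answer.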

By the way, I don't quite understand what the control really is in ATAC-seq analysis, or whether a control is necessary at all. I wonder whether the data with nucleosome signal could serve as a control, since nucleosome-free regions would probably not be detected in those data (see Figure 3A in Buenrostro J. D., et al. Nature Methods, 2013, 10(12): 1213-1218).

In ATAC-seq, you do not have a control. Both nucleosomal and nucleosome-free signals are located in open chromatin. Do not confuse open chromatin with nucleosome-free DNA: open chromatin is a combination of distinctly positioned nucleosomes flanking nucleosome-free DNA. ATAC-seq peaks therefore contain both nucleosomal and nucleosome-free signals.
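To see both kinds of signal inside your own ATAC-seq peaks, one option is to bin paired-end fragments by insert length, the periodicity shown in Figure 3A of Buenrostro et al. 2013. The Python sketch below is only an illustration: the cutoffs (< 100 bp nucleosome-free, 180-247 bp mono-nucleosomal) are assumptions modeled on values I recall from that paper and should be checked against your own fragment-length histogram, and the helper names are hypothetical.

```python
from collections import Counter

# Assumed cutoffs (bp); adjust to your library's fragment-length histogram
NFR_MAX = 100            # fragments shorter than this: nucleosome-free
MONO_RANGE = (180, 247)  # inclusive range treated as mono-nucleosomal


def classify_fragment(length):
    """Label one paired-end fragment by its insert length."""
    length = abs(length)  # insert sizes can be reported with a sign
    if length < NFR_MAX:
        return "nucleosome-free"
    if MONO_RANGE[0] <= length <= MONO_RANGE[1]:
        return "mono-nucleosome"
    return "other"


def summarize(lengths):
    """Count fragment classes, e.g. for all fragments overlapping one peak."""
    return Counter(classify_fragment(n) for n in lengths)


# Made-up insert lengths from fragments overlapping a single peak
print(summarize([45, 60, 90, 150, 190, 210, 400]))
```

A typical peak will show a mix of classes, which is why nucleosomal reads cannot simply be set aside as a "control" for the nucleosome-free ones.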

7.1 years ago by ATpoint 86k

Oh, I get it now, much appreciated. However, as you know, Tn5 is an enzyme. Without a control, how can we correct for the bias caused by the enzyme itself? This problem has really confused me.

7.1 years ago by ghostforever.shi ▴ 50

Double-check the early ATAC-seq papers (maybe the original one and the one on the NucleoATAC software). If I remember correctly, they show that the intrinsic cutting preference (bias) of the transposase is minimal. The only proper control would probably be to apply the transposase to "naked" genomic DNA. But then you would need quite a lot of reads to get a proper signal, given the size of mammalian genomes, so the cost/benefit ratio is simply not worth it and nobody does it routinely (that last sentence is just me thinking aloud^^).

7.1 years ago by ATpoint 86k

Yep, it appears in the original one; I missed that. >_<|||

Thanks a lot!

7.1 years ago by ghostforever.shi ▴ 50