Question

Can we combine adjacent peaks in ATAC-Seq data

1

Entering edit mode

5.4 years ago

oriolebaltimore ▴ 190

Hi I just started to analyze ATAC-Seq data. I am not an expert and trying uderstand with data from NCBI and I am working on it.

I did not come across anywhere, about how to deal with peaks next to each other such as adjacent peaks with at least a gap of 100 bases.

In such case, is it okay to combine the peaks. What would be ideal distance between peaks to.either combine or not combine them.

Thanks

Adrian

sequencing • 3.6k views

ADD COMMENT • link updated 2.5 years ago by ATpoint 85k • written 5.4 years ago by oriolebaltimore ▴ 190

1

Entering edit mode

Thanks Charles for your suggestion.

ADD REPLY • link 5.3 years ago by oriolebaltimore ▴ 190

0

Entering edit mode

You are welcome - although this is a comment to the question, rather than my answer :)

ADD REPLY • link 5.3 years ago by Charles Warden 8.3k

0

Entering edit mode

Thanks Friederike.

For example, I see two different peaks in a 5kb Gene. A peak near 5'UTR and first exon and another peak near exon 7.

I am not sure if chromatin is open only at these two sites. For same samples, I see RNA expression is very high in test sample compared to control. I was wondering, can a Gene have just two ATAC peaks at these two exons and closed on rest of Gene and expression can be that high. I know, associating expression to ATAC is not that straightforward, however, I felt I missing some knowledge here such as A. Should we assume that chromatin is open for this Gene, based on these two peaks. In such case, can I combine peaks.

OR

B. Observing two peaks in test and not in control at two different sites and no evidence of peaks in other sites of genes, should be considered as open chromatin for the gene.

As ATAC-seq may not be that sensitive to show large peak over the entire gene

Thanks

ADD REPLY • link 5.4 years ago by oriolebaltimore ▴ 190

1

Entering edit mode

If you could attach a screenshot with coverage tracks, it becomes easier and meaningful to interpret the signal.

ADD REPLY • link 5.4 years ago by Chirag Nepal ★ 2.4k

0

Entering edit mode

In concur with Chirag plus I will say that drawing conclusions on how to treat all peaks based on a single observation is probably not the most robust way forward.

ADD REPLY • link 5.4 years ago by Friederike 9.0k

score 3 · Answer 1 · 2019-06-20

Peak calling software, such as MACS2 or HMMRATAC or Genrich have been developed to identify regions of enrichment and assign statistical metrics to their results. You can choose to believe their assignments or, as you describe, you can choose to not fully believe them and take additional steps to alter the results, such as merging peaks that are close to each other. There is no universal answer to your question (in fact, I would ask first why you would want to merge the peaks, i.e., what are you gaining for your downstream analyses?).

score 2 · Answer 2 · 2019-06-29

2

Entering edit mode

5.3 years ago

oriolebaltimore ▴ 190

After I posted the original question, I read NucleoATAC paper by Schep AN et al. In nucleoATAC, one of the step includes extending identifying region by "bedtools slop". I extended for 200bp. I get good results now.

Following are links posted previously which I missed.

Nucleoatac - extending broadPeak regions

https://github.com/GreenleafLab/NucleoATAC/issues/46

Thanks for comments.

ADD COMMENT • link 5.3 years ago by oriolebaltimore ▴ 190

0

Entering edit mode

thanks for sharing your insights!

ADD REPLY • link 5.3 years ago by Friederike 9.0k

score 0 · Answer 3 · 2019-07-01

You can use GenomicRanges in R to merge peaks (as either the union or intersection, depending upon what you are looking for).

For example, let's say you have a .bed file (without a header), and you can create a GenomicRanges object as follows:

gr1 = reduce(GRanges(Rle(input.table$V1),
        IRanges(start=input.table$V2, end=input.table$V3)))

In this situation, I don't think the reduce command is necessary (to merge overlapping rows), but it won't hurt.

If you wanted to then create the union of two peaks, you could use:

peakGr = union(gr1, gr2)

Of, if you wanted to create the intersection of two peaks, you could use:

peakGr = intersect(gr1, gr2)

You can then quantify counts with featureCounts, htseq-count, etc. Of, if you only have 2 samples, you can try comparing one sample as the reference for peak calling (for what is intended to be used for "input" samples).