Question

Is it appropriate to resize peaks before ATAC/ChIP-Seq differential peak analysis?

4.0 years ago

shangguandong1996 ▴ 30

Hi, everyone

I am always confused about whether we should resize peaks before ATAC/ChIP-Seq differential peak analysis (DA). I have looked at several DA tools, reviews, and papers: some resize peaks and some do not, but none of them seem to explain the advantages and disadvantages. I am currently using DiffBind for DA. It recommends resizing peaks when dealing with broad peaks, such as H3K27me3.

My own view is that we should not resize peaks; after all, we do not resize genes when doing differential analysis on RNA-seq data. But if we do not resize, it will introduce a lot of noise when dealing with motif analysis.

So I am wondering whether someone can give me some advice.

Best wishes

Guandong Shang

ATAC • 1.6k views

updated 4.0 years ago by Lila M ★ 1.3k • written 4.0 years ago by shangguandong1996 ▴ 30

Just like with RNA Seq analysis, you should keep in mind that there may be a size bias. All other things being equal, longer features may have an easier time collecting reads than shorter features. But this doesn't prevent people from doing RNA Seq, and if you examine your results with this awareness, I don't see how it would matter much.
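To make the size bias concrete, here is a minimal sketch that compares raw read counts with read density. The function name `reads_per_kb` and the example numbers are made up for illustration; this is not the normalization that DiffBind or any other DA tool applies.

```python
def reads_per_kb(count: int, start: int, end: int) -> float:
    """Read density of a feature: reads per 1,000 bp of feature length."""
    length = end - start
    if length <= 0:
        raise ValueError("end must be greater than start")
    return count * 1000 / length


# Hypothetical peaks: the broad one collects more reads in total,
# even though the narrow one is more densely covered.
narrow = reads_per_kb(50, 0, 200)    # 200 bp peak, 50 reads
broad = reads_per_kb(200, 0, 2000)   # 2,000 bp peak, 200 reads
print(narrow, broad)  # 250.0 100.0
```

In a differential analysis the same feature is compared across conditions, so its length mostly affects how many reads are available for the test, not the direction of the change.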

But if we do not resize, it will introduce a lot of noise when dealing with motif analysis.

You can always resize when you do your motif analysis. Native peaks for quantitative analysis, resized features (i.e. peak centers +/- some flank) for motif analysis.
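A minimal sketch of the "peak center +/- some flank" idea, assuming BED-style 0-based, half-open coordinates. The `Peak` type, the `resize_peak` name, and the 100 bp default flank are placeholders for illustration, not something taken from a specific package. If your peak caller reports a summit, passing it in is probably a better anchor than the midpoint.

```python
from typing import NamedTuple, Optional


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive


def resize_peak(peak: Peak, flank: int = 100, summit: Optional[int] = None) -> Peak:
    """Return a window of `flank` bp on each side of the summit (or midpoint).

    The start is clipped at 0, so windows near the chromosome start can be
    shorter than 2 * flank. The end is NOT clipped to the chromosome length;
    that would need a chromosome-sizes table.
    """
    center = summit if summit is not None else (peak.start + peak.end) // 2
    return Peak(peak.chrom, max(0, center - flank), center + flank)


native = Peak("chr1", 1000, 1500)   # use this for counting / DA
for_motifs = resize_peak(native)    # use this for motif scanning
print(for_motifs)  # Peak(chrom='chr1', start=1150, end=1350)
```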

4.0 years ago by seidel 11k

Just like with RNA Seq analysis

I think there is an important difference between RNA-seq and ATAC/ChIP-seq. In RNA-seq, genes are given in advance and the experimental data is used only for quantification and differential expression. In ATAC/ChIP-seq, the experimental data is used both for defining the features of interest and for quantifying them. I think this causes distortions, especially when one experimental group has more peaks than another, or when some replicates worked better or have larger library sizes. I cannot quite formalise the argument, but I think in ATAC/ChIP you use the data twice, once for discovery and once for quantification, and this is not ideal from a statistical point of view. (Sorry if this was a bit off-topic...)

4.0 years ago by dariober 15k

That's OK. I am very grateful that someone is willing to discuss this :).

4.0 years ago by shangguandong1996 ▴ 30

Thanks for your reply :)

But if I want to do motif enrichment analysis, how should I handle native versus resized peaks? The analysis would look like this:

Run k-means on the peaks, which gives me about 10 clusters.
Ask which motifs are enriched in cluster 1 compared with the other clusters.
Scan for motifs in all peaks and calculate an enrichment score for each motif in some way, such as Fisher's exact test.
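The enrichment step above could be sketched as a one-sided Fisher's exact test on a 2x2 table (peaks with or without a motif hit, in cluster 1 versus all other clusters). The function name and the counts are hypothetical; the p-value is computed directly from the hypergeometric distribution with the standard library, so no extra packages are assumed.

```python
from math import comb


def fisher_enrichment_p(hits_in_cluster: int, cluster_size: int,
                        hits_in_rest: int, rest_size: int) -> float:
    """One-sided Fisher's exact p-value for motif over-representation.

    Probability of seeing at least `hits_in_cluster` motif-containing peaks
    when `cluster_size` peaks are drawn at random from all peaks.
    """
    total = cluster_size + rest_size
    total_hits = hits_in_cluster + hits_in_rest
    max_hits = min(cluster_size, total_hits)
    # Sum the hypergeometric tail P(X >= hits_in_cluster).
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, cluster_size - k)
        for k in range(hits_in_cluster, max_hits + 1)
    )
    return tail / comb(total, cluster_size)


# Toy example: all 3 peaks in the cluster have the motif, none of the other 3 do.
print(fisher_enrichment_p(3, 3, 0, 3))  # 0.05
```

With about 10 clusters and many motifs this test is run many times, so the resulting p-values would normally need a multiple-testing correction.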

The question is whether I should use resized or native peak counts for k-means. To avoid including noise in the k-means clustering, I always use the DA results to pick the peaks I cluster. If I run DA on the native peaks, pick peaks according to LFC and p-value, and then run k-means, how should I scan these native peaks for motifs? Should I resize them first and then scan?

Best wishes

Guandong Shang

4.0 years ago by shangguandong1996 ▴ 30

If you are following a pipeline or package that recommends resizing, I would do it, simply because the statistics behind the program may need it to perform best. RNA-seq and ATAC/ChIP-seq are different approaches, so do not carry conclusions from one over to the other in terms of analysis. However, if in the end you decide not to follow the developer's recommendation, you can carry on, and if the downstream analysis reveals a problem, you can backtrack and reconsider.

4.0 years ago by Lila M ★ 1.3k