Is it appropriate to resize peaks before ATAC/ChIP-Seq differential peak analysis?
3.4 years ago

Hi, everyone

I am always confused about whether we should resize peaks before ATAC/ChIP-Seq differential peak analysis (DA). I have looked at several DA tools, reviews, and papers. Some resize the peaks while others do not, but none of them seem to explain the advantages and disadvantages. I am currently using DiffBind for DA. It recommends resizing peaks when dealing with broad peaks, such as H3K27me3.

Personally, I think we should not resize the peaks; after all, we do not resize genes when doing differential analysis on RNA-seq data. But if we do not resize, it will introduce a lot of noise into motif analysis.

So I am wondering whether someone can give me some advice.

Best wishes

Guandong Shang

ATAC • 1.3k views

Just like with RNA-seq analysis, you should keep in mind that there may be a size bias. All other things being equal, longer features may have an easier time collecting reads than shorter features, but this doesn't prevent people from doing RNA-seq, and if you examine your results with this awareness, I don't see how it would matter much.
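One way to look at the size bias described above is to compare raw counts with a width-normalised read density. The following is only a minimal sketch: the function name `reads_per_kb`, the tuple layout, and the example numbers are made up for illustration, and coordinates are assumed to be 0-based, half-open as in BED files.

```python
def reads_per_kb(peaks):
    """Return (chrom, start, end, density) with density = reads per kb of peak.

    `peaks` is a list of (chrom, start, end, count) tuples. This is a
    hypothetical layout chosen for the example, not a tool's output format.
    """
    result = []
    for chrom, start, end, count in peaks:
        width = end - start
        if width <= 0:
            raise ValueError(f"invalid interval {chrom}:{start}-{end}")
        result.append((chrom, start, end, count * 1000 / width))
    return result


# Made-up example: the second peak has 4x the reads but is also 4x wider.
peaks = [("chr1", 1000, 1500, 50), ("chr1", 5000, 7000, 200)]
density = reads_per_kb(peaks)
for chrom, start, end, value in density:
    print(chrom, start, end, value)
```

Here the two peaks differ fourfold in raw count but have the same density, which is the kind of pattern worth keeping in mind when reading results. This is meant for inspection only, not as a replacement for the normalisation a DA tool applies.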

"But if we do not resize, it will introduce a lot of noise when dealing with motif analysis."

You can always resize when you do your motif analysis. Native peaks for quantitative analysis, resized features (i.e. peak centers +/- some flank) for motif analysis.
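As a rough illustration of "peak centers +/- some flank", the sketch below resizes each interval to a fixed window around its midpoint. The function name `resize_to_center`, the `flank` default, and the `chrom_sizes` dictionary are assumptions made for this example; if your peak caller reports a summit position, using that instead of the midpoint may be the better choice.

```python
def resize_to_center(peaks, flank=100, chrom_sizes=None):
    """Resize (chrom, start, end) intervals to midpoint +/- flank.

    Coordinates are assumed to be 0-based, half-open (BED style).
    `chrom_sizes` is an optional {chrom: length} dict used to clip
    windows that would run past the end of a chromosome.
    """
    resized = []
    for chrom, start, end in peaks:
        center = (start + end) // 2          # midpoint, not a called summit
        new_start = max(0, center - flank)   # do not go below position 0
        new_end = center + flank
        if chrom_sizes is not None:
            new_end = min(new_end, chrom_sizes[chrom])
        resized.append((chrom, new_start, new_end))
    return resized


# Native peaks are kept for counting; the resized copy is used for motif scanning.
native = [("chr1", 1000, 1500), ("chr1", 50, 150), ("chr2", 9900, 10000)]
fixed = resize_to_center(native, flank=100,
                         chrom_sizes={"chr1": 100000, "chr2": 10000})
print(fixed)
```

Keeping `native` untouched and deriving `fixed` from it means the quantitative analysis and the motif analysis each get the features suited to them.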

"Just like with RNA Seq analysis"

I think there is an important difference between RNA-seq and ATAC/ChIP-seq. In RNA-seq, genes are given in advance and the experimental data is used only for quantification and differential expression. In ATAC/ChIP-seq, the experimental data is used both for defining the features of interest and for quantifying them. I think this causes distortions, especially when one experimental group has more peaks than another, or when some replicates worked better or have larger library sizes. I cannot quite formalise the argument, but I think in ATAC/ChIP-seq you use the data twice, once for discovery and once for quantification, and this is not ideal from a statistical point of view. (Sorry if this was a bit off-topic...)

That's OK. I am very grateful that someone is willing to discuss this :).

Thanks for your reply :)

But if I want to do motif enrichment analysis, how should I handle native versus resized peaks? The detailed analysis would be something like:

  1. Run k-means on the peaks, which gives me about 10 clusters.
  2. I want to know which motifs are enriched in cluster 1 compared with the other clusters.
  3. I scan for motifs in all peaks and calculate an enrichment score for each motif in some way, such as Fisher's exact test.
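Step 3 above can be sketched as a one-sided Fisher's exact test on a 2x2 table (motif present or absent, peak in cluster 1 or in the other clusters). This is only a minimal sketch using the standard library: the function name `fisher_enrichment_p` and the counts in the example are invented for illustration, and in practice a library routine such as `scipy.stats.fisher_exact` with `alternative="greater"` would normally be used instead.

```python
from math import comb


def fisher_enrichment_p(hits_in, n_in, hits_out, n_out):
    """One-sided Fisher's exact test p-value for motif enrichment.

    hits_in  : peaks in the cluster of interest that contain the motif
    n_in     : total peaks in the cluster of interest
    hits_out : peaks in all other clusters that contain the motif
    n_out    : total peaks in all other clusters

    Returns P(X >= hits_in) under the hypergeometric null, i.e. the chance
    of seeing at least this many motif-containing peaks in the cluster if
    motif occurrence were unrelated to cluster membership.
    """
    total = n_in + n_out
    total_hits = hits_in + hits_out
    max_k = min(n_in, total_hits)
    numerator = sum(
        comb(total_hits, k) * comb(total - total_hits, n_in - k)
        for k in range(hits_in, max_k + 1)
    )
    return numerator / comb(total, n_in)


# Made-up counts: 8 of 10 peaks in cluster 1 carry the motif,
# versus 18 of 90 peaks in the remaining clusters.
p = fisher_enrichment_p(hits_in=8, n_in=10, hits_out=18, n_out=90)
print(f"enrichment p-value: {p:.3g}")
```

The motif calls that feed this table could come from resized peaks (for example, center +/- flank) even if the clustering itself used counts from native peaks, since the test only needs a yes/no motif call per peak.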

The question is whether I should use the resized or the native peak counts for k-means. To avoid including noise in k-means, I always use the DA results to pick the peaks I cluster. If I use the native peaks for DA, select peaks by LFC and p-value, and then run k-means, how should I scan for motifs in these native peaks? Should I resize the native peaks and then scan for motifs?

Best wishes

Guandong Shang

If you are following a pipeline or package that recommends it, I would do it, simply because the statistics behind the program may need it to work best. RNA-seq and ATAC/ChIP-seq are different approaches, so do not carry conclusions over from one to the other in terms of analysis. However, if you ultimately decide not to follow the developer's recommendation, you can carry on, and if the downstream analysis reveals a problem you can backtrack and reconsider.
