Hi, everyone
I am always confused about whether we should resize peaks before ATAC-seq/ChIP-seq differential peak analysis (DA). I have seen some DA tools, DA reviews, and papers; some resize the peaks while others do not, but none of them seem to discuss the advantages and disadvantages. I am currently using DiffBind for DA, and it recommends resizing peaks when dealing with broad peaks, such as H3K27me3.
Personally, I think we should not resize the peaks; after all, we do not resize genes when doing differential expression analysis on RNA-seq data. But if we do not resize, a lot of noise will be introduced into the motif analysis.
So I am wondering whether someone can give me some advice.
Best wishes
Guandong Shang
Just like with RNA-seq analysis, you should keep in mind that there may be a size bias: all other things being equal, longer features may have an easier time collecting reads than shorter ones. This doesn't stop people from doing RNA-seq, though, and if you examine your results with this awareness, I don't see how it would matter much.
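One quick way to examine results "with this awareness" is to check whether peak width tracks read counts (or significance) across your peaks. A minimal sketch in plain Python, assuming you have already pulled per-peak widths and counts (or -log10 adjusted p-values) out of your DA table; the function names are hypothetical helpers, not part of any package:

```python
import math
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            r[order[t]] = avg
        i = j + 1
    return r


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank correlation, robust to the skewed scale of widths and counts."""
    return pearson(ranks(x), ranks(y))


# Illustrative numbers only: a strong positive value would suggest that
# width is driving counts (and possibly significance) in your peak set.
widths = [200, 400, 800, 1600]
counts = [10, 25, 40, 90]
print(spearman(widths, counts))
```

A strong correlation does not by itself mean the results are wrong; it is a flag to look more closely at whether the wide peaks dominate your DA list.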
You can always resize when you do your motif analysis: use native peaks for the quantitative analysis and resized features (i.e. peak centers +/- some flank) for the motif analysis.
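The "peak center +/- flank" step is only a coordinate transformation, so it can be applied to whichever peaks come out of the quantitative analysis. A minimal sketch, assuming BED-style 0-based, half-open intervals held as `(chrom, start, end)` tuples; `resize_to_center` is a hypothetical helper. If summits are available (for example, in a MACS2 narrowPeak file column 10 is the summit offset from the peak start), pass absolute summit positions (start + offset) instead of using the midpoint:

```python
from typing import Dict, List, Optional, Sequence, Tuple

Interval = Tuple[str, int, int]  # chrom, 0-based start, half-open end


def resize_to_center(
    peaks: Sequence[Interval],
    flank: int,
    summits: Optional[Sequence[int]] = None,
    chrom_sizes: Optional[Dict[str, int]] = None,
) -> List[Interval]:
    """Replace each peak by a window of +/- `flank` bp around its summit or midpoint.

    Windows are clipped at 0 and, if chrom_sizes is given, at the chromosome end,
    so peaks near chromosome ends can come out shorter than 2 * flank.
    """
    resized = []
    for i, (chrom, start, end) in enumerate(peaks):
        # Anchor on the absolute summit if provided, otherwise on the midpoint.
        center = summits[i] if summits is not None else (start + end) // 2
        new_start = max(0, center - flank)
        new_end = center + flank
        if chrom_sizes is not None:
            new_end = min(new_end, chrom_sizes[chrom])
        resized.append((chrom, new_start, new_end))
    return resized


print(resize_to_center([("chr1", 1000, 3000)], flank=100))
# [('chr1', 1900, 2100)]
```

Fixed-width windows like these keep the motif background comparable across peaks, while the native coordinates remain untouched for counting.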
I think there is an important difference between RNA-seq and ATAC-seq/ChIP-seq. In RNA-seq, the genes are defined in advance, and the experimental data are used only for quantification and differential expression. In ATAC-seq/ChIP-seq, the experimental data are used both to define the features of interest and to quantify them. I think this causes distortions, especially when one experimental group has more peaks than another, or when some replicates worked better or have larger library sizes. I cannot quite formalise the argument, but in ATAC/ChIP you use the data twice, once for discovery and once for quantification, and this is not ideal from a statistical point of view. (Sorry if this was a bit off-topic...)
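A toy simulation can make the "using the data twice" concern concrete, though it is an illustration under made-up assumptions and not a formal argument. Every feature has the same true mean in both groups (so there is no real difference), "peaks" are called only where group A has high counts, and those same counts are then reused to estimate fold changes. The Poisson model, the means, and the calling threshold are all hypothetical; real peak callers and consensus-peak rules are more complex:

```python
import math
import random
from statistics import mean
from typing import Tuple


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; fine for the small means used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(n_features: int = 5000, lam: float = 20.0, n_reps: int = 3,
             call_threshold: int = 70, seed: int = 1) -> Tuple[float, float, int]:
    """Null simulation: identical true means in groups A and B.

    A feature is 'called' only if its total count in group A exceeds
    call_threshold (discovery uses group A's data), and the same counts are
    then used to estimate log2 fold changes (A vs B).
    Returns (mean log2FC over all features, mean log2FC over called ones, n_called).
    """
    rng = random.Random(seed)
    all_lfc, called_lfc = [], []
    for _ in range(n_features):
        a = [poisson(lam, rng) for _ in range(n_reps)]
        b = [poisson(lam, rng) for _ in range(n_reps)]
        # Pseudocount of 1 avoids log(0).
        lfc = math.log2((sum(a) + 1) / (sum(b) + 1))
        all_lfc.append(lfc)
        if sum(a) > call_threshold:  # discovery step reuses group A's counts
            called_lfc.append(lfc)
    return mean(all_lfc), mean(called_lfc), len(called_lfc)


overall, called, n_called = simulate()
print(f"mean log2FC, all features:    {overall:+.3f}")
print(f"mean log2FC, called features: {called:+.3f} (n={n_called})")
```

Across all features the mean fold change sits near zero, as it should under the null, but among the "called" features it is shifted towards group A purely because of how they were selected.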
That's OK. I am very grateful that someone is discussing this :)
Thanks for your reply :)
But if I want to do motif enrichment analysis, how should I handle native versus resized peaks? The detailed analysis looks like this:
The key question is whether I should use resized or native peak counts for k-means. To avoid including noise in the k-means clustering, I always use the DA results to pick the peaks used for k-means. If I use the native peaks for DA, pick peaks according to LFC and p-value, and then run k-means, how can I scan for motifs in these native peaks? Should I resize these native peaks first and then scan for motifs?
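One way to organise this, sketched under assumptions: run DA on the native peaks, filter by |log2FC| and adjusted p-value, cluster the z-scored log counts of the selected peaks, and carry each peak's native coordinates through the clustering, so that any resizing for motif scanning happens only afterwards, per cluster, as a pure coordinate step. The tuple layout, the cutoffs (|log2FC| >= 1, padj <= 0.05), and the hand-rolled k-means are illustrative assumptions; in practice you would probably use an established k-means implementation and your DA tool's own output table:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical DA row: (peak_id, chrom, start, end, log2FC, padj, normalized counts per sample)
Peak = Tuple[str, str, int, int, float, float, List[float]]


def select_da_peaks(peaks: List[Peak], min_abs_lfc: float = 1.0,
                    max_padj: float = 0.05) -> List[Peak]:
    """Keep peaks passing illustrative |log2FC| and adjusted p-value cutoffs."""
    return [p for p in peaks if abs(p[4]) >= min_abs_lfc and p[5] <= max_padj]


def zscore(row: List[float]) -> List[float]:
    """Center and scale one peak's log2 counts so clustering groups patterns, not magnitudes."""
    logs = [math.log2(x + 1.0) for x in row]
    m = sum(logs) / len(logs)
    sd = math.sqrt(sum((v - m) ** 2 for v in logs) / len(logs))
    return [(v - m) / sd if sd > 0 else 0.0 for v in logs]


def kmeans(points: List[List[float]], k: int, n_iter: int = 50) -> List[int]:
    """Plain Lloyd's algorithm, initialised with the first k points for reproducibility."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_native_peaks(peaks: List[Peak], k: int) -> Dict[int, List[Tuple[str, str, int, int]]]:
    """Return native (unresized) coordinates grouped by cluster label."""
    selected = select_da_peaks(peaks)
    labels = kmeans([zscore(p[6]) for p in selected], k)
    clusters: Dict[int, List[Tuple[str, str, int, int]]] = {}
    for p, lab in zip(selected, labels):
        clusters.setdefault(lab, []).append((p[0], p[1], p[2], p[3]))
    return clusters
```

Each cluster's native coordinates can then be converted to fixed-width windows around the center or summit before scanning for motifs, so the clustering and the motif windows never have to agree on peak width.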
Best wishes
Guandong Shang
If you are following a pipeline or package that recommends it, I would do it, simply because the statistics behind the program may need it to work best. RNA-seq and ATAC-seq/ChIP-seq are different approaches, so do not carry conclusions about analysis over from one to the other. However, if you ultimately decide not to follow the developers' recommendation, you can carry on without it, and if the downstream analysis reveals a problem, you can backtrack and reconsider.
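If you do try both routes, a quick comparison of the two DA result sets can show whether the choice matters for your data before you commit to one. A minimal sketch, assuming both results are keyed by the same peak IDs (which holds only if the resized peaks are derived one-to-one from the same consensus set, e.g. each peak resized around its center or summit); the dict layout and function names are hypothetical:

```python
from typing import Dict, List, Set, Tuple, Union

DAResult = Dict[str, Tuple[float, float]]  # hypothetical: peak_id -> (log2FC, padj)


def significant(res: DAResult, alpha: float = 0.05) -> Set[str]:
    """Peak IDs passing the adjusted p-value cutoff."""
    return {pid for pid, (_, padj) in res.items() if padj <= alpha}


def compare_strategies(native: DAResult, resized: DAResult,
                       alpha: float = 0.05) -> Dict[str, Union[float, List[str]]]:
    """Summarise agreement between native-peak and resized-peak DA results."""
    sig_n = significant(native, alpha)
    sig_r = significant(resized, alpha)
    shared = sig_n & sig_r
    union = sig_n | sig_r
    # Peaks significant under both strategies but changing in opposite directions.
    flips = sorted(pid for pid in shared if (native[pid][0] > 0) != (resized[pid][0] > 0))
    return {
        "jaccard": len(shared) / len(union) if union else 1.0,
        "only_native": sorted(sig_n - sig_r),
        "only_resized": sorted(sig_r - sig_n),
        "direction_flips": flips,
    }
```

High overlap and no direction flips suggest the choice is not driving your conclusions; low overlap points to the peaks worth inspecting (for example, whether the discordant ones are mostly broad) before deciding.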