Hello all!
I have run a ChIP-seq analysis with DiffBind to compare 2 different conditions, and the number of peaks obtained is very large (~49,000 peaks). Some genes appear more than once in the list of peaks, but the repeated entries come from different regions of the gene. I would like to do some downstream analysis (e.g. gene ontology), but the number of peaks is far too large.
I have the following questions:
- Should I apply a cutoff to reduce the number of peaks? The DiffBind scores range from 1.5 to 6.
- Is there a way to set a cutoff on the number of peaks used for downstream analysis?
One thing I can think of: set an arbitrary cutoff on the DiffBind score, e.g. keep only peaks with scores > 3.5 for the analysis.
Another thing I can think of: use the ratio of peak height in condition 1 vs condition 2. That way, I could select genes whose peak height is more than 1.5-fold higher in condition 1.
If peak height is a good way to obtain more significant genes, what tools do you recommend?
Thanks!
If the number of significant regions is unexpectedly high, be sure to look at MA-plots to check whether the normalization is off and has produced many false positives. Actually, one should always do that. With proper normalization, the majority of regions should be centered roughly at y ~ 0.
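
To make the check concrete, here is a minimal sketch of the arithmetic behind an MA-plot, in plain Python. It is not DiffBind's own code (DiffBind has `dba.plotMA` in R for the actual plot). The function names, the pseudocount of 1 and the example counts are all assumptions made up for illustration. For each region, A is the mean log2 signal of the two conditions and M is the log2 fold change. If the median of M is far from 0, the normalization probably needs another look.

```python
import math
import statistics


def ma_values(counts_a, counts_b, pseudocount=1.0):
    """Return a list of (A, M) pairs, one per region.

    A = mean of the two log2 signals (x-axis of an MA-plot)
    M = log2 fold change, condition a over condition b (y-axis)
    The pseudocount is an assumption to avoid log2(0).
    """
    pairs = []
    for a, b in zip(counts_a, counts_b):
        log_a = math.log2(a + pseudocount)
        log_b = math.log2(b + pseudocount)
        pairs.append(((log_a + log_b) / 2.0, log_a - log_b))
    return pairs


def median_m(pairs):
    """Median log2 fold change across all regions."""
    return statistics.median(m for _, m in pairs)


# Hypothetical normalized read counts per region (made-up numbers).
cond1 = [120, 45, 300, 80, 15, 220]
cond2 = [110, 50, 280, 85, 14, 230]

pairs = ma_values(cond1, cond2)
print("median M:", round(median_m(pairs), 3))
```

If most regions are not expected to change between the conditions, the printed median should be close to 0. A median that is clearly shifted up or down means the whole cloud of points is off-center, which is the situation where many regions get called significant for the wrong reason.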