Hello,
I am working with ChIP-seq data for a broad epigenetic mark and want to perform peak calling. Initial QC with deepTools fingerprint plots suggests that my ChIP efficiency was not very good and that there is a lot of background (although I have also read that this may be normal for broad marks).
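For context, a fingerprint plot is built by counting reads in genomic bins, sorting the bins by count, and plotting the cumulative fraction of reads against the fraction of bins. A uniformly distributed input lies near the diagonal, while a strongly enriched ChIP rises steeply only at the far right. Broad marks tend to sit closer to the diagonal because their signal is spread out. The minimal sketch below assumes you already have per-bin read counts. The function names `fingerprint_curve` and `top_bins_read_fraction` are hypothetical, and this is not deepTools' own implementation.

```python
from typing import List, Tuple


def fingerprint_curve(bin_counts: List[int]) -> List[Tuple[float, float]]:
    """Return (fraction of bins, cumulative fraction of reads) points.

    Bins are sorted by read count in ascending order, as in a fingerprint plot.
    """
    counts = sorted(bin_counts)
    total = sum(counts)
    n = len(counts)
    points = []
    running = 0
    for i, c in enumerate(counts, start=1):
        running += c
        # x = fraction of bins seen so far, y = fraction of all reads they hold
        points.append((i / n, running / total if total else 0.0))
    return points


def top_bins_read_fraction(bin_counts: List[int], top_fraction: float = 0.01) -> float:
    """Fraction of all reads that fall in the highest-count bins (default: top 1%)."""
    counts = sorted(bin_counts, reverse=True)
    k = max(1, int(len(counts) * top_fraction))
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0


# Hypothetical toy data: an input-like sample versus a strongly enriched one.
uniform = [10] * 100
enriched = [1] * 99 + [901]
print(top_bins_read_fraction(uniform))   # 0.01
print(top_bins_read_fraction(enriched))  # 0.901
```

A broad mark with modest enrichment will usually fall between these two extremes, so a curve close to the input does not by itself prove the ChIP failed.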
My initial peak calling with MACS2 and HOMER, using fairly liberal parameters, yielded far fewer peaks than expected (~5,000 compared with ~20,000 reported in the literature). My first question: is it OK to proceed with the analysis of these peaks? In other words, is it normal to see such large differences between my data and published results for the same mark? Is there anything else I can try to get more peaks or reduce background?
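One way to judge whether the ~5,000 peaks are usable, independent of the published count, is the fraction of reads in peaks (FRiP). FRiP is a common QC metric that the original post does not mention. The minimal sketch below assumes that reads are reduced to single positions (for example, 5' ends) and that peaks are merged, non-overlapping, half-open `[start, end)` intervals as in BED. The function `frip` is hypothetical.

```python
import bisect
from typing import Dict, List, Tuple


def frip(read_positions: Dict[str, List[int]],
         peaks: Dict[str, List[Tuple[int, int]]]) -> float:
    """Fraction of reads whose position falls inside a peak.

    Assumes peaks on each chromosome are non-overlapping [start, end) intervals.
    """
    total = 0
    in_peaks = 0
    for chrom, positions in read_positions.items():
        total += len(positions)
        intervals = sorted(peaks.get(chrom, []))
        starts = [s for s, _ in intervals]
        for pos in positions:
            # Find the last peak starting at or before this position
            i = bisect.bisect_right(starts, pos) - 1
            if i >= 0 and pos < intervals[i][1]:
                in_peaks += 1
    return in_peaks / total if total else 0.0


# Hypothetical toy data
reads = {"chr1": [5, 15, 25, 35], "chr2": [100, 200]}
peak_intervals = {"chr1": [(10, 20), (30, 40)]}
print(frip(reads, peak_intervals))  # 2 of 6 reads fall in peaks
```

Comparing FRiP between your two conditions, and against your input, is often more informative than comparing raw peak counts with a paper.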
Alternatively, if this is due to low ChIP efficiency, would it make sense to skip peak calling and instead use the data by quantifying the signal at specific regions such as promoters, perhaps followed by clustering (both deepTools and HOMER have good tools for this)? I am comparing the occupancy of this mark between two conditions, so I am assuming that both conditions would be equally affected by the low efficiency.
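The region-based approach can be sketched as: count reads per promoter in each condition, normalize to counts per million (CPM) by library size, and take a log2 fold change with a pseudocount. This is a minimal sketch, assuming one sample per condition and that library-size scaling is appropriate. That in turn relies on the assumption from the question that both conditions are equally affected by the low ChIP efficiency. The function names and region labels are hypothetical. With replicates, a dedicated count-based statistical tool would normally be preferable to raw fold changes.

```python
import math
from typing import Dict


def cpm(region_counts: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Scale raw read counts per region to counts per million mapped reads."""
    return {r: c * 1e6 / library_size for r, c in region_counts.items()}


def log2_fold_changes(cond_a: Dict[str, int], lib_a: int,
                      cond_b: Dict[str, int], lib_b: int,
                      pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((CPM_b + pc) / (CPM_a + pc)) for regions counted in both conditions."""
    a = cpm(cond_a, lib_a)
    b = cpm(cond_b, lib_b)
    shared = a.keys() & b.keys()
    return {r: math.log2((b[r] + pseudocount) / (a[r] + pseudocount))
            for r in sorted(shared)}


# Hypothetical promoter counts for two conditions with different sequencing depths
condition_a = {"geneX_promoter": 3, "geneY_promoter": 7}
condition_b = {"geneX_promoter": 14, "geneY_promoter": 6}
print(log2_fold_changes(condition_a, 1_000_000, condition_b, 2_000_000))
# geneX: CPM 3 -> 7, geneY: CPM 7 -> 3
```

The pseudocount keeps regions with zero counts from producing infinite fold changes, at the cost of shrinking fold changes for weakly covered promoters.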
Any tips would be appreciated.
Thanks
Hi, can you post the commands you used for calling the peaks?
Is this a downloaded dataset, and by "literature" do you mean that the paper that published it reported 20k peaks versus the 5k you get? Or did you generate the data yourself? In general (in my experience), a peak count that a paper reports without code is not worth much: by changing low-level processing and thresholds you can get basically any number of peaks from a given dataset. It becomes even more variable depending on how you define "reliable peaks" between replicates with methods such as IDR, which again have thresholds one can change. Without code, such numbers are barely reproducible. If you generated the data yourself, you can also get tremendously different results depending on antibody, protocol and/or processing. ChIP-seq is a pain and heavily depends on the circumstances, so comparing with published datasets is very cumbersome.
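To see how much the threshold alone changes the count, you can tabulate how many peaks pass several cutoffs in one output file. The minimal sketch below assumes MACS2 narrowPeak/broadPeak-style tab-separated lines, where column 9 holds -log10(q-value). HOMER output uses a different layout, so the column index would need changing. The function `peaks_passing` and the example lines are hypothetical.

```python
from typing import Dict, Iterable, List


def peaks_passing(lines: Iterable[str], thresholds: List[float]) -> Dict[float, int]:
    """Count peaks whose -log10(q) (column 9, 0-based index 8) meets each threshold."""
    qvals = []
    for line in lines:
        # Skip blank lines and track/browser/comment headers
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        qvals.append(float(fields[8]))
    return {t: sum(q >= t for q in qvals) for t in thresholds}


# Hypothetical broadPeak-style lines:
# chrom, start, end, name, score, strand, fold change, -log10 p, -log10 q
example = [
    "chr1\t100\t900\tpeak_1\t50\t.\t3.1\t8.2\t5.0",
    "chr1\t2000\t5000\tpeak_2\t20\t.\t1.8\t4.1\t2.0",
    "chr2\t300\t1200\tpeak_3\t13\t.\t1.5\t3.0\t1.3",
]
# -log10(q) of 1.3 is roughly q = 0.05, 2.0 is q = 0.01
print(peaks_passing(example, [1.3, 2.0, 5.0]))  # {1.3: 3, 2.0: 2, 5.0: 1}
```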