I am analyzing ChIP-seq data and trying to figure out how to handle biological replicates and which tools are best to use. I tried to dig through earlier questions, but it is hard to navigate through so much information.
I have 2 different genotypes for my ChIP-seq data, each with two replicates.
I called peaks with MACS2 and checked quality with ChIPQC.
I read about IDR, bedtools for overlapping peaks, which one would you recommend?
Shall I then combine the peaks before comparing them with DiffBind?
It would be great to receive some input (vignettes, pipelines, ...) to get a better idea of how to proceed from this point.
A lot depends on what you're trying to do with this data. "IDR, bedtools for overlapping peaks, which one would you recommend?" IDR will tell you which peaks appear to be the most reproducible, and give you a sense of how reproducibility decays with decreasing signal. With bedtools you could create a set of peaks found in both replicates as a simple way of getting a reproducible set. DiffBind can take replicates directly (so why combine them?). Have you read the vignette?
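To make the "found in both replicates" idea concrete, here is a minimal sketch in plain Python. It is not bedtools itself: the function names and the toy coordinates are made up for illustration, and it assumes BED-style half-open intervals and that any overlap of at least one base counts as a match.

```python
from typing import List, Tuple

# A peak as (chrom, start, end), BED-style half-open coordinates (assumption).
Peak = Tuple[str, int, int]

def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def reproducible_peaks(rep1: List[Peak], rep2: List[Peak]) -> List[Peak]:
    """Keep the peaks from rep1 that overlap at least one peak in rep2."""
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]

# Toy data, purely illustrative.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]

shared = reproducible_peaks(rep1, rep2)
print(shared)
```

Only the first peak of `rep1` survives here, because it is the only one with a partner in `rep2`. This quadratic scan is fine for a toy example; for real peak files you would use bedtools or another interval-aware tool.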
I am trying to get specific binding sites for my TF in the two genotypes. And yes, I have, but I read that it was important to combine the reproducible peaks of the replicates before proceeding with DiffBind, hence my confusion.
DiffBind has an extensive and (hopefully) clear vignette that walks through an analysis of transcription factor binding and discusses some of the issues that arise -- that is a good place to start.
You should be able to run the analysis using only the original reads in BAM files and the peaks output by MACS2. You do not need to combine the peaks yourself, as DiffBind will form a consensus peakset automatically. In a differential analysis, it is not crucial to get a perfect map of exactly where the binding sites are, so long as you have identified the regions of potential enrichment; the statistical analysis will filter out regions that do not have consistent enrichment. If you are interested in narrowing down to very specific, high-confidence binding sites for another purpose, then using a technique such as IDR to combine information from the samples is useful, but that really is separate from the differential analysis.
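As a rough illustration of what forming a consensus peakset means, here is a minimal Python sketch. It is not DiffBind's actual implementation: it simply merges overlapping peaks across all samples and keeps the merged regions supported by at least a given number of samples. The function name, the `min_overlap` parameter, and the toy coordinates are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open (assumption)

def consensus_peaks(samples: List[List[Peak]], min_overlap: int = 2) -> List[Peak]:
    """Merge overlapping peaks across samples; keep regions seen in >= min_overlap samples."""
    # Tag every peak with the index of the sample it came from, then sort by position.
    tagged = sorted(
        (chrom, start, end, i)
        for i, peaks in enumerate(samples)
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set of supporting sample indices]
    for chrom, start, end, i in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            # Overlaps the current merged region: extend it and record the sample.
            last[2] = max(last[2], end)
            last[3].add(i)
        else:
            merged.append([chrom, start, end, {i}])
    return [(c, s, e) for c, s, e, support in merged if len(support) >= min_overlap]

# Toy data: four samples (e.g. two genotypes x two replicates), purely illustrative.
samples = [
    [("chr1", 100, 200), ("chr1", 500, 600)],
    [("chr1", 150, 250)],
    [("chr1", 240, 300), ("chr2", 10, 50)],
    [("chr1", 900, 950)],
]

consensus = consensus_peaks(samples, min_overlap=2)
print(consensus)
```

With this toy input the three overlapping chr1 peaks collapse into one region spanning 100 to 300, and the peaks seen in only one sample are dropped. Regions like this are then the candidates on which reads are counted and tested, which is why a perfect per-replicate peak map is not essential beforehand.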
The one thing I would point out is that two replicates for each genotype is likely not sufficient to perform a robust differential analysis. You can override DiffBind's default requirement that there be at least three replicates per sample group using the dba.contrast() function (either by setting minMembers=2 or explicitly setting the contrast parameter).