I have to analyze 3 ChIP-seq datasets. I'm fine with the analysis procedure itself but have a question about the normalization. At which step(s) should I normalize my data?
MACS includes some basic normalization in case you provide a control file. In that case, the larger file is proportionally scaled towards the smaller one. If your goal is [and this is what you should state right away when posting a question like this] to simply call peaks, this is typically sufficient. If you aim to perform differential analysis, have a look at the established tools, like MAnorm, csaw, DiffBind and many more.
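To make the "larger file is scaled towards the smaller one" idea concrete, here is a minimal sketch of the arithmetic. It is not MACS code; the function name and the read counts are made up for illustration.

```python
def scale_factors(treatment_reads: int, control_reads: int) -> tuple[float, float]:
    """Return (treatment_factor, control_factor) that scale the larger
    library down to the depth of the smaller one (hypothetical helper)."""
    smaller = min(treatment_reads, control_reads)
    return smaller / treatment_reads, smaller / control_reads


# Hypothetical depths: 20M ChIP reads, 40M input reads.
chip_factor, input_factor = scale_factors(20_000_000, 40_000_000)
print(chip_factor, input_factor)  # 1.0 0.5
```

With these numbers the input signal would be halved before being compared with the ChIP signal, while the ChIP sample is left untouched.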
MACS normalizes when calling peaks fairly well, though their normalized bedgraphs frankly look terrible on a browser. I use deepTools to create read-depth normalized bigWig files that look much more appropriate in UCSC. deepTools has a few different ways it can normalize, including subtracting input reads from samples, though I typically just use the rpkm option.
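For anyone wondering what the RPKM option does per bin, this is a rough sketch of the formula as I understand it from the deepTools documentation (reads in the bin, divided by mapped reads in millions times bin length in kb). The function name and the numbers are assumptions for illustration only, not deepTools internals.

```python
def rpkm(reads_in_bin: int, total_mapped_reads: int, bin_length_bp: int) -> float:
    """Reads per kilobase per million mapped reads for one bin (illustrative)."""
    millions_mapped = total_mapped_reads / 1e6
    bin_length_kb = bin_length_bp / 1e3
    return reads_in_bin / (millions_mapped * bin_length_kb)


# Hypothetical bin: 100 reads in a 1000 bp bin, library of 20M mapped reads.
print(rpkm(100, 20_000_000, 1000))  # 5.0
```

Because the value is divided by library size, two tracks from libraries of different depth end up on a comparable scale in the browser.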
If you want to quantitatively compare signal at ChIP-seq peaks, my two favorite tools are DiffBind (R package) if you have biological replicates or MAnorm (Bash/R scripts) if you're trying to compare a single sample to another. They both take care of normalization and do a pretty good job of identifying unique peaks for a given condition/sample.
I typically treat ATAC-seq much the same as ChIP-seq, but use a smaller extension size during peak calling for ATAC-seq, as our fragments are usually smaller. HOMER is also a perfectly good tool (with great documentation), though it can't quantitatively compare signal at peaks last I checked. I found this paper very helpful when trying to identify which tool is best for the job depending on your data type (sharp vs broad signal), if you have replicates, etc.
There are tons of other blogs/githubs/websites that go more deeply into analysis, including the BioStars handbook. This github also has links and some comments about pretty much every tool ever developed for ChIP-seq analysis along with tons of links to other resources, key papers, etc. It's a great resource.
As suggested, MACS/MACS2 will normalize according to the total number of reads. Some of the bigWig creation packages also have the ability to scale by a specified normalization factor, which you will have to do to get a "normalized" bigWig file.
One last thing: if you are looking at a global increase or reduction of whatever you are ChIPping, total read normalization will not work. Something to keep in mind...
if you are looking at a global increase or reduction of whatever you are ChIPping, total read normalization will not work.
This depends on the nature of the ChIP. Transcription factor ChIP-seq experiments often have relatively few (<20K) enriched regions, which should not influence global scaling approaches too much. Broad histone marks covering large swathes of the genome (e.g. H3K27me3) can be a different story, though.
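A toy illustration of why total read normalization can hide a global change. It assumes, unrealistically, that all reads come from enriched regions with no background, and all counts are invented. The mutant has half the true signal everywhere, but once each library is scaled to its own total the two look identical.

```python
def cpm(counts: list[int]) -> list[float]:
    """Scale raw counts to counts per million of the library total."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


# Hypothetical true occupancy at three regions.
wt_true = [100, 200, 700]
mut_true = [50, 100, 350]  # global 2-fold loss

# Sequencing depth is arbitrary; here the mutant library happens to be
# sequenced twice as deep relative to its signal, so raw counts match.
wt_counts = [x * 10 for x in wt_true]
mut_counts = [x * 20 for x in mut_true]

print(cpm(wt_counts) == cpm(mut_counts))  # True: the global loss is invisible
```

With a sharp transcription factor profile most reads are background, so the library total is dominated by regions that do not change and the effect is much smaller. With a broad mark, the signal itself makes up a large share of the total, which is where this becomes a real problem.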
Are your 3 ChIP-seq datasets for different factors/histone marks, or for the same factor/histone mark in different conditions?
The three ChIP-seq datasets are for the same factor: one WT and two with mutations.