Hi all, I have a question about sample grouping when calling peaks using MACS2:
I have data from two studies containing Input and H3K27ac samples from multiple tissues (3 Input-H3K27ac pairs for each tissue type in one study, and some samples from the other study).
What is the best way to group samples for peak calling: combine all samples from one study for peak calling, or sort them by tissue and do peak calling for each tissue within each study? Or do peak calling for each pair of samples separately?
Definitely do not mix studies; batch effects will introduce spurious calls. You could indeed group by study and tissue so that you get one set of peaks per combination (for example study1-organ1, study2-organ1, study1-organ2, study2-organ2, etc.) and then assess reproducible peaks with something like the Irreproducible Discovery Rate (IDR) framework. That would give you a list of peaks per organ that is consistent between studies.
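To make the grouping concrete, here is a minimal Python sketch that buckets samples by (study, tissue) and builds one `macs2 callpeak` command per bucket. The sample sheet, file names, output directory layout and the `-g hs` (human) genome size are all made up for illustration, so adjust them to your data. It only prints the commands rather than running them.

```python
from collections import defaultdict

# Hypothetical sample sheet: (study, tissue, H3K27ac BAM, matching Input BAM).
# All names below are placeholders, not files from the original question.
SAMPLES = [
    ("study1", "liver", "s1_liver_rep1_H3K27ac.bam", "s1_liver_rep1_input.bam"),
    ("study1", "liver", "s1_liver_rep2_H3K27ac.bam", "s1_liver_rep2_input.bam"),
    ("study1", "heart", "s1_heart_rep1_H3K27ac.bam", "s1_heart_rep1_input.bam"),
    ("study2", "liver", "s2_liver_rep1_H3K27ac.bam", "s2_liver_rep1_input.bam"),
]


def group_samples(samples):
    """Collect ChIP and Input BAMs per (study, tissue); studies are never mixed."""
    groups = defaultdict(lambda: {"chip": [], "input": []})
    for study, tissue, chip_bam, input_bam in samples:
        groups[(study, tissue)]["chip"].append(chip_bam)
        groups[(study, tissue)]["input"].append(input_bam)
    return dict(groups)


def macs2_command(study, tissue, chip_bams, input_bams, genome="hs"):
    """Build a macs2 callpeak argument list for one study-tissue group.

    genome="hs" assumes human data; the output directory is an arbitrary choice.
    """
    name = f"{study}-{tissue}"
    return [
        "macs2", "callpeak",
        "-t", *chip_bams,      # treatment (H3K27ac) files for this group
        "-c", *input_bams,     # control (Input) files for this group
        "-f", "BAM",
        "-g", genome,
        "-n", name,
        "--outdir", f"peaks/{name}",
    ]


if __name__ == "__main__":
    for (study, tissue), files in sorted(group_samples(SAMPLES).items()):
        cmd = macs2_command(study, tissue, files["chip"], files["input"])
        print(" ".join(cmd))
```

Each group ends up with its own peak set (study1-liver, study2-liver, ...), which is what you would then compare between studies.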
Thank you for your detailed answer!
I used this as a guideline in the past: https://hbctraining.github.io/Intro-to-ChIPseq/lessons/07_handling-replicates-idr.html
HBC has a lot of great resources for training.
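For the IDR step itself, a rough sketch of building the `idr` command for one tissue across two studies could look like the following. It assumes MACS2 was run with `-n <study>-<tissue>` and `--outdir peaks/<study>-<tissue>` (a hypothetical layout), so the peak files follow the MACS2 `<name>_peaks.narrowPeak` naming. Note that `idr` compares two peak files at a time, and the HBC lesson additionally sorts the narrowPeak files by the -log10 p-value column before running it.

```python
def narrowpeak_path(study, tissue):
    """Path of the MACS2 narrowPeak file, assuming the hypothetical layout above."""
    name = f"{study}-{tissue}"
    return f"peaks/{name}/{name}_peaks.narrowPeak"


def idr_command(tissue, study_a, study_b):
    """Build an idr argument list comparing one tissue between two studies."""
    out_prefix = f"idr/{tissue}-{study_a}-vs-{study_b}"  # arbitrary output naming
    return [
        "idr",
        "--samples", narrowpeak_path(study_a, tissue), narrowpeak_path(study_b, tissue),
        "--input-file-type", "narrowPeak",
        "--rank", "p.value",
        "--output-file", f"{out_prefix}-idr",
        "--log-output-file", f"{out_prefix}.idr.log",
    ]


if __name__ == "__main__":
    print(" ".join(idr_command("liver", "study1", "study2")))
```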