Hi All,
I am a beginner in ChIP-seq analysis.
I am trying to find a set of peaks for cancer data. The data I have include two samples (a healthy and a cancerous sample). I would like to use MACS for peak calling.
My question is: if there are healthy and cancerous ChIP-seq data, can we use the healthy sample as the CONTROL for peak calling?
Or is that wrong? I am confused because the healthy sample is not INPUT data.
Any tips are highly appreciated.
Regards
Thanks a lot for your prompt response. So, based on your knowledge, I can't simply use the healthy sample as the control for peak calling on the cancer data. Instead, I should run two separate peak callings, one for each sample, and then compare them! Please correct me if I'm wrong. MACS can also call peaks when no control sample is available, though, so I would have to use that approach. I have checked a paper, and it seems that in their research they used the healthy sample as the control and did the peak calling!
I think there is ambiguity in the word "control", as it has two different meanings in this context.
I think they did the peak calling on both the healthy and the cancerous samples (see the Excel files, Supplementary Tables 1 and 2), and then did some analysis on the differential peaks.
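The "call peaks on each sample separately, then compare" idea can be illustrated with a naive overlap comparison. The sketch below is only an assumption-laden illustration, not what the paper did and not a statistical differential-binding test: it assumes each peak set has already been loaded as `(chrom, start, end)` tuples in BED-style half-open coordinates (e.g. the first three columns of a MACS peak file), and the coordinates in the example are made up.

```python
"""Naive comparison of two peak sets by genomic overlap (illustrative only)."""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A peak as (chromosome, start, end), half-open like BED: [start, end).
Peak = Tuple[str, int, int]


def index_by_chrom(peaks: Iterable[Peak]) -> Dict[str, List[Tuple[int, int]]]:
    # Group intervals per chromosome so only same-chromosome peaks are compared.
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    return index


def overlaps_any(peak: Peak, index: Dict[str, List[Tuple[int, int]]]) -> bool:
    chrom, start, end = peak
    # Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
    # Linear scan: fine for a sketch, slow for genome-wide peak sets.
    return any(s < end and start < e for s, e in index.get(chrom, []))


def specific_peaks(peaks_a: Iterable[Peak], peaks_b: Iterable[Peak]) -> List[Peak]:
    """Return the peaks in peaks_a that overlap no peak in peaks_b."""
    index_b = index_by_chrom(peaks_b)
    return [p for p in peaks_a if not overlaps_any(p, index_b)]


# Made-up example peaks.
cancer = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
healthy = [("chr1", 150, 250), ("chr2", 150, 300)]

print(specific_peaks(cancer, healthy))  # [('chr1', 500, 600), ('chr2', 50, 150)]
print(specific_peaks(healthy, cancer))  # [('chr2', 150, 300)]
```

Note that a peak present in both samples but much stronger in one of them is missed entirely by this presence/absence comparison, which is one reason signal-based comparisons such as the MACS2 options mentioned below are worth looking at.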
I do not think using the healthy sample as the control (the `-c` parameter in MACS) is a good idea. It is still a "control", of course, in the sense that you are comparing the peaks from the cancerous sample to the peaks from the normal sample and looking for significant differences. However, the `-c` parameter in MACS is usually given the ChIP-seq sample called "Input", which is a different kind of "control".

You might want to look at the bdgcmp and diffpeaks options of MACS2, as they seem to be designed to solve what you want to do. I am not entirely sure how one would run these, but if I had to guess, you need to run `macs2 callpeak` twice: once with `-c input -t healthy` and once with `-c input -t cancerous`, with the `-B` parameter explicitly set to generate bedgraphs. Then run these two commands on the appropriate bedgraph files generated in the previous steps. Let me know how that goes.

Thanks for your detailed answer. I have checked the Excel files: S1 and S2 are the output of MACS, S1 being the peaks and S2 the negative peaks. I ran MACS with the cancer data as treatment and the healthy sample as control, and I got almost the same numbers of peaks as in S1 and S2. The authors seem to have combined these two peak lists in Excel and then done the annotation in S3, which also refers to differential peaks (I think).
I have checked the documentation for MACS2, and it does seem very interesting indeed. I'm going to download it and run the peak calling with it to see the differences. Thanks a lot once again for your suggestions.
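The workflow suggested in the answer (call peaks twice against the input with `-B`, then compare the resulting bedGraphs) might look roughly like the sketch below. Several things here are assumptions rather than anything stated in the thread: that an input (control) sample exists at all, the BAM file names `input.bam`, `healthy.bam` and `cancer.bam`, the human genome size `-g hs`, and the use of MACS2's `bdgdiff` subcommand for the comparison step (the answer names bdgcmp and diffpeaks, so check `macs2 --help` for your version). The sequencing depths passed to `--d1`/`--d2` are hypothetical placeholders; the MACS2 documentation explains how to derive them from the `callpeak` logs. By default the script only prints the commands.

```python
"""Sketch: build (and optionally run) the two-step MACS2 commands.

Hypothetical file names, genome size and depths; adjust to your data.
"""
import shlex
import subprocess
from typing import List


def callpeak_cmd(treatment: str, control: str, name: str) -> List[str]:
    # -B writes NAME_treat_pileup.bdg and NAME_control_lambda.bdg,
    # which the comparison step below reads.
    return ["macs2", "callpeak", "-t", treatment, "-c", control,
            "-f", "BAM", "-g", "hs", "-n", name, "-B"]


def bdgdiff_cmd(name1: str, name2: str, depth1: float, depth2: float,
                prefix: str) -> List[str]:
    # Compare condition 1 against condition 2 using the bedGraphs
    # produced by the two callpeak runs (depths in millions of reads).
    return ["macs2", "bdgdiff",
            "--t1", f"{name1}_treat_pileup.bdg",
            "--c1", f"{name1}_control_lambda.bdg",
            "--t2", f"{name2}_treat_pileup.bdg",
            "--c2", f"{name2}_control_lambda.bdg",
            "--d1", str(depth1), "--d2", str(depth2),
            "--o-prefix", prefix]


def run(cmd: List[str], dry_run: bool = True) -> None:
    # Always print the command; only execute it when dry_run is False.
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(callpeak_cmd("healthy.bam", "input.bam", "healthy"))
    run(callpeak_cmd("cancer.bam", "input.bam", "cancer"))
    # 12.0 and 10.0 are placeholder depths, not real values.
    run(bdgdiff_cmd("cancer", "healthy", 12.0, 10.0, "cancer_vs_healthy"))
```

Building the argument lists separately from running them makes it easy to inspect the exact commands before spending compute on them; pass `dry_run=False` once the file names and depths match your data.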