Question

peak calling for cancer and healthy data?

0

10.4 years ago

ukpersia1 ▴ 20

Hi All,

I am a beginner in ChIP-seq analysis.

I am trying to find a set of peaks for cancer data. I have two samples, one healthy and one cancerous, and I would like to use MACS for peak calling.

My question is: if we have ChIP-seq data for a healthy and a cancerous sample, can we use the healthy sample as the CONTROL for peak calling?

Or is that wrong? I am confused because the healthy sample is not INPUT data.

Any tips are highly appreciated.

Regards

MACS ChIP-Seq peak-calling • 3.3k views

updated 3.2 years ago by Ram 45k • written 10.4 years ago by ukpersia1 ▴ 20

Ram · Answer 1 · 2015-03-29

1

10.4 years ago

Saulius Lukauskas ▴ 540

Hi,

In a nutshell, ChIP-seq is prone to various biases introduced by the experimental protocol. These confounders can artificially inflate the signal at certain loci and need to be accounted for. The biases vary between instruments, labs, etc., so when doing ChIP experiments people first do an input (control) run, in which no targeted antibody pull-down is done, and then the targeted pull-down. In theory, the bias in the input sample is the same as the bias in the targeted (treatment) sample, so we can separate the signal from the noise (bias). I think this is the input sample you were confused about.
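
To make the role of the input concrete, here is a toy sketch of how an input track can set the background expectation for a window. It assumes a simple Poisson model, loosely in the spirit of how MACS describes its background estimate; it is not the MACS implementation. The function names, the `genome_lambda` floor, and all read counts are hypothetical.

```python
import math

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), by direct summation."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    # Direct summation loses precision for very small tails; fine for a toy.
    return max(0.0, 1.0 - cdf)

def window_pvalue(chip_reads, input_reads, chip_depth, input_depth,
                  genome_lambda=1.0):
    """P-value for a ChIP window, using the input to set the background.

    The input count is scaled to the ChIP sequencing depth, and the
    background rate is the larger of that and a genome-wide floor
    (genome_lambda is an assumed, illustrative value).
    """
    scaled_input = input_reads * chip_depth / input_depth
    lam = max(genome_lambda, scaled_input)
    return poisson_upper_tail(chip_reads, lam)

# Same ChIP signal (30 reads), two different input backgrounds.
quiet_locus = window_pvalue(30, 2, 1_000_000, 1_000_000)
biased_locus = window_pvalue(30, 28, 1_000_000, 1_000_000)
print(quiet_locus, biased_locus)
```

The same 30 ChIP reads look highly enriched where the input is quiet, and unremarkable where the input itself piles up, which is the kind of protocol bias the input is meant to capture.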

Now, different cell types will have different ChIP patterns and, of course, cancerous cells will differ from healthy ones even more. What you need to do is call peaks on each sample separately, using the appropriate input sample for each, and then use some form of comparative analysis to see what differs between them.
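
As a rough illustration of the "call separately, then compare" idea, the sketch below takes two peak lists that have already been called (each against its own input) and reports the peaks in one list with no overlapping peak in the other. This is only a naive presence/absence comparison, not a statistical differential analysis, and the coordinates and function names are made up for the example.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def specific_peaks(peaks, other_peaks):
    """Peaks from `peaks` that overlap nothing in `other_peaks`.

    Quadratic in the number of peaks, which is fine for a small example;
    a real analysis would use a dedicated interval tool instead.
    """
    return [p for p in peaks
            if not any(overlaps(p, q) for q in other_peaks)]

# Hypothetical peak calls, half-open intervals as in BED files.
cancer_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
healthy_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]

cancer_only = specific_peaks(cancer_peaks, healthy_peaks)
healthy_only = specific_peaks(healthy_peaks, cancer_peaks)
print(cancer_only)
print(healthy_only)
```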

I would suggest reading this review thoroughly before you start looking into the data. It covers most of the things you need to know.

You can also look at the Roadmap Epigenomics project for a large set of high-quality ChIP-seq data.

Hope this helps!

updated 3.2 years ago by Ram 45k • written 10.4 years ago by Saulius Lukauskas ▴ 540

0

Thanks a lot for your prompt response. So, as I understand it, I can't simply use the healthy sample as the control sample when calling peaks on the cancer data; rather, I should call peaks on each sample separately and then do a comparison. Please correct me if I'm wrong. MACS can still call peaks when no control sample is available, so I will have to use that approach. I have checked a paper, and it seems that in their research they use the healthy sample as the control for peak calling!

updated 3.3 years ago by Ram 45k • written 10.4 years ago by ukpersia1 ▴ 20

1

I think there is ambiguity in the word "control", as it has two different meanings in this context.

I think they did the peak calling on both the healthy and the cancerous samples (see the Excel files, supplementary tables 1 and 2) and then did some analysis on the differential peaks.

I do not think using the healthy sample as the control (-c) parameter in MACS is a good idea. It is still a "control", of course, in the sense that you are comparing the peaks from the cancerous sample to the peaks from the normal sample and looking for significant differences. But the -c parameter in MACS is normally used for ChIP samples called "input", which are a different kind of "control".

You might want to look at the bdgcmp and diffpeaks options of MACS2, as they seem to be designed for what you want to do. I am not entirely sure how one would run these, but if I had to guess, you need to run macs2 callpeak twice: once with -c input -t healthy and once with -c input -t cancerous, with the -B parameter explicitly set to generate bedGraph files. Then run these two commands on the appropriate bedGraph files generated in the previous steps. Let me know how that goes.
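
A hedged sketch of that guess, written as a small Python helper that only builds and prints the command lines rather than running anything. The BAM file names and sample names are hypothetical, and each sample is assumed to have its own input. The differential step here uses `macs2 bdgdiff`, which is the differential subcommand I know of in MACS2 and may not be what "diffpeaks" refers to above. The `_treat_pileup.bdg` and `_control_lambda.bdg` names follow the pattern `callpeak -B` writes, but check them, and the sequencing-depth options of `bdgdiff`, against the documentation of your installed version.

```python
import shlex

def callpeak_cmd(treatment_bam, input_bam, name, genome="hs"):
    """Command line for one `macs2 callpeak` run that also writes bedGraphs."""
    return ["macs2", "callpeak",
            "-t", treatment_bam,   # ChIP sample
            "-c", input_bam,       # matching input, NOT the other condition
            "-f", "BAM",
            "-g", genome,          # effective genome size shortcut ("hs" = human)
            "-n", name,            # prefix for output files
            "-B"]                  # write pileup and control-lambda bedGraphs

def bdgdiff_cmd(name1, name2, out_prefix):
    """Command line comparing two callpeak runs via their bedGraph outputs."""
    return ["macs2", "bdgdiff",
            "--t1", f"{name1}_treat_pileup.bdg",
            "--c1", f"{name1}_control_lambda.bdg",
            "--t2", f"{name2}_treat_pileup.bdg",
            "--c2", f"{name2}_control_lambda.bdg",
            "--o-prefix", out_prefix]

# Hypothetical file names; substitute your own.
commands = [
    callpeak_cmd("healthy_chip.bam", "healthy_input.bam", "healthy"),
    callpeak_cmd("cancer_chip.bam", "cancer_input.bam", "cancer"),
    bdgdiff_cmd("healthy", "cancer", "healthy_vs_cancer"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to review them before pasting into a shell or passing each list to `subprocess.run`.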

updated 3.3 years ago by Ram 45k • written 10.4 years ago by Saulius Lukauskas ▴ 540

1

Thanks for your detailed answer. I have checked the Excel files: S1 and S2 are MACS output, S1 being the peaks and S2 being the negative peaks. I ran MACS with the cancer data as treatment and the healthy sample as control, and I got almost the same number of peaks (S1 and S2). The authors seem to have combined these two peak lists in Excel and then done the annotation in S3, which also refers to differential peaks (I think).

I have checked the documentation for MACS2, and it looks very interesting indeed. I'm going to download it and run the peak calling with it to see the differences. Thanks a lot once again for your suggestions.

updated 3.3 years ago by Ram 45k • written 10.4 years ago by ukpersia1 ▴ 20

score 0 · Answer 2 · 2017-01-29

0

8.6 years ago

EagleEye 7.6k

An input sample is needed to adjust (clean) for background. Check out this article: "ChIP-Seq: technical considerations for obtaining high-quality data".

Always follow high-quality articles, especially for protocols (experimental or bioinformatics).