Question

ChIP-seq low signal vs background

2

10.4 years ago

mdp07vm ▴ 30

Hi there,

I'm not a bioinformatician, but I follow Biostar closely in my quest to learn some meaningful basics of NGS analysis. I recently performed a ChIP-seq experiment looking at the histone marks H3K27ac and H3K4me1 on a very small amount of chromatin originating from minute tissue samples. I used HOMER for the analysis, and it reported 'low enrichment with high background' for both samples. This is reflected in the profiles when viewed on the UCSC browser. My colleague, who is a trained bioinformatician, used MACS and reported the same. We did, however, obtain a number of called peaks (significant peaks vs. input) similar to those reported in the literature, and I'm currently investigating these.

Does anyone know of any methods that can be applied to my case to enrich for 'true' peaks vs background? And to be able to view these peaks clearly and distinctly from 'background' on the UCSC browser?

Appreciate any help or advice given.

Thanks,
Victor

ChIP-Seq sequencing next-gen • 6.8k views

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.4 years ago by mdp07vm ▴ 30

1

I bet the ChIP did not work well. You may try to use IDR: https://sites.google.com/site/anshulkundaje/projects/idr

ADD REPLY • link 10.4 years ago by Ming Tommy Tang ★ 4.5k

0

Thanks for the IDR. I will check it out to see if it's helpful for my case.

ADD REPLY • link 10.4 years ago by mdp07vm ▴ 30

0

You're not going to have much luck finding programs able to work around low enrichment, at least when it comes to visualization.

ADD REPLY • link 10.4 years ago by Devon Ryan 105k

Ram · Answer 1 · 2014-08-04

3

10.4 years ago

Sean Davis 27k

While repeating the experiment would likely improve things, that may not be possible in your case. If you cannot repeat the experiment, what you CAN do is sequence the same libraries more deeply to improve your ability to call peaks. Since you seem convinced that the experiment worked, however poorly, more sequencing may help.

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.4 years ago by Sean Davis 27k

1

Thanks for the suggestion. In fact, the same advice was given to me by our core director here. While I understand that sequencing deeper might help improve on the signal, wouldn't the background be sequenced deeper as well? Appreciate your expert advice on this.

ADD REPLY • link 10.4 years ago by mdp07vm ▴ 30

2

Yes, it would. However, we are talking about count data, so as you get more counts, you are able to resolve smaller-and-smaller biological differences even though the relative enrichment remains the same. Concretely, let's say that we have a coin that is slightly biased (heads 60% of the time). With only 10 trials, it will be impossible to perceive the bias statistically. However, without changing the "signal" (use the same coin), flipping the coin 1000 times will easily show that the coin is biased. The same kind of thinking applies to count data and your experiment. The signal remains low, but as the number of reads increases, the power to see the signal increases.
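To put rough numbers on the coin analogy, here is a minimal sketch that computes the power of a one-sided exact binomial test to detect a 60% heads coin against a fair-coin null. The function names and the 0.05 significance level are assumptions made for illustration; nothing here is specific to any peak caller.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for stability."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def critical_value(n, p_null, alpha):
    """Smallest k with P(X >= k | p_null) <= alpha (n + 1 if no such k)."""
    tail = 0.0
    crit = n + 1
    for k in range(n, -1, -1):
        tail += binom_pmf(k, n, p_null)
        if tail > alpha:
            break
        crit = k
    return crit


def power(n, p_true=0.6, p_null=0.5, alpha=0.05):
    """Probability of rejecting the fair-coin null when the coin is biased."""
    crit = critical_value(n, p_null, alpha)
    return sum(binom_pmf(k, n, p_true) for k in range(crit, n + 1))


if __name__ == "__main__":
    for flips in (10, 100, 1000):
        print(f"{flips:>5} flips: power = {power(flips):.3f}")
```

The bias of the coin never changes between the runs; only the number of flips does. In the same way, deeper sequencing leaves the enrichment ratio untouched but gives a count-based test more power to detect it.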

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.4 years ago by Sean Davis 27k

0

Very well explained. I will make arrangements to have my libraries re-sequenced. Appreciate your help!

ADD REPLY • link 10.4 years ago by mdp07vm ▴ 30

Ram · Answer 2 · 2014-09-04

1

10.3 years ago

bede.portz ▴ 540

Victor, here are some thoughts you may have already considered; if so, they may still be of use to other readers.

One question, how do the datasets look relative to one another, not relative to background?

H3K4me and H3K27ac are activating and repressive marks, respectively. If, in your datasets, ignoring background for a moment, those marks are appearing in the same regions of the genome, this may suggest your experiments, either one or both of them, didn't actually work. If they are occurring in a more mutually exclusive distribution relative to one another, even if the background is high in each dataset, then your experiment may have worked and you may have enriched for genomic DNA associated with each mark. In this case, some of the sequencing and bioinformatic approaches may help you.
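One rough way to put a number on "appearing in the same regions" is the fraction of peaks in one set that overlap any peak in the other. The sketch below assumes peaks are given as half-open `(chrom, start, end)` tuples in the BED style; the function name and the toy coordinates are hypothetical, and real peak sets would normally be compared with a dedicated interval tool.

```python
from collections import defaultdict


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b."""
    if not peaks_a:
        return 0.0
    # Group the second peak set by chromosome so we only compare like with like.
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < b_end and b_start < end
               for b_start, b_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a)


if __name__ == "__main__":
    # Toy coordinates, made up for illustration only.
    k27ac = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
    k4me1 = [("chr1", 150, 250), ("chr2", 300, 400)]
    print(overlap_fraction(k27ac, k4me1))
    print(overlap_fraction(k4me1, k27ac))
```

The measure is asymmetric, so it is worth computing in both directions before drawing conclusions about how the two marks relate.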

One caveat, of course, is that your tissues are not homogeneous in terms of cell type, and different cells executing different transcriptional programs are giving you a noisy ChIP-seq readout as their chromatin environments aren't homogeneous. This is assumed not to be the case when working with a single strain of yeast, a tissue culture cell line, etc., but may be the cause of your issues if you are using a dissected tissue.

Perhaps another way to interrogate this is to look at regions assumed to be active in all differentiated cells and those that are assumed to be transcriptionally inactive in differentiated cells, and compare the ChIP-seq enrichment for your two histone marks in these regions to other regions that can be expected to be more variable across a mixed population of cells. Essentially, what you would be doing is looking at small numbers of genes (analogous to doing ChIP on a few genes!) within your larger dataset about which reasonable assumptions can be made, and using these genes to determine whether or not the rest of the data can be trusted.
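As a sketch of that sanity check, the snippet below compares library-size-normalised ChIP-over-input enrichment between a handful of regions expected to be active and a handful expected to be silent. All region labels, read counts, and library sizes are invented placeholders, and the pseudocount of 1 is an assumption to avoid dividing by zero.

```python
from math import log2
from statistics import mean


def log2_enrichment(chip_count, input_count, chip_total, input_total,
                    pseudocount=1.0):
    """log2 ratio of ChIP to input signal in counts per million for one region."""
    chip_cpm = (chip_count + pseudocount) / chip_total * 1e6
    input_cpm = (input_count + pseudocount) / input_total * 1e6
    return log2(chip_cpm / input_cpm)


def mean_enrichment(regions, chip_total, input_total):
    """Average log2 enrichment over a list of (chip_count, input_count) pairs."""
    return mean(log2_enrichment(c, i, chip_total, input_total)
                for c, i in regions)


if __name__ == "__main__":
    # Hypothetical library sizes (total mapped reads).
    chip_total, input_total = 20_000_000, 25_000_000
    # Hypothetical (ChIP reads, input reads) per region.
    expected_active = [(180, 60), (240, 75), (150, 55)]
    expected_silent = [(50, 62), (45, 58), (70, 80)]
    print("active:", round(mean_enrichment(expected_active, chip_total, input_total), 2))
    print("silent:", round(mean_enrichment(expected_silent, chip_total, input_total), 2))
```

If the regions expected to be active do not show clearly higher enrichment than the silent ones, that would be a hint that the rest of the dataset deserves caution.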

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by bede.portz ▴ 540

0

So I may have been off the mark (pun intended) with my comment about H3K27ac being a mark associated with transcriptionally repressed chromatin, as it is associated with enhancers. However, my point stands that you can juxtapose the position of the two marks on known gene and enhancer regions to see if the data "makes sense" and fits with a priori knowledge, as a means to aid in validating the data.

Also, I think my second point is valid, that heterogeneity in the tissue could give rise to heterogeneity in the data, which could manifest itself as noise.

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by bede.portz ▴ 540

0

Hi there, first of all, thanks for your comments and suggestions. As you pointed out, both marks label enhancer regions. Based on the literature, the common belief right now, which is open to debate, is that H3K27ac marks 'active' enhancers and H3K4me1 marks a 'poised or intermediate active state' of enhancers. You do see overlap of these marks at some enhancers, which are classified as 'active' in some of the literature. Getting back to my data, I do see regions where both marks overlap and regions where they don't. However, discriminating the true signal from the background has been the key issue. I have taken steps, both computationally and experimentally, to address this by incorporating the many useful pieces of feedback I've gained through this post from people like you.

Your other suggestion, that inherent heterogeneity in tissues contributes to 'noise', is a really good and valid point, and one which I've gone through with my bioinformatics collaborator. However, it's difficult for me to see it as a 'major' contributing factor, given that peaks are called against an input control which comprises the same heterogeneous chromatin content of the tissue. Also, there are a handful of papers that describe successful ChIP-seq experiments using the same marks in a variety of tissues. The key difference between those tissues and mine is size. It's almost like comparing a watermelon to a grape, with my tissue being the latter. It has become more evident to me over time that the amount of starting chromatin (and its quality) is crucial for clean, working ChIP-seq data.

Do feel free to provide any other ideas or thoughts you may have on this as I would really like to hear them. Thanks again.

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.3 years ago by mdp07vm ▴ 30