Hi all,
My question is about ChIP-seq: should we (at least in my case) consider a genomic locus with high IP signal over background to be bound, even though no peak was called there?
Here is the situation in one of my ChIP-seq analyses. In the following figure, panel A shows peaks that are co-bound by proteins P1, P2 and P3 before and after treatment. Panel B shows another class of peaks, which were identified as bound only by P3. But as you can see, P2 still has a good amount of signal at these regions. The y-axis is the log2 ratio of IP over background.
This is difficult to explain. For example, for P2, a log2 ratio of 1 or 1.5 corresponds to a called peak in panel A but not in panel B. Yet the biological evidence is that P3 is almost always co-bound with P2. I worry that if I simply proceed with the current peak set, I will miss a lot of important genes. On the other hand, there is (to my knowledge) no established best way to address this issue.
At this point, I am considering the following:
- Calculate the average fold change (or log2 ratio over background) at which a real peak is called for each protein.
- If regions from panel B show a fold change/log2 ratio for protein P2 above the value from step 1, merge them into panel A.
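The two steps above could be sketched roughly as follows. This is only a minimal illustration in Python, not a validated method: the function names, region labels and log2 values are all made up, and it assumes the per-region log2(IP/background) values for P2 have already been extracted (for example from the bamCompare output).

```python
from statistics import mean

def promotion_threshold(called_peak_log2):
    """Step 1: average log2(IP/background) over regions where a P2 peak WAS called."""
    return mean(called_peak_log2)

def promote_regions(uncalled_log2, threshold):
    """Step 2: panel B regions whose P2 log2 ratio exceeds the step 1 threshold."""
    return [region for region, value in uncalled_log2.items() if value > threshold]

# Hypothetical P2 log2 ratios at panel A regions (P2 peak called).
panel_a_p2 = [1.0, 1.5, 2.0, 2.5]

# Hypothetical P2 log2 ratios at panel B regions (no P2 peak called).
panel_b_p2 = {
    "chr1:1000-1500": 2.5,
    "chr2:400-900": 0.8,
    "chr3:50-600": 1.9,
}

threshold = promotion_threshold(panel_a_p2)
promoted = promote_regions(panel_b_p2, threshold)
print(threshold, promoted)
```

Note that using the mean as the cutoff is itself a judgment call: by construction, many of the called panel A peaks sit below it, so a region can fail this rule while still looking like a called peak.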
I used MACS2 to identify peaks with genomic background and bamCompare for calculating log2 ratios over background.
Any suggestions or ideas would be greatly appreciated.
Thanks!
Hi Jared, thanks for the answer.
One more thing I didn't mention is that I used MAnorm. MAnorm reported these regions as not significantly different before and after treatment (which we are interested in). However, within these peaks I observed this strange behaviour: some regions have high signal but are not called as peaks. I will check out csaw and SPAN.
Ah, MAnorm isn't bad either. For single replicate comparisons, it's as good a tool as I've seen. Sounds like peak calling may be your issue. What parameters are you using with MACS? You can try lowering the q-value threshold and decreasing the lower bound on the mfold setting, which should increase sensitivity but also increase false positives. Since you're running MAnorm afterwards, I wouldn't be quite as worried about the false positives given that you're really only interested in those that differ between your conditions.
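To make the parameter suggestion concrete, here is a small Python sketch that only assembles a `macs2 callpeak` command line with a q-value cutoff (`-q`) and an mfold range (`-m`, lower and upper bound). The file names, the helper name and the specific values are placeholders chosen for illustration, not recommendations; check them against your own data and the MACS2 help output.

```python
def macs2_callpeak_args(treatment, control, name, qvalue=0.05, mfold=(2, 50)):
    """Build (but do not run) a MACS2 callpeak command as an argument list."""
    lower, upper = mfold
    return [
        "macs2", "callpeak",
        "-t", treatment,           # IP (treatment) alignments
        "-c", control,             # background/input alignments
        "-n", name,                # prefix for output files
        "-q", str(qvalue),         # q-value (FDR) cutoff for calling peaks
        "-m", str(lower), str(upper),  # mfold range used for model building
    ]

# Hypothetical file names; print the command to inspect it before running.
args = macs2_callpeak_args("P2_IP.bam", "input.bam", "P2_relaxed")
print(" ".join(args))
```

The list can be passed to `subprocess.run(args, check=True)` once the paths are real.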
I was also thinking of lowering the q-value cutoff (I've used 0.01). Co-occupancy is one of the strongest points of the whole work, which is why I am trying not to lose any of the sites/genes that show real signal. Do you have any comments on the points in my main question about merging based on log2 ratio over background?
And by lowering, I guess I really meant raising the q-value threshold (maybe try 0.05?). I'm not sure what you mean by "merge them to panel A" in your main question. You mean remove them from the P3 specific list? There could be a lot of reasons you're seeing what you're seeing.
A few things to consider:
Yes, by merging, what I meant is basically adding some peaks from panel B to panel A.
We have no doubt there is significant co-occupancy, as expected. The difficulty comes when looking at the P3-specific peaks and saying that only P3 is bound there: comparing panels A and B, a P2 region is called as a peak in panel A at a value of 1.5, but not called as a peak in panel B even at 2.5. Of course, it is not as simple as it looks, but to a biologist it would make much more sense to merge some of the sites from B into A. And I'm really not able to convince him that these are not confident calls, or are due to noise or some other reason :)
Right, the ratios might be based on those counts, but ratios are often misleading. I'd try making the same panels with the log2 normalized counts - I think this will be more convincing than the ratios. Merging sites from B into A selectively could be done, but you need to be careful to set strict rules you stick to.
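As a rough illustration of why ratios can mislead, here is a minimal Python sketch comparing two made-up regions with the same IP/background ratio but very different read support. It assumes simple counts-per-million (CPM) normalization with a pseudocount of 1, which is only one of several possible normalizations; the function name and all numbers are hypothetical.

```python
import math

def log2_cpm(region_count, total_mapped_reads, pseudocount=1.0):
    """log2 of counts per million mapped reads, with a pseudocount to avoid log2(0)."""
    cpm = region_count * 1_000_000 / total_mapped_reads
    return math.log2(cpm + pseudocount)

total_reads = 10_000_000  # hypothetical library size, same for IP and input

# (IP reads, input reads) per region; both regions have a 3-fold IP/input ratio.
regions = {"weak_region": (6, 2), "strong_region": (300, 100)}

for name, (ip, bg) in regions.items():
    ratio = math.log2(ip / bg)
    print(name, round(ratio, 2), round(log2_cpm(ip, total_reads), 2))
```

Both regions get the same log2 ratio, but the normalized IP counts make it obvious that one of them rests on only a handful of reads.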
Thanks. I will definitely check the log2 normalized counts.