I am looking at a ChIP-seq data set where, for one of the suspected target genes, we see a coverage profile that looks suspiciously like RNA-seq data: the reads line up very regularly along the exons, rather than forming the peaky profile one would expect in ChIP-seq. On further inspection with TopHat, we also find a handful of spliced alignments joining the same two exons in the gene. (We had initially used a different aligner; the TopHat run was only to check for the potential artifact I am describing.)
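For anyone wanting to repeat this check, here is a minimal sketch of how the spliced alignments could be tallied from SAM-format text, for example the output of `samtools view` on the TopHat BAM. It only assumes the standard SAM columns and treats every `N` operation in the CIGAR string as a splice junction; the function names and the toy records are made up for illustration and are not from the original analysis.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def junctions_from_sam_line(line):
    """Return (chrom, first_skipped_base, last_skipped_base) per N operation.

    Coordinates are 1-based and inclusive, as in the SAM POS column.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        return []
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 4 or cigar == "*":  # unmapped read or no alignment
        return []
    junctions = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":  # skipped region on the reference, i.e. a putative intron
            junctions.append((chrom, ref_pos, ref_pos + length - 1))
        if op in "MDN=X":  # operations that consume reference bases
            ref_pos += length
    return junctions


def count_junctions(sam_lines):
    """Count how many alignments support each junction."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header
            continue
        counts.update(junctions_from_sam_line(line))
    return counts


# Toy records (hypothetical): two spliced reads and one unspliced read.
toy_sam = [
    "@HD\tVN:1.0",
    "r1\t0\tchr1\t100\t50\t10M200N10M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t105\t50\t5M200N15M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t100\t50\t20M\t*\t0\t0\t*\t*",
]
junction_counts = count_junctions(toy_sam)
```

A junction supported by several independent reads with different start positions would be harder to dismiss as a mapping artifact than one supported by a single read.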
Now, I have heard of genomic DNA contamination in RNA-seq libraries, but I have a harder time figuring out how one can get RNA (or rather cDNA, I suppose) contamination in a ChIP-seq library. Any ideas where this might come from?
I have had the same problem, but predominantly in the input rather than the ChIP-seq data. I have been told that the Taq polymerase used for deep-sequencing library preparation may be able to synthesize a small amount of DNA from an RNA template, and that RNase treatment of the ChIP input DNA is therefore needed. We haven't tested whether this is the case yet.
Interesting, thanks for the comment!
Do you have control channel data? What do these regions look like in those experiments? There are a fair number of edge cases where repetitive sequences might generate such patterns, or nonspecific binding over an interval could occur.
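One rough way to put a number on "what these regions look like" in each sample is to compare read density over exons with read density over introns of the same gene. The sketch below assumes you already have read start positions and exon/intron intervals for the locus; the interval convention (0-based, half-open), the added pseudocount of 1, and all names and numbers are illustrative assumptions rather than an established metric.

```python
def count_in_intervals(read_starts, intervals):
    """Count read starts falling inside any 0-based, half-open (start, end) interval."""
    return sum(
        1 for pos in read_starts if any(start <= pos < end for start, end in intervals)
    )


def exon_intron_ratio(read_starts, exons, introns):
    """Exon read density divided by intron read density.

    A pseudocount of 1 read is added to each count so the ratio stays
    defined when the introns contain no reads at all.
    """
    exon_len = sum(end - start for start, end in exons)
    intron_len = sum(end - start for start, end in introns)
    if exon_len == 0 or intron_len == 0:
        raise ValueError("exons and introns must both have non-zero total length")
    exon_density = (count_in_intervals(read_starts, exons) + 1) / exon_len
    intron_density = (count_in_intervals(read_starts, introns) + 1) / intron_len
    return exon_density / intron_density


# Hypothetical toy locus: two 100 bp exons around a 200 bp intron.
exons = [(0, 100), (300, 400)]
introns = [(100, 300)]

chip_reads = [5, 20, 40, 60, 80, 310, 330, 350, 370, 390]      # exon-only reads
control_reads = [10, 50, 120, 160, 200, 240, 280, 320, 360, 90]  # spread evenly

chip_ratio = exon_intron_ratio(chip_reads, exons, introns)
control_ratio = exon_intron_ratio(control_reads, exons, introns)
```

A ratio near 1 is what even background coverage would give, while a ratio well above 1 in the ChIP but not in the control would suggest the exon-shaped signal is specific to the ChIP sample.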
The splice junctions are more interesting / worrying, but maybe you'd start thinking about viral integration events or other transposon-like events. It's not clear what would cause the ChIP enrichment though, at least to me.
We have IgG controls, but I haven't looked at these regions in them yet. Thanks for the suggestion. Yes, I was considering viral integration events, but I am not sure what conclusions to draw from that.
Did you ever manage to figure out a solution to this? I see very similar behaviour in the Arabidopsis ChIP-Seq data that I am currently looking at. The genes that show it are transcription factors with known important functions in the tissue we are looking at.
I see this in the sample and the anti-HA control, but not in the Input. The rows in the image are sample, Input, anti-HA.
I'm also noticing that the reads in these regions don't seem to carry the SNPs that are present in the Input.
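To make the SNP comparison less dependent on eyeballing a browser track, the alleles could be counted per sample at each known SNP position. The sketch below walks the CIGAR string to find which read base sits on a given reference position, so spliced reads are handled too. It assumes reads are given as (POS, CIGAR, SEQ) taken from SAM records; the function names and the toy reads are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def base_at(ref_target, pos, cigar, seq):
    """Return the read base aligned to 1-based reference position ref_target.

    Returns None if the read does not cover that position with an aligned
    base (for example, the position lies in a deletion or a spliced-out intron).
    """
    ref_pos, read_pos = pos, 0
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":  # consumes both reference and read
            if ref_pos <= ref_target < ref_pos + length:
                return seq[read_pos + (ref_target - ref_pos)]
            ref_pos += length
            read_pos += length
        elif op in "IS":  # insertion or soft clip: consumes read only
            read_pos += length
        elif op in "DN":  # deletion or skipped region: consumes reference only
            ref_pos += length
        # H and P consume neither reference nor read
    return None


def allele_counts(reads, snp_pos):
    """Count the bases observed at snp_pos across (pos, cigar, seq) reads."""
    counts = Counter()
    for pos, cigar, seq in reads:
        base = base_at(snp_pos, pos, cigar, seq)
        if base is not None:
            counts[base] += 1
    return counts


# Toy reads (hypothetical): one unspliced, one starting a base later, one spliced.
toy_reads = [
    (100, "5M", "ACGTA"),
    (101, "5M", "CTTAC"),
    (100, "5M10N5M", "ACGTAGGTCC"),
]
counts_at_102 = allele_counts(toy_reads, 102)
```

Running this separately on the sample, the Input, and the anti-HA control at the same positions would show whether the exon-shaped reads really lack the alleles seen in the Input.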
Not really - we have just assumed that we are dealing with some sort of artifact and disregarded this particular locus. Meanwhile, I have read this paper, which might be relevant: "Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins". I don't think that would explain your "missing SNPs" though. That is an interesting observation, and one I haven't noticed in my data (whether it's there or not).