What Could Be The Reason For Spliced Alignments In ChIP-Seq Data?
11.5 years ago

I am looking at a ChIP-seq data set where, for one of the suspected target genes, we see a coverage profile that looks suspiciously like RNA-seq data: the reads line up very regularly along the exons, instead of forming the peaky profile one would expect in ChIP-seq. On further inspection with TopHat, we also find a handful of spliced alignments joining the same two exons in the gene. (We had initially used a different aligner; TopHat was run only to check for the potential artifact I am describing.)
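For anyone who wants to run the same check on their own alignments, here is a minimal sketch of how spliced alignments can be pulled out of SAM-format text. It assumes the aligner reports splice junctions as `N` operations in the CIGAR string (as TopHat does); the function names `introns_from_cigar` and `spliced_reads` and the toy records are made up for illustration and are not taken from the data discussed here.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def introns_from_cigar(pos, cigar):
    """Return 1-based inclusive (start, end) intervals skipped by N operations.

    pos is the 1-based leftmost reference position of the alignment (SAM POS).
    """
    introns = []
    ref = pos  # reference position of the next base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        # Only these operations advance along the reference.
        if op in "MDN=X":
            ref += length
    return introns


def spliced_reads(sam_lines):
    """Yield (read_name, chrom, pos, introns) for alignments that contain an N."""
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, chrom = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":  # unmapped or no CIGAR available
            continue
        introns = introns_from_cigar(pos, cigar)
        if introns:
            yield name, chrom, pos, introns


# Hypothetical toy records: one spliced read, one unspliced read.
toy_sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t20M500N30M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t100\t50\t50M\t*\t0\t0\t*\t*",
]
spliced = list(spliced_reads(toy_sam))
```

In a ChIP-seq library one would expect `spliced` to be essentially empty for any given gene, so a recurring junction is worth a closer look.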

Now, I have heard of genomic DNA contamination in RNA-seq libraries, but I have a harder time figuring out how one can get RNA (or rather cDNA, I suppose) contamination in a ChIP-seq library. Any ideas where this might come from?

chip-seq splicing • 3.5k views

I have had the same problem, but predominantly in the input rather than in the ChIP-seq data. I have been told that the Taq polymerase used for deep-sequencing library preparation may be able to synthesize a small amount of DNA from an RNA template, and that RNase treatment of the ChIP input DNA is therefore needed. We haven't yet tested whether this is the case.


Interesting, thanks for the comment!


Do you have control channel data? What do these regions look like in those experiments? There are a fair number of edge cases where repetitive sequences might generate such patterns, or nonspecific binding over an interval could occur.

The splice junctions are more interesting / worrying, but maybe you'd start thinking about viral integration events or other transposon-like events. It's not clear what would cause the ChIP enrichment though, at least to me.


There are IgG controls, but I haven't looked at these regions in them yet. Thanks for the suggestion. Yes, I was considering viral integration events, but I am not sure what conclusions to draw from that.
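One rough way to compare the sample with a control at such a locus is to compute an exon-to-intron read density ratio for each library. The sketch below is only an illustration: the read counts, the lengths, the pseudocount and the function names `density_per_kb` and `exon_intron_ratio` are all assumptions, not values or methods from this thread.

```python
def density_per_kb(reads, length_bp):
    """Reads per kilobase for a region of the given total length."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return 1000.0 * reads / length_bp


def exon_intron_ratio(exon_reads, exon_bp, intron_reads, intron_bp, pseudocount=1.0):
    """Ratio of exonic to intronic read density for one gene in one library.

    A pseudocount is added to both counts so that an intron with zero reads
    does not cause a division by zero.
    """
    exon_density = density_per_kb(exon_reads + pseudocount, exon_bp)
    intron_density = density_per_kb(intron_reads + pseudocount, intron_bp)
    return exon_density / intron_density


# Hypothetical counts for one gene in a ChIP sample and in an IgG control.
sample_ratio = exon_intron_ratio(exon_reads=99, exon_bp=1000,
                                 intron_reads=9, intron_bp=1000)
control_ratio = exon_intron_ratio(exon_reads=19, exon_bp=1000,
                                  intron_reads=19, intron_bp=1000)
```

A ratio near 1 is what plain genomic DNA should give. A ratio well above 1 in both the sample and the control would point towards contamination or a library artifact rather than antibody-specific enrichment.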


Did you ever manage to figure out a solution to this? I see very similar behaviour in the Arabidopsis ChIP-seq data that I am currently looking at. The genes that show it are transcription factors with known important functions in the tissue we are studying.

I see this in the sample and the anti-HA control, but not in the Input. The rows in the image are sample, Input, anti-HA.

I'm also noticing that these reads don't seem to carry the SNPs that are present in the Input.


Not really. We have just assumed that we are dealing with some sort of artifact and disregarded this particular locus. Meanwhile, I have read this paper, which might be relevant: "Highly expressed loci are vulnerable to misleading ChIP localization of multiple unrelated proteins". I don't think that would explain your "missing SNPs", though. That is an interesting observation; I didn't notice it in my data, although I can't say whether it is really absent.

11.5 years ago

I don't know what you mean by spliced alignments joining the same two exons in the gene. However, have you considered that the "regular" alignments might in fact be PCR duplicates?


I don't think PCR duplication is the problem, as the picture is close to identical after deduplication. It also wouldn't explain the split-read alignments. Those are not PCR duplicates either: they have distinct starting positions, although the spliced-out part (i.e. the intron) is the same in each case.
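The reasoning about distinct starting positions can be sketched as follows: group the spliced reads by the intron they skip and count how many different start positions support each junction. This is a simplified illustration, not a full duplicate caller. It assumes single-end reads summarised as `(chrom, read_start, intron_start, intron_end)` tuples and ignores strand and mate position, which real deduplication tools also take into account. The name `junction_support` and the toy records are hypothetical.

```python
from collections import defaultdict


def junction_support(records):
    """Summarise spliced reads per junction.

    records: iterable of (chrom, read_start, intron_start, intron_end).
    Returns {(chrom, intron_start, intron_end): (n_reads, n_distinct_starts)}.
    """
    read_counts = defaultdict(int)
    start_positions = defaultdict(set)
    for chrom, read_start, intron_start, intron_end in records:
        junction = (chrom, intron_start, intron_end)
        read_counts[junction] += 1
        start_positions[junction].add(read_start)
    return {
        junction: (read_counts[junction], len(start_positions[junction]))
        for junction in read_counts
    }


# Hypothetical toy data: four reads over one intron, two sharing a start.
toy_records = [
    ("chr1", 100, 120, 619),
    ("chr1", 105, 120, 619),
    ("chr1", 105, 120, 619),
    ("chr1", 110, 120, 619),
]
support = junction_support(toy_records)
```

If the number of distinct starts is close to the number of reads, the junction is supported by independent fragments. If all reads share a single start, PCR duplication is the more likely explanation.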
