Duplicated reads in ChIP-seq
8.1 years ago
op263 ▴ 50

Hello,

I just received the sequences from the first ChIP-seq experiment done in my lab. I ran the samples in triplicate and used input as a control. I started the analysis on UseGalaxy and already have a problem after the FastQC step: I found a high level of duplicated reads in my samples (the inputs are fine), with only 4% of sequences remaining after deduplication.

Is it worth doing the analysis after removing the duplicates? I was considering removing the duplicated reads and combining the unique reads from the triplicates.

Many thanks for any help!

Olivier

ChIP-Seq • 5.2k views
8.1 years ago

Duplication is expected in ChIP-Seq, but 96% duplication is not, unless 1) your depth of coverage is massively excessive, or 2) your binding factor interacts with very few sites. A much more common explanation is that your IP failed and/or the amount of IPed chromatin was too low for efficient library construction, which results in a huge amount of PCR duplication. You can tell these cases apart by viewing your non-deduplicated data in a genome browser. Bona fide peaks will have multiple overlapping reads with offset start positions, while regions with only PCR duplicates will stack up perfectly without offsets.
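
If you would rather put a number on what you see in the browser, here is a rough Python sketch of the same check. It assumes you have already pulled out the (start, strand) of every read in one candidate region, for example from a BED export; the function name `stack_summary` and the toy coordinates are made up for illustration and are not part of any tool. Reads that share the same start and strand are counted as one stack.

```python
from collections import Counter


def stack_summary(reads):
    """Summarise how reads pile up in one region.

    reads: iterable of (start, strand) tuples (hypothetical input format).
    Reads with identical start and strand are treated as one stack.
    """
    reads = list(reads)
    counts = Counter(reads)
    total = len(reads)
    distinct = len(counts)
    return {
        "total": total,
        "distinct": distinct,
        # Share of reads that would be dropped by position-based deduplication.
        "duplicate_fraction": 1 - distinct / total if total else 0.0,
        # Share of reads sitting in the single biggest stack.
        "largest_stack_fraction": max(counts.values()) / total if total else 0.0,
    }


# Toy data, invented for illustration only.
# PCR-like region: almost every read starts at exactly the same position.
pcr_like = [(1000, "+")] * 24 + [(1350, "-")]
# Peak-like region: reads overlap but their starts are offset from each other.
peak_like = [(1000 + 7 * i, "+") for i in range(15)] + \
            [(1200 + 5 * i, "-") for i in range(10)]

print(stack_summary(pcr_like))
print(stack_summary(peak_like))
```

In the toy PCR-like region nearly all reads fall into one stack, while in the peak-like region every read has its own start position. Real data will sit somewhere in between, so treat the numbers as a guide to go with the browser view, not as a hard threshold.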

8.1 years ago
mastal511 ★ 2.1k

You would expect to find duplicated sequences in ChIP-Seq data, because you are only sequencing the parts of the genome pulled down by the IP procedure. Your data is probably fine, so don't remove the duplicates.
