Question

[ChIPSeq] Multiple Peaks at Cross Correlation Analysis


6 weeks ago

Diren • 0

Hello, I’m analyzing ChIP-seq data using the ENCODE pipeline (https://github.com/ENCODE-DCC/chip-seq-pipeline2) and need some guidance on interpreting cross-correlation plots.

Analysis Steps:

•   Data: paired-end (using only R1, trimmed to 50 bp).
•   Aligned with Bowtie2 and filtered the BAM (removing unmapped and low-MAPQ reads), but did not deduplicate (no bottleneck issues).
•   Created tagAlign files, subsampled, and ran cross-correlation analysis with phantompeakqualtools.
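
As a rough illustration of what the cross-correlation step measures (this is not the phantompeakqualtools implementation), the sketch below correlates plus-strand and minus-strand 5' end counts at increasing shifts on a toy chromosome. The function names, the 200 bp fragment length, and the read positions are all invented for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def strand_cross_correlation(plus_pos, minus_pos, chrom_len, max_shift):
    """Correlate plus-strand counts at i with minus-strand counts at i + shift."""
    plus = [0] * chrom_len
    minus = [0] * chrom_len
    for p in plus_pos:
        plus[p] += 1
    for m in minus_pos:
        minus[m] += 1
    return {
        shift: pearson(plus[:chrom_len - shift], minus[shift:])
        for shift in range(max_shift + 1)
    }

# Toy data (assumption): every fragment is exactly 200 bp long, so each
# minus-strand 5' end sits 200 bp downstream of its plus-strand partner.
plus_pos = [100, 450, 900, 1300]
minus_pos = [p + 200 for p in plus_pos]

cc = strand_cross_correlation(plus_pos, minus_pos, chrom_len=2000, max_shift=400)
best_shift = max(cc, key=cc.get)
print(best_shift)
```

In real data the curve is much noisier, and a second "phantom" peak appears at the read length because of mappability effects, which this toy example does not model.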

Results:

Most cross-correlation plots look like this: [image: ChIP - cc result 1]

Even in controls, the phantom and ChIP peaks are similar: [image: ChIP control - cc result 2]

Most samples have NSC < 1.02 and RSC between 0.9 and 1.4, suggesting weak enrichment.
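
For reference, NSC and RSC are both simple ratios of three values read off the cross-correlation curve: the value at the estimated fragment length, the value at the phantom peak (the read length), and the minimum. The sketch below computes them from a hypothetical curve; the `nsc_rsc` helper and the numbers are illustrative, not output from this dataset.

```python
def nsc_rsc(cc, read_length, fragment_shift):
    """Return (NSC, RSC) from a dict mapping strand shift -> cross-correlation.

    NSC = cc(fragment) / min(cc)
    RSC = (cc(fragment) - min(cc)) / (cc(phantom) - min(cc))
    """
    cc_min = min(cc.values())
    cc_frag = cc[fragment_shift]
    cc_phantom = cc[read_length]  # phantom peak sits at the read length
    nsc = cc_frag / cc_min
    rsc = (cc_frag - cc_min) / (cc_phantom - cc_min)
    return nsc, rsc

# Hypothetical curve values (assumption), sampled at a few shifts only.
example_cc = {0: 0.201, 50: 0.206, 200: 0.204, 400: 0.200}
nsc, rsc = nsc_rsc(example_cc, read_length=50, fragment_shift=200)
print(round(nsc, 3), round(rsc, 3))
```

An RSC near or below 1 means the fragment-length peak is no higher than the phantom peak, which is the pattern described above.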

Questions

1.  Is my workflow correct?
2.  What could cause multiple peaks, especially the large one near zero?
3.  If this is a wet-lab issue, which steps should we revisit to improve enrichment?

Thanks in advance!

phantompeakqualtools chipseq crosscorrelation encode • 627 views


score 0 · Answer 1 · 2024-11-12


6 weeks ago

LChart 4.7k

There is no reason to throw out data. Use both reads, and don't trim. These cross-correlations are based on coverage and shouldn't be impacted by having both reads (indeed, having both reads will give you direct inference for the fragment size).
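
To make the "direct inference for the fragment size" point concrete: in a paired-end alignment, the SAM `TLEN` field (column 9) records the observed template length. A minimal sketch, assuming plain-text SAM lines and a hypothetical `fragment_lengths` helper, with toy records rather than real data:

```python
from statistics import median

def fragment_lengths(sam_lines, max_len=1000):
    """Collect template lengths from properly paired reads, once per pair."""
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        # 0x2 = properly paired; TLEN is positive only for the leftmost
        # mate, so requiring tlen > 0 counts each pair a single time.
        if flag & 0x2 and 0 < tlen <= max_len:
            lengths.append(tlen)
    return lengths

# Toy SAM records (assumption); SEQ and QUAL are left as "*".
toy_sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t30\t50M\t=\t250\t200\t*\t*",
    "r1\t147\tchr1\t250\t30\t50M\t=\t100\t-200\t*\t*",
    "r2\t99\tchr1\t1000\t30\t50M\t=\t1110\t160\t*\t*",
    "r2\t147\tchr1\t1110\t30\t50M\t=\t1000\t-160\t*\t*",
    "r3\t99\tchr1\t5000\t30\t50M\t=\t5290\t340\t*\t*",
    "r3\t147\tchr1\t5290\t30\t50M\t=\t5000\t-340\t*\t*",
]

lengths = fragment_lengths(toy_sam)
print(median(lengths))
```

On a real BAM you would stream the records through a SAM/BAM reader instead of a list of strings, and plot the full distribution rather than only the median.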

The peaks are at suspicious offsets (150, 300, etc) which comport with nucleosome binding patterns. These should show up in the fragment length distribution if you've under-digested or under-sonicated (or if this isn't ChIP but instead Cut&Tag/Cut&Run); but nucleosomes should not be identically positioned across different cells, so this shouldn't show up in cross-correlation plots - unless you're profiling a TF that induces nucleosome positioning (such as REST).
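
One quick way to check the "suspicious offsets" observation is to list the local maxima of the cross-correlation curve and see which fall near multiples of roughly 150 bp. This is only a sketch: the 150 bp period comes from the offsets mentioned above, the 15 bp tolerance is arbitrary, and the curve values are invented.

```python
def local_maxima(cc):
    """Shifts whose value exceeds both neighbouring sampled shifts."""
    shifts = sorted(cc)
    return [
        cur
        for prev, cur, nxt in zip(shifts, shifts[1:], shifts[2:])
        if cc[cur] > cc[prev] and cc[cur] > cc[nxt]
    ]

def near_period_multiple(shift, period=150, tolerance=15):
    """True if shift is within tolerance of a non-zero multiple of period."""
    k = round(shift / period)
    return k >= 1 and abs(shift - k * period) <= tolerance

# Invented curve (assumption) with peaks at 50, 150 and 300 bp.
toy_cc = {0: 0.10, 50: 0.20, 100: 0.15, 150: 0.30,
          200: 0.12, 250: 0.11, 300: 0.25, 350: 0.10}

peaks = local_maxima(toy_cc)
nucleosome_like = [s for s in peaks if near_period_multiple(s)]
print(peaks, nucleosome_like)
```

Here the peak at 50 bp would correspond to the phantom peak for 50 bp reads, while the peaks at 150 and 300 bp are the ones that match nucleosome-scale spacing.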

What do the tracks look like in IGV? You can pretty much tell the quality by eye.



Thank you so much for your answer! Here are my tracks: [image]

The blue tracks are my control and red tracks are my ChIP sample.

6 weeks ago by Diren • 0


There's not enough context in that screenshot, so I can't tell what I'm looking at. Is this genome-wide, so that the breaks are between contigs? Or is it all along one contig, with the breaks at unalignable regions? Is it a 10 Mb window? How was it normalized?

If you could do the following:

(1) Set both tracks to summarize by maximum

(2) Set both tracks to autoscale

(3) Go to an arbitrary 15 Mb window that does NOT include a centromere or telomere

(4) De-select autoscale

(5) Select a 2.5 Mb region with a good number of protein-coding genes

This will (approximately) scale the tracks to something like 99% of the max peak height.

That said, my gut instinct here is that unless you're profiling something crazy dense like H2AK119ub or H3K36me2, this experiment probably failed.

6 weeks ago by LChart 4.7k


Thank you so much for the instructions. I followed the steps: [image]

I am profiling KMT2A. Red is my ChIP sample and blue is the control again.

6 weeks ago by Diren • 0


Sorry. Looks like your assay failed. :(

6 weeks ago by LChart 4.7k


Well, I really appreciate your help! Thank you!

6 weeks ago by Diren • 0