• Data: paired-end (using only R1, trimmed to 50 bp).
• Aligned with Bowtie2, filtered the BAM (unmapped, low MAPQ), but did not deduplicate (no bottleneck issues).
• Created tagAlign files, subsampled, and ran cross-correlation analysis with phantompeakqualtools.
Results:
Most cross-correlation plots look like this:
Even in controls, the phantom and ChIP peaks are similar:
Most samples have NSC < 1.02 and RSC between 0.9 and 1.4, suggesting weak enrichment.
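For reference, here is a minimal sketch of how NSC and RSC are derived from a cross-correlation profile, following the usual phantompeakqualtools definitions. The function name, the dictionary layout, and the toy numbers are made up for illustration and are not taken from the samples above.

```python
def nsc_rsc(cc_by_shift, fragment_shift, read_length):
    """Compute (NSC, RSC) from a strand cross-correlation profile.

    cc_by_shift    : dict mapping strand shift (bp) -> cross-correlation value
    fragment_shift : shift of the fragment-length peak
    read_length    : shift of the "phantom" peak (equal to the read length)
    """
    cc_min = min(cc_by_shift.values())
    cc_frag = cc_by_shift[fragment_shift]
    cc_phantom = cc_by_shift[read_length]
    if cc_min == 0 or cc_phantom == cc_min:
        raise ValueError("profile is degenerate; NSC/RSC are undefined")
    nsc = cc_frag / cc_min                           # normalized strand coefficient
    rsc = (cc_frag - cc_min) / (cc_phantom - cc_min)  # relative strand coefficient
    return nsc, rsc


# Toy profile (invented values): phantom peak at 50 bp, fragment peak at 150 bp.
toy_profile = {0: 0.2010, 50: 0.2040, 150: 0.2035, 300: 0.2015, 500: 0.2000}
toy_nsc, toy_rsc = nsc_rsc(toy_profile, fragment_shift=150, read_length=50)
```

An NSC this close to 1 means the fragment-length peak barely rises above the background correlation, which is why it is read as weak enrichment.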
Questions
1. Is my workflow correct?
2. What could cause multiple peaks, especially the large one near zero?
3. If this is a wet-lab issue, which steps should we revisit to improve enrichment?
There is no reason to throw out data. Use both reads, and don't trim. These cross-correlations are computed from coverage, so keeping both reads shouldn't hurt them; in fact, properly paired reads let you infer the fragment size directly.
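As a rough sketch of what "direct inference" means here: with paired-end alignments, the TLEN column (field 9) of each SAM record is the inferred fragment length, so you can just tabulate it. The function name and the toy records below are hypothetical; the code assumes standard tab-separated SAM text.

```python
from collections import Counter

PROPER_PAIR = 0x2      # SAM flag: both mates aligned as a proper pair
SECONDARY = 0x100      # SAM flag: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag: supplementary alignment


def fragment_lengths(sam_lines):
    """Tabulate fragment lengths from the TLEN field of SAM text records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & PROPER_PAIR or flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if tlen > 0:  # the leftmost mate carries the positive TLEN: one count per pair
            counts[tlen] += 1
    return counts


# Toy records (invented): two proper pairs and one read that is not properly paired.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t42\t4M\t=\t296\t200\tACGT\tIIII",
    "r1\t147\tchr1\t296\t42\t4M\t=\t100\t-200\tACGT\tIIII",
    "r2\t99\tchr1\t500\t42\t4M\t=\t646\t150\tACGT\tIIII",
    "r2\t147\tchr1\t646\t42\t4M\t=\t500\t-150\tACGT\tIIII",
    "r3\t65\tchr1\t900\t42\t4M\tchr2\t50\t0\tACGT\tIIII",
]
toy_counts = fragment_lengths(toy_sam)
```

In practice you could feed it the text output of `samtools view` read from standard input. A histogram with bumps near 150 and 300 bp would point to mono- and di-nucleosome-sized fragments.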
The peaks are at suspicious offsets (150, 300, etc.), which comport with nucleosome spacing. These should show up in the fragment length distribution if you've under-digested or under-sonicated (or if this isn't ChIP but CUT&Tag/CUT&RUN instead). But nucleosomes should not be identically positioned across different cells, so this shouldn't show up in cross-correlation plots, unless you're profiling a TF that induces nucleosome positioning (such as REST).
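For intuition about what a peak at a given offset means, here is a toy strand cross-correlation in plain Python. It is a simplified sketch, not what phantompeakqualtools does internally: it correlates per-base counts of plus-strand 5' ends with minus-strand 5' ends shifted by d, on a single made-up contig. All names and positions are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def strand_cross_correlation(plus_ends, minus_ends, contig_length, max_shift):
    """Return {shift: correlation} between plus[i] and minus[i + shift]."""
    plus = [0] * contig_length
    minus = [0] * contig_length
    for pos in plus_ends:
        plus[pos] += 1
    for pos in minus_ends:
        minus[pos] += 1
    profile = {}
    for d in range(max_shift + 1):
        # Pair each plus-strand position i with the minus-strand position i + d.
        profile[d] = pearson(plus[: contig_length - d], minus[d:])
    return profile


# Toy data (invented): every minus-strand end lies 150 bp downstream of a plus-strand end.
toy_plus = [100, 420, 810]
toy_minus = [250, 570, 960]
toy_cc = strand_cross_correlation(toy_plus, toy_minus, contig_length=1200, max_shift=600)
toy_best_shift = max(toy_cc, key=toy_cc.get)
```

With real data the profile is much noisier, but the idea is the same: a peak at shift d means plus- and minus-strand read ends are consistently d bp apart. Regularly spaced nucleosome arrays would add peaks at multiples of the repeat length, but only if the positioning is shared across cells.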
What do the tracks look like in IGV? You can pretty much tell the quality by eye.
Thank you so much for your answer! Here are my tracks:
The blue tracks are my control and red tracks are my ChIP sample.
There's not enough context in that screenshot; I can't tell what I'm looking at. Is this genome-wide, with the breaks falling between contigs? Or is it all along one contig, with the breaks at unalignable regions? Is it a 10 Mb window? How was it normalized?
If you could do the following:
(1) Set both tracks to summarize by maximum
(2) Set both tracks to autoscale
(3) Go to an arbitrary 15 Mb window that does NOT include a centromere or telomere
(4) De-select autoscale
(5) Select a 2.5 Mb region with a good number of protein-coding genes
This will (approximately) scale the tracks to something like 99% of the max peak height.
That said, my gut instinct here is that unless you're profiling something crazy dense like H2AK119ub or H3K36me2, this experiment probably failed.