Question

Comparing input read subtracted ChIP-read profiles using Deeptools v3.4.3

0

Entering edit mode

4.9 years ago

Ian 6.1k

I have used the Deeptools2 packages to produce a plot comparing profiles of histone binding relative to gene start and ends. However the first sample only reaches about 2 on the Y axis, and the second lays around 0. So I am wondering if I am using it incorrectly? In particular the use of normalisation between the merged ChIP reads that are at least 3x coverage compared to the single input sample. Does readCount normalisation normalise per million reads, for example? What is the scale of the Y-axis?

I have put my steps below in case anyone can spot the flaw. Thank you for any guidance!

1) Combined triplicate mapped reads (sorted BAM) of each sample using samtools merge.
2) Use bamCompare to subtract an input read sample (sorted BAM) from the merged reads in step 1.

bamCompare --bamfile1 merged_reps_sampleA.bam --bamfile2 input.bam --scaleFactorsMethod readCount --operation subtract --binSize 50 --blackListFileName blacklist_file.bed -p 8 --effectiveGenomeSize 2900000000 --ignoreDuplicates -o subtracted_input_sampleA.bw

3) Use computeMatrix to create a matrix of reads relative to canonical gene annotation.

computeMatrix scale-regions -S subtracted_input_sampleA.bw subtracted_input_sampleB.bw -R knownCanonical.bed --samplesLabel SampleA SampleB -p 8 -a 5000 -b 5000 -o SampleA_vs_SampleB_matrix.gz

4) Use plotProfile to produce the plot that compares the profile of both samples.

 plotProfile --matrixFile SampleA_vs_SampleB_matrix.gz --plotType lines --numPlotsPerRow 1 --perGroup --SampleA_vs_SampleB_matrix_geneProfile_TSS_TES_5k_matrix.tab -o SampleA_vs_SampleB_matrix.gz_geneProfile_TSS_TES_5k.png --regionsLabel "HG19 UCSC canonical genes"

Resulting plot:

deeptools • 3.2k views

ADD COMMENT • link 4.9 years ago by Ian 6.1k

score 3 · Accepted Answer · 2020-10-01

3

Entering edit mode

4.9 years ago

Devon Ryan 105k

You seem to be using deepTools 3.x rather than deepTools2.

The guts of read count normalization is the following bit of code:

mapped_reads = [bam1_mapped, bam2_mapped]

# new scale_factors (relative to min of two bams)
scale_factors = float(min(bam1_mapped, bam2_mapped)) / np.array(mapped_reads)

bam1_mapped and bam2_mapped contain an estimate of the number of mapped reads in each file after filtering, so the scale factors are 1.0 for the smaller file and something <1 for the larger file. For a given genomic bin, with the number of reads stored in tileCoverage[0] and tileCoverage[1]:

value1 = scale_factors[0] * tileCoverage[0]
value2 = scale_factors[1] * tileCoverage[1]
bin_value = value1 - value2

What the resulting bigWig contains and what you have plotted is bin_value, which is the "depth normalized difference in coverage" in this case. It would seem that sample B had much lower normalized depth in this region (if this is unexpected, then perhaps there are genomic regions that should be excluded (blacklisted)).

ADD COMMENT • link 4.9 years ago by Devon Ryan 105k

0

Entering edit mode

Thank you Devon and for pointing out that I should be referring to it as Deeptools2, old habits die hard. Sample B was expected to be lower. Just to clarify, the Y-axis is explained as "the depth normalized difference in coverage", which is the relative difference in coverage after normalisation? Is my use of bamCompare correct for what I am trying to achieve, i.e. showing the difference in histone binding profiles? Does "subtract" reduce the number of normalised ChIP reads (per bin) by the number of normalised input reads (per bin)?