Question

Must the ChIP-seq control file used for peak calling cover the whole genome?


Asked 7.3 years ago by dhehdqls

I am trying to call peaks on benchmark data with MACS2.

Suppose I want to use a subsection of the genome, chr1:3,000,000-3,100,000.

My input is JUN_K562.bam (treatment) and JUN_control.bam (control).

I want to check how the results change when I modify threshold values, such as the q-value, or the broad cutoff in broad peak calling.

However, if I slice JUN_control.bam to the same region (3M to 3M + 100K) as JUN_K562.bam, the result file does not change no matter how I modify the q-value. I don't know why.

But when I run the same thing with the whole-genome control file, it works.

So my question is: when I want to call peaks on a small subsection of the genome, must the control file cover the whole genome?

Tags: sequence, MACS, ChIP-Seq • 1.8k views

Last updated 17 months ago by Ram

score 3 · Accepted Answer · 2017-08-05

MACS2 calculates the genome-wide background as (number_of_control_reads × fragment_length) / genome_size and uses it, together with a small local background and a large local background, to estimate the raw local bias. The local lambda is the maximum of these backgrounds (genome-wide, small local, large local), not their sum.
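To make this concrete, here is a small Python sketch of the background calculation. The read counts, the 200 bp fragment length and the 2.7e9 bp genome size are made-up assumptions for illustration, not values from the question. The local lambda is taken as the maximum of the candidate backgrounds, which is how the MACS paper describes it; the helper names are hypothetical.

```python
# Hypothetical numbers for illustration only; not taken from the question's data.
FRAGMENT_LENGTH = 200   # assumed fragment size in bp
GENOME_SIZE = 2.7e9     # assumed effective genome size in bp (roughly a human value)


def genome_background(n_control_reads: int, fragment_length: float, genome_size: float) -> float:
    """Expected control coverage per bp: reads * fragment_length / genome_size."""
    return n_control_reads * fragment_length / genome_size


def local_lambda(lambda_bg: float, lambda_small: float, lambda_large: float) -> float:
    """MACS-style local lambda: the largest of the candidate backgrounds."""
    return max(lambda_bg, lambda_small, lambda_large)


# Whole-genome control vs. a control sliced down to a 100 kb region,
# while the genome size used in the formula stays the same.
whole = genome_background(20_000_000, FRAGMENT_LENGTH, GENOME_SIZE)
sliced = genome_background(5_000, FRAGMENT_LENGTH, GENOME_SIZE)

print(f"whole-genome control: lambda_bg = {whole:.4f}")
print(f"sliced control:       lambda_bg = {sliced:.6f}")
print("local lambda with hypothetical local values:", local_lambda(whole, 0.9, 1.2))
```

With the sliced control the genome-wide term collapses by several orders of magnitude, because a few thousand reads are being spread over the whole genome size, so the background that MACS2 compares your ChIP signal against is no longer the one a real genome-wide run would use.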

So the number of reads in the control file definitely affects the background noise calculation.

MACS2 also scales the ChIP and control libraries to the same sequencing depth (after calculating the raw local bias) to estimate the local lambda. By default the larger library is scaled down to the depth of the smaller one, so if your treatment or control has very few reads, the other library is shrunk to match it. This can also affect your results.
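A rough sketch of the depth scaling, assuming MACS2's default of scaling the larger library down to the smaller one (the `--scale-to small` behaviour). The read counts are hypothetical and the function is an illustration, not MACS2 code.

```python
from __future__ import annotations


def scaling_factors(n_treatment: int, n_control: int) -> tuple[float, float]:
    """Return (treatment_factor, control_factor) that bring both libraries
    down to the depth of the smaller one."""
    smaller = min(n_treatment, n_control)
    return smaller / n_treatment, smaller / n_control


# Hypothetical read counts: a treatment sliced to 100 kb, compared against
# either a sliced control or a whole-genome control.
sliced_treatment = 8_000
sliced_control = 5_000
whole_control = 20_000_000

for label, n_ctrl in [("sliced control", sliced_control), ("whole control", whole_control)]:
    t_factor, c_factor = scaling_factors(sliced_treatment, n_ctrl)
    print(f"{label}: treatment x {t_factor:.4g}, control x {c_factor:.4g}")
```

With the sliced control it is the treatment that gets scaled down; with the whole-genome control the control is shrunk by orders of magnitude instead, so the two runs compare the same treatment reads against very differently scaled control signals.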

In the end, it is NOT A GOOD IDEA to call peaks on a subset of the data. Call peaks on all the data and then keep the peaks that fall in your regions of interest.
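A minimal sketch of that last step, assuming the whole-genome MACS2 run produced a BED-like narrowPeak file (the file name in the usage comment is hypothetical). It keeps only peaks that overlap chr1:3,000,000-3,100,000, converted to BED's 0-based, half-open coordinates.

```python
import sys

# Region of interest from the question, in BED coordinates:
# 1-based chr1:3,000,000-3,100,000 becomes [2,999,999, 3,100,000).
REGION = ("chr1", 2_999_999, 3_100_000)


def peaks_in_region(lines, chrom, start, end):
    """Yield BED/narrowPeak lines whose interval overlaps [start, end)."""
    for line in lines:
        # Skip blank lines and header lines that some tools add to BED files.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        p_chrom, p_start, p_end = fields[0], int(fields[1]), int(fields[2])
        # Half-open intervals overlap when each starts before the other ends.
        if p_chrom == chrom and p_start < end and p_end > start:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python filter_peaks.py JUN_K562_peaks.narrowPeak
    with open(sys.argv[1]) as handle:
        for peak in peaks_in_region(handle, *REGION):
            sys.stdout.write(peak)
```

`bedtools intersect -u`, with a one-line BED file describing the region, does the same job if you already use bedtools.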