Hi guys, I have H3K27ac chip-seq data from two different samples and attempted to call differential peaks in these two dataset. I have finished the analysis by MACS2 but when I looked back to the peaks by IGV, there were a lot of significantly differential peaks that have been missed. I checked all scripts step by step but did not find anything wrong. The only issue that I am concerned is the actual effective depths. The actual effective depth that I got from different samples have three times differences: one is 10 million, the other is 29 million. Do you guys think that is the reason that I have this trouble? And how can I fix it? Thank you.
Unfortunately, diffReps is unavailable in our computer resource center. I have to use MACS2 to call differential peaks. I got the results but there was a problem bothering me much. The final result missed some important peaks. For example, gene A is a marker gene for cell type A. I have seen the peaks (fold enrichment: 3-4; the peak length 200-1200) in the peak file from cell type A after broad peak calling by MACS2, which were absent in cell type B. However, after call differential binding events for cell type A and B, those peaks were not in the final results. I have tried used different parameters to call differential binding events, like -g 60 or 100 or 200. But the problem was still unsolved. Any ideas?
1) What is the exact command that you used?
2) How did you calculate the effective depths (--d1 and --d2 values for macs2 bdgdiff)?
macs2 bdgdiff --t1 A_treat_pileup.bdg --c1 A_control_lambda.bdg --t2 B_treat_pileup.bdg --c2 B_control_lambda.bdg --d1 17119467 --d2 17119467 -g 60 -l 120 --o-prefix diff_A_vs_B
I used the command from the tutorial of MACS2 to get the effective depths (egrep). There was some problems with the input data from one sample so I used the same control for both samples.
You stated in the original message that one sample had a read depth of 10M, so how can the effective depth be ~17M?
I changed the control file (input), that's why the effective depth has changed.
Effective depth ≠ control depth. From the tutorial:
And it's unclear how changing the control data set would change the effective depth of both treatment data sets to 17M (from 10M and 29M in the OP).
$ egrep "tags after filtering in treatment|tags after filtering in control" cond2_peaks.xls
here is the script that I used to evaluate the effective depth. I am sure that the peak calling is right because all important peaks were there. I just got problems with differential peaks calling. I have checked other unregulated genes and they have corresponding peaks. I don't know whether I need to stick on this one gene, which is very important gene though.
You need to determine the effective depth for both conditions, and specify the appropriate depth for each when running bdgdiff.
Before running bdgDiff did you use the --SPMR setting as a basic normalisation method.
From the tutorial:
No. I used MACS2 call peaks (-broad) as my peak calling method. What will that make difference?
Try using the exact commands described in this tutorial.