3.9 years ago
Jimpix
Hey! I am new to ChIP-seq analysis, so please be understanding. I have ChIP-seq data and I simply want to do regular peak calling. I am typing:
macs3 callpeak -t A.bam -c input.bam -f BAM -g 483151142 -n A
and I get:
#2 Total number of paired peaks: 0
Does anyone know why? I also get this advice:
we suggest to use --nomodel and --extsize 147 or other fixed number instead.
In the simplest case you do not have peaks because the ChIP did not work properly. Did you check the data in a genome browser such as IGV to see how it looks by eye?
I have checked in IGV, and it has peaks.
I noticed that it depends on the -g parameter. How do I determine the correct one?
It is the effective (=mappable) genome size of your organism. What is your organism?
Schizosaccharomyces pombe
It is quite disrespectful to delete threads that received help and then open another identical one. Biostars is a community driven by volunteers, not an ad-hoc help service. This has also been asked before, and the answers there tell you how to determine -g.
MACS, effective genome size
Effective genome size of UCSC hg38
As I said already in a comment above, -g is the genome size of your organism, so count the number of nucleotides in the reference file and put that as -g. You used 483151142 above; that yeast has a genome of about 14 Mb, so try 14000000. Strictly speaking, -g is the part of the genome that is mappable, i.e. not repetitive but unique at the read length you have, but let's not make it too complicated: simply use the length of the genome in the reference you mapped against.
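For counting the nucleotides in the reference file, a minimal Python sketch could look like the one below. The function name `genome_length` and the path `ref.fa` are only illustrative, and skipping `N` bases by default is my own choice, not something MACS requires.

```python
def genome_length(fasta_path, count_n=False):
    """Sum the sequence lengths in a FASTA file.

    N bases are skipped unless count_n is True (an assumption of this
    sketch: unknown bases cannot be mapped to, so they are left out).
    """
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                # Header line, carries no sequence.
                continue
            seq = line.strip().upper()
            total += len(seq) if count_n else len(seq) - seq.count("N")
    return total
```

Calling `genome_length("ref.fa")` on the reference you mapped against gives a number you can pass to `-g`.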
I deleted it because I didn't want two threads with the same question, but I will not do that again.
Following your advice, I tried three values for -g:
1. 14000000: 14 peaks
2. 12631379 (number of nucleotides in the reference file): 14 peaks
3. 140000000: 671 peaks
MACS3 needs at least 100 paired peaks on the + and - strands to build the model. Why are there more peaks if the -g parameter is much bigger (wrong)? Besides, in IGV I can see that there are many more than 14 peaks.
The larger the genome, the smaller the chance that reads accumulate randomly at a given location, hence p-values for a true peak would be smaller the larger the -g parameter is, based on my understanding of the MACS algorithm.
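To make that argument concrete, here is a rough Python sketch. It is not MACS's code: it assumes a plain genome-wide Poisson background with rate reads x window / genome size, whereas MACS also uses local background estimates from the control. The read count, window size and observed pileup are made-up numbers for illustration; only the two -g values come from this thread.

```python
import math


def poisson_upper_tail(k, lam, max_terms=10000):
    """P(X >= k) for X ~ Poisson(lam), summed term by term from k upward."""
    if k <= 0:
        return 1.0
    # Start at P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    for _ in range(max_terms):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if term < 1e-17 * total:
            break
    return total


def background_rate(n_reads, window_bp, genome_size):
    """Expected number of reads in a window if reads fall uniformly at random."""
    return n_reads * window_bp / genome_size


N_READS = 2_000_000  # hypothetical library size
WINDOW_BP = 200      # hypothetical window / fragment length
OBSERVED = 50        # hypothetical read count at a candidate peak

for g in (12_631_379, 140_000_000):
    lam = background_rate(N_READS, WINDOW_BP, g)
    p = poisson_upper_tail(OBSERVED, lam)
    print(f"-g {g}: expected background reads = {lam:.2f}, p-value = {p:.3g}")
```

With the inflated -g the expected background per window drops about tenfold, so the same pileup looks far more significant and more regions pass the cutoff.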
You can calculate mappability (effective genome size) by using these directions. You will need to relate it to the length of reads you have.
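If you just want a rough feel for how mappability relates to read length, one crude approximation is to count the k-mers (k = read length) that occur exactly once in the reference. The Python sketch below does that; it is my own simplification and not necessarily what the linked directions do. It only looks at the forward strand, ignores mismatches, and keeps every k-mer in memory, so it is only practical for small genomes. The function names are illustrative.

```python
from collections import Counter


def read_fasta(path):
    """Yield one upper-case sequence string per FASTA record."""
    chunks = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if chunks:
                    yield "".join(chunks)
                chunks = []
            else:
                chunks.append(line.strip().upper())
    if chunks:
        yield "".join(chunks)


def uniquely_mappable_positions(sequences, k):
    """Count k-mers that occur exactly once across all sequences.

    k-mers containing N are skipped, and k-mers never span two records.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return sum(1 for c in counts.values() if c == 1)
```

For example, `uniquely_mappable_positions(read_fasta("ref.fa"), 50)` would give an estimate for 50 bp reads, where `ref.fa` stands for your own reference file.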
I deleted the comments by mistake and I don't know how to restore them
Should be restored now. Once a parent comment is deleted all child comments become invisible. So be careful with that.