macs3 peak calling usage
0
0
Entering edit mode
3.9 years ago
Jimpix ▴ 10

Hey! I am new in CHIP-seq analysis so please be understanding. I have got CHIP-seq data and I want to do simply regular peak calling. I am typing:

macs3 callpeak -t A.bam -c input.bam -f BAM -g 483151142 -n A

and I have got:

#2 Total number of paired peaks: 0

Anyone know why? And this advise:

we suggest to use --nomodel and --extsize 147 or other fixed number instead.
macs3 callpeak • 4.3k views
ADD COMMENT
0
Entering edit mode

In the simplest case you do not have peaks because the ChIP did not properly work. Did you check data on a genome browser such as the IGV to see how it looks by eye?

ADD REPLY
0
Entering edit mode

I have checked in IGV, it has peaks

ADD REPLY
0
Entering edit mode

I noticed that it is depend on -g parameter. How to determine the correct one?

ADD REPLY
1
Entering edit mode

It is the effective (=mappable) genome size of your organism. What is your organism?

ADD REPLY
0
Entering edit mode

Schizosaccharomyces pombe

ADD REPLY
0
Entering edit mode

It is quite disrespectful to delete threads that received help and then open another identical one. Biostars is a community driven by volunteers, this is no ad-hoc help service. This has also been asked before and the answers there tell you how to determine g.

MACS, effective genome size

Effective genome size of UCSC hg38

As I said already in a comment above, -g is the genome size of your organism, so count the number of nucleotides in the reference file and put that as -g. You used 483151142 above, that yeast has about 14Mb so 14000000, try that. It actually is the part of the genome that is mappable, so not repetitive but unique at the read length you have, but lets not make it too complicated and simply use the length of the genome as in the reference you mapped against.

ADD REPLY
0
Entering edit mode

I've deleted because I didn't want two with the same question but I will not do that again.

According to your advise, when I type: 1. 14000000 - 14 peaks 2. 12631379 (number of nucleotides in the reference file) - 14 peaks. 3. 140000000 - 671 peaks

MACS3 needs at least 100 paired peaks at + and - strand to build the model. Why there are more peaks if the g parameter is much bigger (wrong)? Besides, in IGV I can see that there are many more peaks than 14.

ADD REPLY
1
Entering edit mode

The larger the genome the smaller the change that reads accumulate randomly at a given location, hence p-values for a true peak would be smaller the larger the -g parameter is, based on my understanding of the macs algorithm.

ADD REPLY
0
Entering edit mode

You can calculate mappability (effective genome size) by using these directions. You will need to relate it to the length of reads you have.

ADD REPLY
0
Entering edit mode

I deleted the comments by mistake and I don't know how to restore them

ADD REPLY
1
Entering edit mode

Should be restored now. Once a parent comment is deleted all child comments become invisible. So be careful with that.

ADD REPLY

Login before adding your answer.

Traffic: 1663 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6