Effective genome size of UCSC hg38
8.7 years ago

Hi there!

I am running macs2 for peak calling on ChIP-seq data, but I don't know the effective genome size of UCSC hg38. Does anyone have a hint? The default value in macs2 is for hg18.

Thanks!

ChIP-Seq • 8.6k views

I'd just use the hg18 value, though. I wrote my script for corn genomes etc., where the EGS is not known.

7.7 years ago
Bontus ▴ 80

Have a look at this page for some stats of different genomes

Spoiler for hg38

  • total size - 3,209,286,105
  • non-N bases - 3,049,315,783 (= mappable / effective size)
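If you want to reproduce numbers like these for your own FASTA file, here is a minimal sketch that counts total and non-N bases. The function name `count_bases` is just something I made up for illustration, and the totals will depend on which sequences (alt/random/unplaced contigs or not) are in the file you feed it, so treat the result as an approximation of the effective size rather than an official value.

```python
import io


def count_bases(fasta_handle):
    """Return (total, non_n) base counts over all sequences in a FASTA stream."""
    total = 0
    non_n = 0
    for line in fasta_handle:
        if line.startswith(">"):
            # Header line: carries the sequence name, not bases.
            continue
        seq = line.strip()
        total += len(seq)
        # UCSC FASTA files soft-mask repeats in lowercase, so count both cases.
        non_n += len(seq) - seq.count("N") - seq.count("n")
    return total, non_n


if __name__ == "__main__":
    # Tiny in-memory example; for a real genome, pass open("genome.fa") instead
    # (the file name here is a placeholder).
    demo = io.StringIO(">chr1\nACGTNNNN\nacgtnnAC\n>chr2\nNNNN\n")
    total, non_n = count_bases(demo)
    print(total, non_n)
```

The non-N count can then be passed to macs2 as the genome size via `-g`.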
8.7 years ago
endrebak ▴ 980

Script to compute the effective genome size: epic-effective

Note that I get slightly different results than some papers, but no one on Biostars seems to think my method for computing the EGS is wrong.


I tried to run this, but ran into a problem: the code does not seem to recognize the hg38 FASTA genome file from UCSC because some of the chromosome names contain "_".

Could you kindly run it for me? My read length is 101 and I used the human reference genome hg38 from UCSC.


But you probably do not want to include those chromosomes anyway, since they will affect the computation. I would have the same problem. You should remove those "bad" chromosomes first. I will try to make the script do it automatically, but I do not have time now.
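In the meantime, a minimal sketch of how one could strip those sequences from the FASTA before running the script. It assumes the "bad" chromosomes are exactly the records whose name contains "_" (in the UCSC hg38 file these are things like `chr1_KI270706v1_random` or `chrUn_...`). The function name `drop_underscore_records` is hypothetical and not part of epic-effective.

```python
import io


def drop_underscore_records(fasta_in, fasta_out):
    """Copy FASTA records to fasta_out, skipping any whose name contains '_'.

    Returns the number of records kept.
    """
    keep = False
    kept = 0
    for line in fasta_in:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = "_" not in name
            if keep:
                kept += 1
        if keep:
            fasta_out.write(line)
    return kept


if __name__ == "__main__":
    # In-memory demo; for a real genome, open the input and output files
    # instead (e.g. open("hg38.fa") and open("hg38.main.fa", "w"), both
    # placeholder names).
    src = io.StringIO(
        ">chr1\nACGT\n>chr1_KI270706v1_random\nNNNN\n>chrM\nGG\n"
    )
    dst = io.StringIO()
    n = drop_underscore_records(src, dst)
    print(n)
    print(dst.getvalue(), end="")
```

Note that this keeps chrM and any other name without an underscore, so check the kept names if you only want the autosomes and sex chromosomes.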
