I am calculating the total genome length from a FASTA file using the following command:
zcat genome.fa.gz | grep -v ">" | wc | awk '{print $3-$1}'
For Yeast, I get 12,157,105, and the Ensembl info indicates exactly 12,157,105. So, that adds up.
For Human, I get 56,917,651,860, but the Ensembl info indicates 3,609,003,417.
Does anyone know why? I must be missing something.
From the ftp://ftp.ensembl.org/pub/release-94/fasta/homo_sapiens/dna/README:
Ah right! The Ensembl count for human is based on the primary assembly. And some organisms don't have a primary assembly, just the top-level one.
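To see which sequences are inflating the total, it may help to print the length of each record rather than only the grand total. The following is a minimal sketch, not a definitive tool: `fasta_lengths` is a hypothetical helper name, and it assumes a gzipped FASTA file whose header lines start with `>` followed by the sequence ID.

```shell
# Hypothetical helper: print "ID<TAB>length" for every sequence in a
# gzipped FASTA file, followed by a final "TOTAL<TAB>sum" line.
fasta_lengths() {
    gzip -dc "$1" | awk '
        /^>/ {
            # Header line: emit the previous record, then start a new one.
            if (name != "") print name "\t" len
            name = substr($1, 2)   # sequence ID without the leading ">"
            len = 0
            next
        }
        {
            # Sequence line: count its characters (the newline is not included).
            len += length($0)
            total += length($0)
        }
        END {
            if (name != "") print name "\t" len
            print "TOTAL\t" total + 0
        }'
}
```

Running `fasta_lengths genome.fa.gz | sort -k2,2nr | head` on the file from the question would list the longest entries first, which should make it easier to tell whether the file contains more than the primary assembly sequences.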