I am getting some genomes from the NCBI FTP site. One of the genomes (Mus musculus) is
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.25_GRCm38.p5/GCF_000001635.25_GRCm38.p5_genomic.fna.gz
I'm wondering what the GCF/000/001/635
in the path name means. What do GCF, 000, 001 and 635 mean and why are only certain organisms within some of the folders?
I've noticed only certain organisms have their genomes within certain folders, for example Mus spretus is in
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/001/624/865/GCA_001624865.1_SPRET_EiJ_v1/GCA_001624865.1_SPRET_EiJ_v1_genomic.fna.gz
(Under GCA)
And Meleagris gallopavo is in
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/605/GCF_000146605.2_Turkey_5.0/GCF_000146605.2_Turkey_5.0_genomic.fna.gz
(Still in GCF, but within the folder genomes/all/GCF/000/146 instead of genomes/all/GCF/000/001)
Are you getting the paths from the assembly summary files that are in this folder? It would be best to parse the paths out of those files instead of trying to understand the directory organization.
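
Something along these lines might work for pulling the paths out of a summary file. It is only a sketch and makes several assumptions: that the file is tab-delimited, that the last line starting with "#" holds the column names, and that the columns include assembly_accession, organism_name and ftp_path. It also assumes the FASTA file is named after the last directory in the path plus "_genomic.fna.gz", as in the URLs you posted. The function name and the SAMPLE text are made up for illustration, and SAMPLE is cut down to three columns (the real file has many more).

```python
def genomic_fasta_urls(summary_text, organism=None):
    """Map assembly accession -> URL of its *_genomic.fna.gz file.

    summary_text is the content of an assembly summary file (assumed layout:
    tab-delimited, column names on the last '#' line before the data).
    """
    header = None
    urls = {}
    for line in summary_text.splitlines():
        if line.startswith("#"):
            # Keep overwriting: the last comment line is taken as the header.
            header = line.lstrip("#").strip().split("\t")
            continue
        if not line.strip() or header is None:
            continue
        row = dict(zip(header, line.split("\t")))
        ftp_path = row.get("ftp_path", "")
        if not ftp_path or ftp_path == "na":
            continue  # no directory listed for this assembly
        if organism and row.get("organism_name") != organism:
            continue
        # The file name repeats the last directory component of the path.
        basename = ftp_path.rstrip("/").rsplit("/", 1)[-1]
        urls[row["assembly_accession"]] = f"{ftp_path}/{basename}_genomic.fna.gz"
    return urls


# Hypothetical three-column sample built from the Mus musculus path in the question.
SAMPLE = (
    "# assembly_accession\torganism_name\tftp_path\n"
    "GCF_000001635.25\tMus musculus\t"
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/635/GCF_000001635.25_GRCm38.p5\n"
)

for accession, url in genomic_fasta_urls(SAMPLE, organism="Mus musculus").items():
    print(accession, url)
```

This way the script never has to know what GCF/000/001/635 means. It just uses whatever directory the summary file lists for each assembly.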