RefSeq classifies each genome into one of the following assembly level categories: Complete Genome, Chromosome, Scaffold, Contig. Because your code downloads only complete genomes, the number of downloaded sequences is smaller than the number provided in the RefSeq statistics. Of note, the majority of RefSeq genomes (~80%) are assembled at the scaffold and contig levels.
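To check how the assembly levels are distributed in your own data, you could tally the `assembly_level` column of an NCBI assembly summary table. Below is a minimal sketch, assuming a tab-separated table whose `#`-prefixed header line names an `assembly_level` column. The function name and the sample rows (including the accessions) are made up for illustration.

```python
import csv
from collections import Counter


def count_assembly_levels(lines):
    """Count records per assembly level in an assembly-summary style table.

    `lines` is any iterable of text lines (e.g. an open file handle).
    Assumption: comment lines start with '#', and one of them is the
    tab-separated header that contains an 'assembly_level' column.
    """
    header = None
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            # Strip the leading '#' and spaces, then check for the header.
            candidate = [row[0].lstrip("# ").strip()] + [c.strip() for c in row[1:]]
            if "assembly_level" in candidate:
                header = candidate
            continue
        if header is None:
            raise ValueError("no header line naming an 'assembly_level' column")
        record = dict(zip(header, row))
        counts[record["assembly_level"]] += 1
    return counts


# Hypothetical sample rows; real tables have many more columns.
sample = [
    "# assembly_accession\ttaxid\tassembly_level\n",
    "GCF_000000001.1\t562\tComplete Genome\n",
    "GCF_000000002.1\t562\tScaffold\n",
    "GCF_000000003.1\t1280\tScaffold\n",
    "GCF_000000004.1\t1280\tContig\n",
]

levels = count_assembly_levels(sample)
for level, n in levels.most_common():
    print(f"{level}: {n}")
```

For a real file, pass an open handle instead of the sample list, e.g. `count_assembly_levels(open("assembly_summary.txt"))`, where the file name is whatever you saved the table as. Comparing the "Complete Genome" count with the total shows how many genomes a complete-only download leaves out.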
For downloading RefSeq genomes, I recommend genome_updater. It is a bash script that downloads genomes from RefSeq or GenBank and offers many filters (e.g., by assembly level or taxonomic unit). The script tracks changes (it downloads only genomes updated since your last download), supports multithreading, and checks file integrity.
That means some RefSeq sequences are not contained in the set of RefSeq genomes. This is entirely expected.
From the release notes:
2.2 Molecule Types Included
The RefSeq release includes genomic, transcript, and protein sequence
data; however, these molecule types are not provided for all
organisms and the sequences provided may not be complete or
comprehensive for some species.
Transcript RefSeq records may represent protein-coding transcripts or
non-coding RNA products; these records are currently only provided for
eukaryotic species.
Genomic RefSeq records are provided when a sufficient quantity of
genomic sequence data is available in DDBJ/EMBL/GenBank. Transcript
and protein records may be provided for a species before genomic
sequence data is available.
Thank you very much for your answer, Andrzej. I found this solution: ncbi-genome-download. Thanks for the link, I'm interested in testing it.