5.3 years ago
tastafor
I am looking for the RefSeq protein sequences (protein.faa.gz files) of all animals. I don't understand the difference between these two locations:
ftp://ftp.ncbi.nih.gov/refseq/release/
ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/
The refseq/release directory has files labelled vertebrate_mammalian_1.protein.faa.gz etc., while genomes/refseq has separate files for each species, with one .protein.faa.gz file per species.
What is the difference? Which one is better if I need all the animal refseq protein sequences?
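For anyone taking the refseq/release route, here is a minimal sketch of how the animal protein files could be picked out of a directory listing. It assumes the animal divisions are named invertebrate, vertebrate_mammalian and vertebrate_other, and that the chunk number is separated by either "." or "_"; both are assumptions to check against the actual listing. The function name `select_protein_files` is made up for illustration.

```python
import re

# Assumed names of the animal divisions under refseq/release/.
ANIMAL_DIVISIONS = ("invertebrate", "vertebrate_mammalian", "vertebrate_other")

# Accept "." or "_" before the chunk number, because the exact separator
# used in the file names is an assumption here.
PROTEIN_FILE = re.compile(
    r"^(?P<division>.+?)[._](?P<chunk>\d+)\.protein\.faa\.gz$"
)


def select_protein_files(names, divisions=ANIMAL_DIVISIONS):
    """Return the protein FASTA file names that belong to the given divisions."""
    selected = []
    for name in names:
        match = PROTEIN_FILE.match(name)
        if match and match.group("division") in divisions:
            selected.append(name)
    return selected
```

The input is just a list of file names, so it can come from an FTP listing, a saved index page, or a local directory.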
RefSeq sequences are for organisms that may or may not have a completed genome. RefSeq under the genomes section is for complete genomes.
So which one is more comprehensive: the RefSeq release sequences or the RefSeq files under genomes? Thank you.
The RefSeq data under genomes is a snapshot of all of the proteins that were included in the annotation. If an organism is being actively curated and a bunch of new RefSeqs get added, they will not be included in the annotation files in the FTP genomes path until a new annotation release is made. RefSeq releases occur independently of the annotation releases, so all new RefSeqs get included in the FTP refseq releases path.
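One way to see this lag for a given species is to compare the accessions in the two sets of files. The sketch below assumes each FASTA header starts with the accession as its first whitespace-delimited token; the function names are hypothetical.

```python
def fasta_accessions(lines):
    """Collect the first token after '>' from every FASTA header line."""
    accessions = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare ">" with no identifier
                accessions.add(fields[0])
    return accessions


def missing_from_snapshot(release_lines, snapshot_lines):
    """Return accessions present in the release file but not in the
    genome annotation snapshot, sorted for stable output."""
    release = fasta_accessions(release_lines)
    snapshot = fasta_accessions(snapshot_lines)
    return sorted(release - snapshot)
```

Both functions take any iterable of text lines, so a file opened with `gzip.open(path, "rt")` can be passed in directly without decompressing it first.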