Hello everyone,
I need to download the RefSeq files for viral genomes from the NCBI database. I found the FTP download link (ftp://ftp.ncbi.nih.gov/refseq/release/viral/) with the files listed below. I've tried to work out what each file contains, but I can't find an explanation of the numbers in the file names anywhere. What is the difference between viral.1.1.genomic.fna.gz and viral.2.1.genomic.fna.gz? They all appear to have been uploaded on the same date, so they can't be different versions. I checked the README (ftp://ftp.ncbi.nih.gov/refseq/release/release-catalog/README) but didn't find the information I need there.
List of files:
- viral.1.1.genomic.fna.gz
- viral.1.genomic.gbff.gz
- viral.1.protein.faa.gz
- viral.1.protein.gpff.gz
- viral.2.1.genomic.fna.gz
- viral.2.genomic.gbff.gz
- viral.2.protein.faa.gz
- viral.2.protein.gpff.gz
- viral.nonredundant_protein.1.protein.faa.gz
- viral.nonredundant_protein.1.protein.gpff.gz
Can anyone tell me what the difference is or where to find this information? Thanks a lot!
Thank you very much for your answer, I had the same question.
So you explained the difference between the files viral.1.protein.faa and viral.2.protein.faa: they are the viral protein FASTA database divided into two files, and the same goes for the genomic DNA and GenBank files.
What is the difference between them and the file viral.nonredundant_protein.1.protein.faa?
Isn't RefSeq already a non-redundant database?
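Since the numbered protein files are parts of one database split across several files, one way to work with them is to merge the parts back into a single FASTA file after downloading. The following is only a minimal sketch under that assumption; the function names `find_parts` and `merge_parts` and the output file name are made up for illustration. The pattern matches only the numbered parts, so viral.nonredundant_protein.1.protein.faa.gz is deliberately left out.

```python
import gzip
import re
import shutil
from pathlib import Path

# Matches viral.1.protein.faa.gz, viral.2.protein.faa.gz, ... but not
# viral.nonredundant_protein.1.protein.faa.gz.
PART_PATTERN = re.compile(r"^viral\.(\d+)\.protein\.faa\.gz$")


def find_parts(directory, pattern=PART_PATTERN):
    """Return the matching files in `directory`, sorted by part number."""
    parts = []
    for path in Path(directory).iterdir():
        match = pattern.match(path.name)
        if match:
            parts.append((int(match.group(1)), path))
    return [path for _, path in sorted(parts)]


def merge_parts(directory, output_path, pattern=PART_PATTERN):
    """Decompress each part and append it to one uncompressed FASTA file."""
    parts = find_parts(directory, pattern)
    with open(output_path, "wb") as out:
        for part in parts:
            with gzip.open(part, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return parts
```

For example, `merge_parts("downloads", "viral.protein.faa")` would write all numbered parts found in a (hypothetical) `downloads` folder into one file. For the genomic files, a different pattern such as `re.compile(r"^viral\.(\d+)\.1\.genomic\.fna\.gz$")` could be passed as `pattern`.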
From the release notes: