Hi, I was trying to download SARS-CoV-2 sequence data from NCBI using this link: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049 When I click the empty checkbox, I only get about 200 sequences each time. Is there a way to batch download all the genome sequences with one click? I thought I had done this before, but I do not quite recall how. Many thanks.
You can look up the assembly accessions and download the files from the NCBI FTP site, for example:
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/858/895/GCA_009858895.3_ASM985889v3/
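If you have many assembly accessions, the directory URL can be built rather than looked up by hand. This is only a sketch based on the pattern visible in the example URL above (the nine digits of the accession split into three groups of three); the function name assembly_ftp_dir is made up for illustration, and you still need the assembly name (here ASM985889v3) for each accession.

```python
def assembly_ftp_dir(accession: str, assembly_name: str) -> str:
    """Build an NCBI genomes directory URL from an accession and assembly name.

    Assumes the layout seen in the example URL:
    genomes/all/<GCA|GCF>/<3 digits>/<3 digits>/<3 digits>/<accession>_<name>/
    """
    prefix, _, rest = accession.partition("_")   # "GCA", "009858895.3"
    digits = rest.split(".", 1)[0]               # "009858895"
    if prefix not in ("GCA", "GCF") or len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected assembly accession: {accession!r}")
    # Split the nine digits into three path components: 009/858/895
    groups = "/".join(digits[i:i + 3] for i in range(0, 9, 3))
    return (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{prefix}/{groups}/{accession}_{assembly_name}/"
    )


print(assembly_ftp_dir("GCA_009858895.3", "ASM985889v3"))
```

For the accession and name above, this prints the same directory URL as in the example.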
Many thanks for your kind reply. Could you be a bit more specific?
I clicked on the link you posted, clicked on the "RefSeq Genome" tab, and then clicked on the assembly:
https://www.ncbi.nlm.nih.gov/assembly/GCF_009858895.2
Then I clicked on "FTP directory for GenBank assembly".
You can get the FASTA sequence by clicking on
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/858/895/GCA_009858895.3_ASM985889v3/GCA_009858895.3_ASM985889v3_genomic.fna.gz
And the gene annotation (GFF format):
https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/858/895/GCA_009858895.3_ASM985889v3/GCA_009858895.3_ASM985889v3_genomic.gff.gz
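The two file URLs above follow the same naming pattern (directory name plus _genomic.fna.gz or _genomic.gff.gz), so they can be derived from the directory URL and fetched from a script. A rough sketch using only the Python standard library; the helper names assembly_file_urls and download are hypothetical, and the suffix list only covers the two files mentioned here.

```python
import urllib.request
from pathlib import Path


def assembly_file_urls(dir_url, suffixes=("genomic.fna.gz", "genomic.gff.gz")):
    """Derive per-file URLs from an assembly directory URL.

    Assumes each file is named <directory name>_<suffix>, as in the links above.
    """
    base = dir_url.rstrip("/")
    stem = base.rsplit("/", 1)[-1]   # e.g. GCA_009858895.3_ASM985889v3
    return [f"{base}/{stem}_{suffix}" for suffix in suffixes]


def download(url, out_dir="."):
    """Save one URL into out_dir under its original file name (needs network)."""
    target = Path(out_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(target))
    return target


directory = (
    "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/009/858/895/"
    "GCA_009858895.3_ASM985889v3/"
)
for file_url in assembly_file_urls(directory):
    print(file_url)   # pass each one to download(file_url) to actually fetch it
```

The loop only prints the URLs; calling download() on each one is what actually retrieves the files.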
I downloaded those sequences as you described in March 2021. I am now trying to download them again, but every attempt has failed with errors. I have also looked at the NCBI command-line tools, Entrez, and the viral datasets. Do you have another solution, or do you know of any other resource for SARS-CoV-2 nucleotide and amino acid sequences?
I am downloading it now using
datasets download genome taxon sars-cov-2 --filename virus.zip
without any issues. There are close to 340,000 SARS-CoV-2 genomes as of today.
Edit: The final file was 8.8 GB.
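With a download this large, it may be worth checking how many sequences actually arrived before unpacking everything. The sketch below counts FASTA header lines inside the zip without extracting it. I am not certain of the exact file layout inside virus.zip, so it simply scans every member whose name ends in a common FASTA extension; count_fasta_records is a made-up helper name, not part of the datasets tool.

```python
import zipfile

# Assumption: sequence files inside the archive use one of these extensions.
FASTA_SUFFIXES = (".fna", ".fasta", ".fa")


def count_fasta_records(zip_path):
    """Return {member name: number of '>' header lines} for FASTA files in a zip."""
    counts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(FASTA_SUFFIXES):
                continue
            # Stream the member line by line so a multi-GB file is never
            # loaded into memory at once.
            with archive.open(name) as handle:
                counts[name] = sum(1 for line in handle if line.startswith(b">"))
    return counts
```

Calling count_fasta_records("virus.zip") should then give a per-file record count that you can compare against the number of genomes you expected.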