Hi, I am trying to download all complete coronavirus genomes from animal hosts only, together with their metadata (country, release date, accession number, etc.), from NCBI. Please suggest any methods or scripts to download them.
should generate a table of non-human coronavirus sequences at NCBI. Adjust the filters and metadata as needed.
Hi,
you can use NCBI datasets. The default data package includes the genomic sequences in FASTA format and a metadata file in JSON Lines format. Here's how to download all complete coronavirus genomes (I'm assuming SARS-CoV-2 here) using the command-line tool:
datasets download virus genome taxon sars-cov-2 --complete-only
The included data report has all the information you want. To turn it into a table, you can use dataformat, which is a companion tool to datasets.
Please let me know if this is what you're looking for, or feel free to ask additional questions :) Thanks! Mirian
MirianT_NCBI OP wants non-human coronavirus sequences. Your solution above seems to download all SARS-CoV-2 genomes, is that correct?
You are correct, GenoMax. I completely misinterpreted the question (animals = Metazoa). In that case, OP could download a list of accessions from the NCBI Virus website and use it as input to the command datasets download virus genome accession --inputfile sequences.acc, if that makes sense for them (for example, if they want to download everything directly to a server).
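For the accession-list route, here is a minimal sketch of how sequences.acc could be built from a metadata table. It assumes the table was saved as a tab-separated file with a header row; the function name nonhuman_accessions, the file name metadata.tsv, and the default column names "Accession" and "Host" are placeholders for illustration, so check them against your actual export. It also assumes human records are labelled exactly "Homo sapiens", and it skips rows with an empty host because those cannot be confirmed as animal.

```shell
# Print the accessions of all rows whose host is known and is not human.
# Usage: nonhuman_accessions FILE [ACCESSION_COLUMN] [HOST_COLUMN]
# ASSUMPTION: FILE is tab-separated with a header row; the default column
# names below are hypothetical and may differ in your export.
nonhuman_accessions() {
    awk -F'\t' -v acc_name="${2:-Accession}" -v host_name="${3:-Host}" '
        NR == 1 {
            # Locate the two columns by their header names.
            for (i = 1; i <= NF; i++) {
                if ($i == acc_name)  acc = i
                if ($i == host_name) host = i
            }
            if (!acc || !host) {
                print "column not found: " acc_name " or " host_name > "/dev/stderr"
                exit 1
            }
            next
        }
        # Keep rows with a non-empty host other than human.
        $host != "" && $host != "Homo sapiens" { print $acc }
    ' "$1"
}

# Example (not run here; the download command is the one quoted above):
#   nonhuman_accessions metadata.tsv > sequences.acc
#   datasets download virus genome accession --inputfile sequences.acc
```

If the column headers in your file differ, pass them as the second and third arguments instead of editing the function.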
Thank you for your reply.