How do I use entrez fetch queries on the command line to download an entire RefSeq genome?
Thank you
You'll have to know the accession number(s) for the sequences in the genome. The sequences are most easily accessible from the nucleotide database.
For example, the "Aeromonas hydrophila" genome sequence has accession CP007518.2.
To retrieve it in GenBank format, type the following into the terminal:
efetch -db nucleotide -id CP007518.2 -mode text -format gb
Replace CP007518.2 with the accession for the sequence that you want. The format may be, e.g., fasta instead of gb (GenBank).
It's the same for multiple sequences: get the accessions for the sequences in the genome and pass them, comma-delimited, to -id, as -id "CP007518.2,CP007518.1".
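If the genome has many sequences, building the comma-delimited list by hand gets tedious. Here is a rough sketch of one way to script it, assuming Entrez Direct is installed and efetch is on your PATH. The function names join_accessions and fetch_genome are made up for illustration and are not part of Entrez Direct. The efetch flags are the same ones used in the command above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Entrez Direct).

# Read accessions from stdin, one per line, skip blank lines,
# and print them as a single comma-delimited string.
join_accessions() {
    grep -v '^[[:space:]]*$' | paste -s -d , -
}

# Fetch all accessions listed in a file into one output file.
#   $1: file with one accession per line
#   $2: format passed to efetch, e.g. gb or fasta
#   $3: output file
fetch_genome() {
    ids=$(join_accessions < "$1")
    # Refuse to call efetch with an empty -id list.
    [ -n "$ids" ] || { echo "no accessions found in $1" >&2; return 1; }
    efetch -db nucleotide -id "$ids" -mode text -format "$2" > "$3"
}
```

With the accessions saved one per line in a file (say accessions.txt, also just an example name), something like fetch_genome accessions.txt fasta genome.fasta should write all records into genome.fasta.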
There are other options, but they involve more Entrez Direct tools than just efetch. For more complex stuff read: https://www.ncbi.nlm.nih.gov/books/NBK179288/
A prior answer may work better with Eukaryotic genomes: C: Retrieve genome in fasta format from ncbi
For example: