I have a file containing a list of nucleotide sequence accession numbers, and I need to get the corresponding assembly accession numbers using the Entrez command-line utilities (E-utilities). For example, when I enter "CP020622.1" in the search box on the main NCBI web page and choose "Assembly" from the menu, it gives me the assembly accession ID "GCA_002234695.1" (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_002234695.1/). I want to run an E-utilities command that outputs the genome assembly accession for any input accession ID I provide (for example, given a chromosome accession ID, retrieve the corresponding genome assembly accession ID). Does anybody know a command I can use for this purpose?
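One way this lookup is commonly done with EDirect is to search nuccore, follow the link to the assembly database, and pull the accession out of the document summary. The following is a minimal sketch, not a confirmed answer from this thread: it assumes EDirect (esearch, elink, esummary, xtract) is installed, and the function names and the input file name nucleotide_accessions.txt are hypothetical. Depending on the record, the AssemblyAccession field may hold a RefSeq (GCF_) accession rather than the GenBank (GCA_) one.

```shell
# Map one nucleotide accession (e.g. CP020622.1) to its assembly accession.
# stdin is redirected from /dev/null so esearch does not swallow the
# input of a surrounding "while read" loop.
nuc_to_assembly() {
  esearch -db nuccore -query "$1" </dev/null |
    elink -target assembly |
    esummary |
    xtract -pattern DocumentSummary -element AssemblyAccession
}

# Read a file with one nucleotide accession per line and print
# "<nucleotide accession><TAB><assembly accession>" for each entry.
map_accessions() {
  while IFS= read -r acc; do
    [ -n "$acc" ] || continue   # skip blank lines
    printf '%s\t%s\n' "$acc" "$(nuc_to_assembly "$acc")"
  done < "$1"
}
```

It could then be called as `map_accessions nucleotide_accessions.txt > assembly_accessions.tsv`.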
Thanks a lot for your helpful comments, and sorry for my delayed response. I successfully retrieved the assembly accession IDs for a number of species and saved them in a text file (one accession per line). Now I want to download the genomes in compressed FASTA format (genome.fa.gz) using the EDirect command-line utility. Could you please help me with that as well?
Cheers, Mani
EntrezDirect is not meant to download genome sequences on the command line. You should use NCBI datasets tool for that purpose. Here are relevant details: Download many NCBI genomes with list of GCA identifiers
Thanks for that.
I created a file (database_genome_list.txt) containing the assembly accessions for two genomes, for example:
cat database_genome_list.txt
GCA_002234695.1
GCA_002234715.1
When I run the datasets command to download them in parallel:
parallel --bar --jobs 2 -a database_genome_list.txt-2 'datasets download genome accession {}'
I get this error:
0% 0:2=0s GCA_002234715.1
Collecting 1 genome record [================================================] 100% 1/1
Downloading: ncbi_dataset.zip 218MB valid zip structure -- files not checked
Validating package [================================================] 100% 4/4
50% 1:1=6s GCA_002234715.1
Collecting 1 genome record [================================================] 100% 1/1
Downloading: ncbi_dataset.zip 237MB 5.02MB/s
Error: Internal error (invalid zip archive). Please try again
Use datasets download genome accession <command> --help for detailed help about a command.
Do you know how I can download more than one assembly with one command?
Cheers, Mani
Sorry, I figured it out and resolved the issue. Thanks for your help.
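For anyone hitting the same error: the thread does not say how it was resolved, but a likely cause is that both parallel jobs write to the same default output file (ncbi_dataset.zip) in the same directory and overwrite each other. The following is a rough sketch of two ways around that, assuming a recent NCBI datasets CLI (one that supports --inputfile, --include and --filename), plus GNU parallel, unzip and gzip. The function names and output file names are hypothetical.

```shell
# Option 1: let datasets read the whole accession list itself and
# write a single zip archive.
# $1: file with one assembly accession per line; $2: output zip name.
download_all() {
  datasets download genome accession --inputfile "$1" --include genome --filename "$2"
}

# Option 2: keep GNU parallel, but give every job its own output file
# so concurrent downloads do not overwrite each other.
# $1: file with one assembly accession per line.
download_each() {
  parallel --bar --jobs 2 -a "$1" \
    'datasets download genome accession {} --include genome --filename {}.zip'
}

# Repack the genomic FASTA from a datasets zip as <zip name>.fa.gz.
# If the zip holds several genomes, their sequences are concatenated.
zip_to_fasta_gz() {
  unzip -p "$1" 'ncbi_dataset/data/*/*.fna' | gzip > "${1%.zip}.fa.gz"
}
```

For example, `download_each database_genome_list.txt` followed by `zip_to_fasta_gz GCA_002234695.1.zip` would be expected to leave a GCA_002234695.1.fa.gz file in the working directory.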