Many of the bacterial RefSeq genomes aren't available in NCBI's Gene database, so E-utilities queries against the gene database won't work for them. If you have a specific set of assemblies in mind, try downloading the "feature_table.txt" files for that set and parsing what you need from there, e.g.:
https://www.ncbi.nlm.nih.gov/assembly/?term=txid1239%5Borgn%5D+latest_refseq%5Bfilter%5D
Then use the "Download Assemblies" button to download the "Feature table" files for the RefSeq assemblies. All Firmicutes comes to about 35k assemblies and a 4.6 GB download.
Your example protein is in this file:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/372/005/GCF_000372005.1_ASM37200v1/GCF_000372005.1_ASM37200v1_feature_table.txt.gz
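The FTP path above follows a regular layout: the GCF prefix, the nine accession digits split into three directories of three, then a directory named after the accession plus the assembly name. If you already have a list of accessions and assembly names, a minimal sketch like the one below could build the feature-table URLs directly instead of going through the web page. The helper name `feature_table_url` is hypothetical, and the sketch assumes the layout seen in this one URL holds for your assemblies; it does not sanitize assembly names containing spaces or other special characters.

```bash
# Build the FTP URL of an assembly's feature table from its accession and
# assembly name. feature_table_url is a hypothetical helper; it assumes the
# genomes/all/<prefix>/<ddd>/<ddd>/<ddd>/<accession>_<name>/ layout of the
# example URL and does not sanitize names with spaces or special characters.
feature_table_url() {
  local acc="$1" name="$2"             # e.g. GCF_000372005.1 ASM37200v1
  local prefix digits split dir
  prefix=${acc%%_*}                    # GCF
  digits=${acc#*_}                     # 000372005.1
  digits=${digits%%.*}                 # 000372005
  # 000372005 -> 000/372/005
  split=$(printf '%s\n' "$digits" | sed 's|^\(...\)\(...\)\(...\)$|\1/\2/\3|')
  dir=${acc}_${name}
  printf 'ftp://ftp.ncbi.nlm.nih.gov/genomes/all/%s/%s/%s/%s_feature_table.txt.gz\n' \
    "$prefix" "$split" "$dir" "$dir"
}

# Usage: print the URL for the example assembly
feature_table_url GCF_000372005.1 ASM37200v1
```

The printed URL can then be handed to `wget` or `curl`.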
The genomic location (genomic accession, start, end, strand) is in columns 7-10, and the gene symbol (if available) is in column 15. You could then use E-utilities to fetch the FASTA sequence for that genomic range.
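As a rough sketch of that parsing step, the function below pulls columns 7-10 and 15 for one WP accession out of a gzipped or plain feature table. It assumes the standard tab-separated layout, that the WP accession appears in column 11 (product_accession) or column 12 (non-redundant_refseq) of the CDS row, and that the symbol is repeated on the CDS row; `wp_location` is a hypothetical name.

```bash
# Print genomic accession, start, end, strand (columns 7-10) and the gene
# symbol (column 15, "NA" if empty) for CDS rows whose column 11 or 12
# matches the given WP accession. wp_location is a hypothetical helper.
wp_location() {
  local wp="$1" table="$2"
  # gzip -dcf decompresses .gz input and passes plain text through unchanged
  gzip -dcf "$table" | awk -F'\t' -v OFS='\t' -v wp="$wp" '
    $1 == "CDS" && ($11 == wp || $12 == wp) {
      sym = ($15 == "") ? "NA" : $15   # many genes have no symbol
      print $7, $8, $9, $10, sym
    }'
}
```

Usage: `wp_location WP_020487904.1 GCF_000372005.1_ASM37200v1_feature_table.txt.gz`. The accession must include the version suffix, since the match is exact.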
If you want the CDS nucleotide sequences (the same as the gene sequences here), with gene symbols in the FASTA headers, try the "CDS from genomic" file from the same download option (31.8 GB). Your example has a header like this:
```
>lcl|NZ_AQYY01000001.1_cds_WP_020487904.1_359 [gene=clpB] [locus_tag=A37G_RS0101875] [protein=ATP-dependent chaperone ClpB] [protein_id=WP_020487904.1] [location=424554..427151] [gbkey=CDS]
ATGGACACCGACAAGCTGACGACCCGCAGCCGGGACGCGGTCTCGGCCGCCCTGCGCACCGCTCTGACGAAAGGCAACCC
GGCGGCCGAGCCGGTGCACCTGCTGTACGCGTTGCTGCTGGTCCCCGACAACACGGTCGCGCCCCTGCTGGGCTCGATCG
```
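If you go the "CDS from genomic" route, the bracketed [key=value] pairs in each header can be turned into a protein_id-to-gene table. A minimal sketch, assuming the header layout shown above; `cds_gene_symbols` and the awk function `tag` are hypothetical names, and "NA" marks CDSs without a [gene=...] entry.

```bash
# Print "protein_id<TAB>gene" for every header in a "CDS from genomic" FASTA
# file (gzipped or plain); "NA" when a header has no [gene=...] tag.
# cds_gene_symbols and tag() are hypothetical names; assumes the bracketed
# [key=value] header layout of NCBI's *_cds_from_genomic.fna files.
cds_gene_symbols() {
  gzip -dcf "$1" | awk -v OFS='\t' '
    # Return the value of [key=value] in string s, or "" if absent.
    function tag(s, key,    i, rest) {
      i = index(s, "[" key "=")
      if (i == 0) return ""
      rest = substr(s, i + length(key) + 2)
      return substr(rest, 1, index(rest, "]") - 1)
    }
    /^>/ {
      pid  = tag($0, "protein_id")
      gene = tag($0, "gene")
      if (gene == "") gene = "NA"
      if (pid != "") print pid, gene   # pseudogenes may lack a protein_id
    }'
}
```

The output can be joined against your list of WP accessions with `grep -F -f` or `join`.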
To do that for individual proteins via e-utils, you could use something like:
```
# first use the IPG report to get the nucleotide accession and location
esearch -db protein -query WP_020487904 | esummary -format ipg | grep WP_020487904
41115784 RefSeq NZ_AQYY01000001.1 424554 427151 + WP_020487904.1 ATP-dependent chaperone ClpB Dehalobacter sp. FTH1 FTH1 GCF_000372005.1
# then use the location from columns 3-6 to get the sequence:
efetch -db nuccore -id NZ_AQYY01000001.1 -seq_start 424554 -seq_stop 427151 -strand plus -format fasta_cds_na
>lcl|NZ_AQYY01000001.1_cds_WP_020487904.1_1 [gene=clpB] [locus_tag=A37G_RS0101875] [protein=ATP-dependent chaperone ClpB] [protein_id=WP_020487904.1] [location=424554..427151] [gbkey=CDS]
ATGGACACCGACAAGCTGACGACCCGCAGCCGGGACGCGGTCTCGGCCGCCCTGCGCACCGCTCTGACGA
```
Keep in mind a single WP may be found on multiple assemblies (or even at multiple locations of the same assembly), so the IPG report may have multiple rows for the same WP accession.
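Because of that, taking only the first row can miss locations. One way to cover all of them is to turn every matching IPG row into its own efetch command. The sketch below only prints the commands, so you can inspect them before piping them to `sh`. It assumes tab-separated IPG output with nucleotide accession, start, stop, and strand in columns 3-6 and the protein in column 7, as in the example row, and it assumes efetch accepts `minus` as the counterpart of the `plus` value used above; `ipg_to_efetch` is a hypothetical name.

```bash
# Read an IPG report on stdin and print one efetch command per row whose
# protein (column 7) matches the given WP accession, using columns 3-6 for
# the location. ipg_to_efetch is a hypothetical helper; rows with no
# nucleotide accession (column 3 empty) are skipped.
ipg_to_efetch() {
  local wp="$1"
  awk -F'\t' -v wp="$wp" '
    $7 == wp && $3 != "" {
      strand = ($6 == "-") ? "minus" : "plus"
      printf "efetch -db nuccore -id %s -seq_start %s -seq_stop %s -strand %s -format fasta_cds_na\n", $3, $4, $5, strand
    }'
}
```

Usage: `esearch -db protein -query WP_020487904 | esummary -format ipg | ipg_to_efetch WP_020487904.1 | sh`. Note that the header row of the report never matches, because its column 7 is the literal word "Protein".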
Note that only about 10% of the genes in that assembly have gene symbols assigned. Protein names on WPs are better defined than gene symbols.
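To check that figure for a particular assembly, you can count the gene rows with a non-empty symbol column. A small sketch, assuming the feature-table layout described above (feature type in column 1, symbol in column 15); `symbol_coverage` is a hypothetical name.

```bash
# Count "gene" rows in a feature table (gzipped or plain) and how many of
# them have a non-empty symbol in column 15. symbol_coverage is hypothetical.
symbol_coverage() {
  gzip -dcf "$1" | awk -F'\t' '
    $1 == "gene" { total++; if ($15 != "") with_sym++ }
    END {
      if (total == 0) { print "no gene rows"; exit }
      printf "%d of %d genes have a symbol (%.1f%%)\n", with_sym, total, 100 * with_sym / total
    }'
}
```

Usage: `symbol_coverage GCF_000372005.1_ASM37200v1_feature_table.txt.gz`.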
Do you need gene symbols or gene sequences in FASTA format? Do you need this data for txid1239 or txid1000277?
For example, gene symbol information is included in the gene table, which can be downloaded using the following NCBI Unix E-utilities command.
I want gene symbols and FASTA sequences, given RefSeq protein IDs as input, for taxon ID 1239.
You can get the sequence by doing the following:
I know it's a basic question, but how do I download esearch? It's throwing the error: 'No command 'esearch' found' ...
Okay, I got it now. One can download the EDirect suite from here: ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/ . It contains the esearch and efetch programs.
It's still not working, @GenoMax. After running this, it shows me the 'help' of efetch:
```
EFETCH - retrieve entries from sequence databases.

Synopsis: efetch -options [database:]<query>

Databases: SWissprot/SP, PIR, WOrmpep/WP, EMbl, GEnbank/GB, ProDom, ProSite

Options:
  -a      Search with Accession number
  -f      Fasta format output
  -q      Sequence only output (one line)
  -s <#>  Start at position #
  -e <#>  Stop at position #
  -o      More options and info...

Environment:
  SWDIR      = SwissProt directory - database and EMBL index files
  PIRDIR     = PIR        -- " --
  WORMDIR    = Wormpep    -- " --
  EMBLDIR    = EMBL       -- " --
  GBDIR      = Genbank    -- " --
  PRODOMDIR  = ProDom     -- " --
  PROSITEDIR = ProSite    -- " --
  DBDIR      = User's own -- " -- (fasta format)
  SEQDB      database file (default SwissProt)
  SEQDBIDX   index file
  DIVTABL    division lookup table

Ex. setenv DBDIR /pubseq/seqlibs/embl/

Note that Prodom family consensus seqs can be fetched by PD:_#

by Erik Sonnhammer (esr@sanger.ac.uk) Version 2.1,
```
I am not sure you are using the correct version of the EDirect utilities. Download the latest version from: ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/versions/current. You can also have a look at this blog for more info.
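The help text quoted above ends with "by Erik Sonnhammer", so it comes from a different program that is also named efetch, not from EDirect. That suggests the other efetch is found first on your PATH. A quick check, as a sketch: it assumes EDirect lives in a directory whose path contains "edirect" (the EDirect installer's default is $HOME/edirect), and `which_efetch` is a hypothetical name.

```bash
# Show which "efetch" the shell will run and whether it looks like EDirect's.
# Assumes EDirect is installed in a directory whose path contains "edirect";
# which_efetch is a hypothetical helper. Returns 1 if no efetch is found,
# 2 if the first efetch on PATH does not look like EDirect's.
which_efetch() {
  local exe
  exe=$(command -v efetch) || { echo "efetch not found on PATH"; return 1; }
  case "$exe" in
    *edirect*) echo "OK: $exe" ;;
    *) echo "WARNING: $exe does not look like EDirect's efetch;" \
            "put EDirect first, e.g. export PATH=\$HOME/edirect:\$PATH"
       return 2 ;;
  esac
}
```

In bash, `type -a efetch` also lists every efetch on PATH in lookup order, which shows where the other program is installed.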