I want to automatically download only the representative assemblies for a specific taxon. This is necessary for many clinically relevant taxa, which often have hundreds to thousands of assemblies for the same species.
I assumed NCBI's E-utilities were the obvious tool for this. The first step would be to get the accession numbers of all assemblies that match my query (the chosen taxon, labelled as "representative genome"). I used the following command:
esearch -db assembly -query '((txid662[Organism])) AND "representative genome"[RefSeq Category]'
However, this only returns the following overview, which I can't really do anything with:
<ENTREZ_DIRECT>
<Db>assembly</Db>
<WebEnv>NCID_1_30175195_130.14.18.48_9001_1584976543_326121091_0MetA0_S_MegaStore</WebEnv>
<QueryKey>1</QueryKey>
<Count>111</Count>
<Step>1</Step>
</ENTREZ_DIRECT>
I know the query works, because I expect exactly 111 representative genomes for this taxon, but how do I get the actual list of accession numbers?
To troubleshoot, I worked through the examples in the tutorial for the corresponding web-based ESearch API and adapted the protein example (which is supposed to return a list of proteins within a given molecular-weight range) to the Unix command-line version:
esearch -db protein -query '70000:90000[molecular weight]'
But here, too, I only get the same kind of overview:
<ENTREZ_DIRECT>
<Db>protein</Db>
<WebEnv>NCID_1_30143830_130.14.22.76_9001_1584976742_1372775979_0MetA0_S_MegaStore</WebEnv>
<QueryKey>1</QueryKey>
<Count>35421275</Count>
<Step>1</Step>
</ENTREZ_DIRECT>
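This overview can be piped straight into efetch, which reads the WebEnv/QueryKey handle from standard input. Because this particular query matches tens of millions of proteins, a guard on the reported Count before fetching may be worthwhile. The sketch below assumes Entrez Direct (esearch, efetch, xtract) is on PATH; the function name fetch_accessions and the default limit of 10000 are hypothetical choices for illustration, not part of EDirect.

```shell
#!/bin/sh
# Hypothetical helper: run a search, check how many records it matched,
# and only then pipe the saved overview into efetch to print accessions.
# Assumes NCBI Entrez Direct (esearch, efetch, xtract) is installed.
fetch_accessions() {
  local db="$1" query="$2" max="${3:-10000}" result count
  # esearch only stores the hits on NCBI's history server and prints
  # the ENTREZ_DIRECT overview (WebEnv, QueryKey, Count).
  result=$(esearch -db "$db" -query "$query") || return 1
  # Read the Count element out of that overview.
  count=$(printf '%s\n' "$result" | xtract -pattern ENTREZ_DIRECT -element Count)
  if [ "${count:-0}" -gt "$max" ]; then
    printf 'Query matched %s records (limit %s); narrow it before fetching.\n' \
      "$count" "$max" >&2
    return 2
  fi
  # efetch picks up the database and history handle from the overview on stdin.
  printf '%s\n' "$result" | efetch -format acc
}
```

For example, fetch_accessions protein '70000:90000[molecular weight]' exits with status 2 instead of starting a multi-million-record download, while a narrower query prints one accession per line.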
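The command-line options of esearch seem very limited, and I can't find any that produce anything other than this overview. How do I get a list of accessions using esearch on the command line?
esearch only runs the query: it stores the matching records on NCBI's history server and prints the WebEnv/QueryKey handle shown above. It must be followed by efetch (or esummary), which reads that handle from standard input, to download the data.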
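A minimal sketch of the full workflow for the assembly case, assuming Entrez Direct is on PATH: efetch -format docsum returns one DocumentSummary record per assembly, from which xtract pulls the AssemblyAccession or FtpPath_RefSeq fields. The function names are hypothetical, and the <directory>/<directory name>_genomic.fna.gz file naming follows the layout of NCBI's genomes/all FTP tree, which is worth confirming against a directory listing before relying on it.

```shell
#!/bin/sh
# Sketch: list and download the RefSeq representative genomes under one taxon.
# Assumes NCBI Entrez Direct (esearch, efetch, xtract) is installed.

# Run the search once per call; $1 is the NCBI taxonomy ID.
rep_search() {
  esearch -db assembly \
    -query "txid$1[Organism] AND \"representative genome\"[RefSeq Category]"
}

# Print one assembly accession (GCF_...) per line.
rep_accessions() {
  rep_search "$1" | efetch -format docsum |
    xtract -pattern DocumentSummary -element AssemblyAccession
}

# Print one RefSeq FTP directory per line.
rep_ftp_dirs() {
  rep_search "$1" | efetch -format docsum |
    xtract -pattern DocumentSummary -element FtpPath_RefSeq
}

# Map each FTP directory on stdin to the HTTPS URL of its genomic FASTA file.
genomic_fasta_urls() {
  local dir
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue                # skip records without a RefSeq path
    case $dir in
      ftp://*) dir="https://${dir#ftp://}" ;;
    esac
    printf '%s/%s_genomic.fna.gz\n' "$dir" "${dir##*/}"
  done
}

# Example: rep_ftp_dirs 662 | genomic_fasta_urls | xargs -n 1 curl -sS -O
```

rep_accessions 662 answers the original question directly, and the commented pipeline downloads each genomic FASTA into the current directory. Keeping the query in rep_search means the taxon filter and the "representative genome" category are defined in one place.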