NCBI esearch results inconsistent with web-based search results?
24 months ago
Dunois ★ 2.8k

I'm trying to collect (and then download) all transcriptomes and genomes associated with txid7604.

The query I am using is:

((txid7604[Organism:exp]) AND ( "tsa master"[Properties] OR "wgs master"[Properties] ))

And the database I am looking at is nuccore.

When I search via the web interface, I get four matches.

However, when I query via esearch 16.2 (installed via bioconda), I get 81468 matches?

> esearch -db nuccore -query "((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))"
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_638e1b0b374c2370e333fddb</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>81468</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>

What is the source of this discrepancy?

NCBI eutils • 880 views
24 months ago
Michael 55k

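This is a quoting issue. The inner double quotes end the string opened by the outer double quote, so the shell splits the query at the spaces in "tsa master" and "wgs master". Replace the outer double quotes with single quotes: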

esearch -db nuccore -query '((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))'
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_638e1db1b926d65727335953</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>4</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>

In principle, you should also have gotten a warning like this for your original query:

esearch -db nuccore -query "((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))"
Entrez Direct does not support positional arguments.
Please remember to quote parameter values containing
whitespace or shell metacharacters.
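
To see what the shell actually hands to esearch, here is a minimal sketch. `show_args` is a hypothetical helper defined only for this illustration (it is not part of Entrez Direct); it prints how many arguments it receives and then each one.

```shell
# show_args is a hypothetical helper for illustration, not part of Entrez Direct.
# It prints the number of arguments it received, then each argument in brackets.
show_args() {
  printf '%d argument(s)\n' "$#"
  for arg in "$@"; do
    printf '  [%s]\n' "$arg"
  done
}

# Outer double quotes: each inner double quote closes or reopens the string,
# so the spaces in "tsa master" and "wgs master" end up unquoted.
show_args "((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))"

# Outer single quotes: the inner double quotes are literal characters.
show_args '((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))'
```

In a POSIX shell the first call should report 3 arguments, the first being `((txid7604[Organism:exp]) AND (tsa`, and the second call should report 1. That truncated first argument is presumably what esearch ended up using as the query, which would explain the much larger count and the warning about positional arguments.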

I see. I actually managed to find out how to circumvent this.

esearch (and I suppose all its siblings) makes a distinction between tsa-master and tsa master, and between wgs-master and wgs master, when the query is wrapped in double quotes. Adding the hyphen resolves the discrepancy.

No hyphenation -> incorrect count:

> esearch -db nuccore -query "((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))"
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_638e1ebf93dc63259250e57f</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>81468</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>

With hyphenation, correct result:

> esearch -db nuccore -query "((txid7604[Organism:exp]) AND ("tsa-master"[Properties] OR "wgs-master"[Properties]))"
<ENTREZ_DIRECT>
  <Db>nuccore</Db>
  <WebEnv>MCID_638e1ec7b18a76276123d0dc</WebEnv>
  <QueryKey>1</QueryKey>
  <Count>4</Count>
  <Step>1</Step>
</ENTREZ_DIRECT>
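
A possible explanation for why the hyphen helps, offered as a sketch rather than a statement about esearch internals: with no space inside `tsa-master` or `wgs-master`, the shell has nothing to split on, so the whole query arrives as one argument, only with the inner double quotes stripped. The snippet below uses `set --` to capture the arguments the way a command would receive them; the variable names are just for illustration.

```shell
# Capture the arguments exactly as the shell would pass them to a command.
set -- "((txid7604[Organism:exp]) AND ("tsa-master"[Properties] OR "wgs-master"[Properties]))"

# Record the number of arguments and the text of the first one.
hyphen_count=$#
hyphen_query=$1
printf '%d argument(s): %s\n' "$hyphen_count" "$hyphen_query"
```

So the double-quoted, hyphenated form happens to survive the shell intact, but single quotes around the whole query remain the safer choice because they preserve the query text as written.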
24 months ago
barslmn ★ 2.3k
 esearch -db nuccore -query '((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))'

Quotes matter: the shell will split your query into separate parameters if it is not quoted properly. Try wrapping your query in single quotes.

https://colab.research.google.com/drive/15xJPdwsNqeU3u-3h-kmNTR2XEaeVggzM?usp=sharing
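
Another way to keep the quoting manageable, sketched here as a suggestion: store the query in a variable using single quotes, then pass it as `"$query"`. The `command -v` guard is an assumption added for this sketch so that it does nothing harmful when esearch is not installed.

```shell
# Single quotes keep the inner double quotes and spaces as literal text.
query='((txid7604[Organism:exp]) AND ("tsa master"[Properties] OR "wgs master"[Properties]))'

# Double quotes around the expansion pass the value as exactly one argument.
if command -v esearch >/dev/null 2>&1; then
  esearch -db nuccore -query "$query"
else
  printf 'esearch not found; query would be: %s\n' "$query"
fi
```

Keeping the query in a variable also makes it easy to print and compare against what you typed into the web interface.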
