Possibly a stupid question, but I'm not getting the same results for the same
query on the NCBI Protein database as on the Gene database. I get a larger
result set from the Protein DB than from the same query on
the Gene DB. I assumed that each protein result must have a corresponding
gene entry in the database. Any idea how to get the nucleotide sequences for
the genes using my query? Here is the query string:
gyrB[Gene] OR (DNA gyrase subunit B[Protein]) AND Pseudomonas[Primary Organism] NOT partial
First, you cannot use the same query on both databases, because each database has its own set of search fields. PROT (Protein) and PORG (Primary Organism) are specific to the Protein database. The equivalent fields for the Gene database might be TITL (Gene/Protein Name) or PFN (Protein Full Name) and ORGN (Organism). See this list of Entrez databases and their terms.
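The fields each database accepts can also be checked programmatically. Below is a minimal sketch, assuming the NCBI E-utilities EInfo endpoint and its XML response with `Field`, `Name` and `FullName` elements; the function names and the `SAMPLE` snippet are made up for illustration, and the sample is hand-written rather than real output.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EINFO_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"


def parse_fields(einfo_xml):
    """Return (short name, full name) pairs from an EInfo XML response."""
    root = ET.fromstring(einfo_xml)
    return [
        (field.findtext("Name"), field.findtext("FullName"))
        for field in root.iter("Field")
    ]


def fetch_fields(db):
    """Download the field list for one Entrez database (needs network access)."""
    url = EINFO_URL + "?" + urllib.parse.urlencode({"db": db})
    with urllib.request.urlopen(url) as response:
        return parse_fields(response.read())


# Hand-written sample in the shape of an EInfo response (not real output).
SAMPLE = """<eInfoResult><DbInfo><DbName>gene</DbName><FieldList>
<Field><Name>ORGN</Name><FullName>Organism</FullName></Field>
<Field><Name>TITL</Name><FullName>Gene/Protein Name</FullName></Field>
</FieldList></DbInfo></eInfoResult>"""

print(parse_fields(SAMPLE))
# With network access: fetch_fields("protein") and fetch_fields("gene")
```

Comparing the two field lists side by side should show which fields in a Protein query need to be swapped before the query is run against Gene.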
If you run the Protein query (145 results), look for "Find related data" on the right side of the page, choose "Gene" and click "Find items", you get 33 results. That is almost the same number as for your Gene query (34), so there is some kind of mapping between the two.
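The same protein-to-gene mapping can probably be followed in a script. This is a rough sketch, assuming the E-utilities ELink endpoint is the scripted counterpart of "Find related data" and that its XML response lists targets under `LinkSetDb/Link/Id`; the function names and the UIDs in `SAMPLE` are invented for illustration.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(protein_ids):
    """Build an ELink request that maps protein UIDs to Gene UIDs."""
    params = {"dbfrom": "protein", "db": "gene", "id": ",".join(protein_ids)}
    return ELINK_URL + "?" + urllib.parse.urlencode(params)


def parse_linked_ids(elink_xml):
    """Collect the distinct target UIDs from an ELink XML response."""
    root = ET.fromstring(elink_xml)
    ids = [el.text for el in root.iterfind(".//LinkSetDb/Link/Id")]
    return sorted(set(ids))


# Hand-written sample with made-up UIDs (not real output).
SAMPLE = """<eLinkResult><LinkSet><DbFrom>protein</DbFrom>
<LinkSetDb><DbTo>gene</DbTo><LinkName>protein_gene</LinkName>
<Link><Id>222</Id></Link><Link><Id>111</Id></Link><Link><Id>222</Id></Link>
</LinkSetDb></LinkSet></eLinkResult>"""

print(elink_url(["1", "2"]))
print(parse_linked_ids(SAMPLE))
```

The sample collapses three links into two distinct Gene UIDs. If several protein records point at the same gene, that would be one possible reason for 145 proteins mapping to only 33 genes, though this is a guess and not something the result counts confirm.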
I would not necessarily expect each protein to have a corresponding gene, or vice-versa; it all depends on how each database is curated and maintained. Probably best to read up on the documentation for each of the databases, to see if there's any mention of potential causes for discrepancy.
Thanks Neil. I resorted to writing a script that parses the gene out of the protein GenBank file and then fetches it from the database.
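The script itself is not shown in the thread. One way such a script could work, assuming it relies on the `/coded_by` qualifier that protein GenBank (GenPept) records carry on their CDS feature, is sketched below. The function names and the accession `NC_000000.1` are placeholders, and only simple single-range locations are handled.

```python
import re
import urllib.parse

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Matches simple locations such as NC_000000.1:100..2520, optionally wrapped
# in complement(...). Joined (multi-segment) locations are not handled.
CODED_BY = re.compile(
    r'/coded_by="(complement\()?([A-Za-z0-9_.]+):<?(\d+)\.\.>?(\d+)\)?"'
)


def parse_coded_by(genpept_text):
    """Return (accession, start, stop, strand), or None if nothing matches."""
    match = CODED_BY.search(genpept_text)
    if match is None:
        return None
    complement, accession, start, stop = match.groups()
    # EFetch uses strand=1 for the plus strand and strand=2 for the minus strand.
    return accession, int(start), int(stop), 2 if complement else 1


def efetch_url(accession, start, stop, strand):
    """Build an EFetch URL for that nucleotide range in FASTA format."""
    params = {
        "db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text",
        "seq_start": start, "seq_stop": stop, "strand": strand,
    }
    return EFETCH_URL + "?" + urllib.parse.urlencode(params)


# Made-up fragment of a GenPept feature table (placeholder accession).
RECORD = '''     CDS             1..806
                     /gene="gyrB"
                     /coded_by="complement(NC_000000.1:100..2520)"
'''

location = parse_coded_by(RECORD)
print(location)
print(efetch_url(*location))
```

Opening the printed URL (or passing it to `urllib.request.urlopen`) should return the coding region as nucleotide FASTA, reverse-complemented when the strand is 2.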
Nice response! Where'd you get that (excellent) list? Is it updated regularly? Who maintains it? Where does the underlying data come from?