Question

Identical Queries on NCBI Gene and Protein Databases Return Fewer Results from the Gene Database

12.2 years ago

Michael Barton ★ 1.9k

Possibly a stupid question, but I'm not getting the same results for the same query on the NCBI Protein database as on the Gene database: the query returns a larger set on the Protein DB than on the Gene DB. I assume that each protein result must have a corresponding gene entry in the database? Any idea how to get the nucleotide sequences for the genes using my query? Here is the query string:

gyrB[Gene] OR (DNA gyrase subunit B[Protein]) AND Pseudomonas[Primary Organism] NOT partial

ncbi database search • 3.5k views

updated 7.3 years ago by Biostar 20 • written 12.2 years ago by Michael Barton ★ 1.9k

score 3 · Answer 1 · 2012-09-27

12.2 years ago

Neilfws 49k

First, you cannot use the same query on both databases, because they use different terms. PROT (Protein) and PORG (Primary Organism) are specific to the Protein database. The equivalent terms for the Gene database might be TITL (Gene/Protein Name) or PFN (Protein Full Name) and ORGN (Organism). See this list of Entrez databases and their terms.
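As a rough sketch of the point above, the same search can be written once per database with that database's own field tags. The tags below are the ones named in this answer and should be checked against each database's field list; `GENE` is assumed to be valid in both. The helper names (`build_term`, `esearch_url`) are made up for illustration, and the parentheses reflect a guess at the grouping the original query intended. The URL uses the standard E-utilities ESearch endpoint but is only built here, not sent.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Field tags as named in this answer; check each database's own field list
# before relying on them. GENE (Gene Name) is assumed valid in both databases.
FIELD_TAGS = {
    "protein": {"product": "PROT", "organism": "PORG"},
    "gene": {"product": "TITL", "organism": "ORGN"},
}


def build_term(db, gene, product, organism):
    """Return the search term for one database, with explicit grouping."""
    tags = FIELD_TAGS[db]
    return (
        f'({gene}[GENE] OR "{product}"[{tags["product"]}]) '
        f'AND {organism}[{tags["organism"]}] NOT partial'
    )


def esearch_url(db, term, retmax=200):
    """Build (but do not send) an ESearch request URL."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS}/esearch.fcgi?{query}"


for db in ("protein", "gene"):
    term = build_term(db, "gyrB", "DNA gyrase subunit B", "Pseudomonas")
    print(db, "->", term)
    print(esearch_url(db, term))
```

Writing the grouping explicitly avoids depending on how Entrez orders OR, AND and NOT when no parentheses are given.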

If you run the Protein query (145 results), then look on the right side of the page for "Find related data", choose "Gene" and "Find items", 33 results are returned. This is almost the same number as for your Gene query (34). So there is some kind of mapping between the two.
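The "Find related data" step can probably be reproduced programmatically with the E-utilities ELink endpoint, which follows links from one Entrez database to another. The sketch below only builds the request URL and parses a response; the sample XML is made up, with placeholder IDs, and merely mimics the shape of ELink output. The function names are hypothetical. Proteins that come back with no linked Gene ID would be one visible source of the 145 versus 33 gap.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(protein_ids):
    """Build (but do not send) an ELink request from Protein to Gene.

    Repeating the id parameter asks for one LinkSet per input ID.
    """
    params = [("dbfrom", "protein"), ("db", "gene")]
    params += [("id", str(pid)) for pid in protein_ids]
    return f"{EUTILS}/elink.fcgi?" + urlencode(params)


def parse_links(xml_text):
    """Map each source protein ID to the list of linked Gene IDs."""
    mapping = {}
    for linkset in ET.fromstring(xml_text).iter("LinkSet"):
        source_ids = [e.text for e in linkset.findall("./IdList/Id")]
        gene_ids = [e.text for e in linkset.findall("./LinkSetDb/Link/Id")]
        for pid in source_ids:
            mapping[pid] = gene_ids
    return mapping


# Made-up response with placeholder IDs, shaped like ELink XML output.
SAMPLE = """<eLinkResult>
  <LinkSet><DbFrom>protein</DbFrom><IdList><Id>111</Id></IdList>
    <LinkSetDb><DbTo>gene</DbTo><LinkName>protein_gene</LinkName>
      <Link><Id>900</Id></Link></LinkSetDb></LinkSet>
  <LinkSet><DbFrom>protein</DbFrom><IdList><Id>222</Id></IdList></LinkSet>
</eLinkResult>"""

links = parse_links(SAMPLE)
unmapped = [pid for pid, genes in links.items() if not genes]
print(elink_url(["111", "222"]))
print("links:", links)
print("proteins without a Gene record:", unmapped)
```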

I would not necessarily expect each protein to have a corresponding gene, or vice-versa; it all depends on how each database is curated and maintained. Probably best to read up on the documentation for each of the databases, to see if there's any mention of potential causes for discrepancy.


Thanks Neil. I resorted to writing a script that parses the gene location out of the protein GenBank file and then fetches that from the database.
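The script itself was not posted, so the following is only one plausible way to do it, not the actual code. It assumes the protein record is in GenPept flat-file format and that its CDS feature carries a `/coded_by` qualifier pointing at the nucleotide record; only simple locations (optionally wrapped in `complement(...)`) are handled, not `join(...)`. The accession in the sample is a placeholder, and the function names are made up. The EFetch URL is built but not sent.

```python
import re
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Matches simple locations only: ACC:start..stop, optionally in complement().
CODED_BY = re.compile(
    r'/coded_by="(complement\()?([A-Za-z0-9_.]+):<?(\d+)\.\.>?(\d+)\)?"'
)


def parse_coded_by(genpept_text):
    """Return (accession, start, stop, strand) or None if nothing matches.

    strand follows the EFetch convention: 1 = plus, 2 = minus.
    """
    m = CODED_BY.search(genpept_text)
    if m is None:
        return None
    strand = 2 if m.group(1) else 1
    return m.group(2), int(m.group(3)), int(m.group(4)), strand


def efetch_url(accession, start, stop, strand):
    """Build (but do not send) an EFetch request for the coding region."""
    params = {
        "db": "nuccore", "id": accession,
        "rettype": "fasta", "retmode": "text",
        "seq_start": start, "seq_stop": stop, "strand": strand,
    }
    return f"{EUTILS}/efetch.fcgi?" + urlencode(params)


# Placeholder CDS feature; the accession and coordinates are not real.
SAMPLE = '''     CDS             1..806
                     /gene="gyrB"
                     /coded_by="complement(NC_000000.1:100..2520)"
'''

location = parse_coded_by(SAMPLE)
print(location)
print(efetch_url(*location))
```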

Reply • 12.2 years ago by Michael Barton ★ 1.9k

Nice response! Where'd you get that (excellent) list? Is it updated regularly? Who maintains it? Where does the underlying data come from?

Reply • 12.1 years ago by Chris Maloney ▴ 360

score 0 · Answer 2 · 2012-09-27

12.2 years ago

Will 4.6k

Probably because there are multiple protein entries for each gene, e.g. alternative splicing variants.