Identical Queries on NCBI Gene and Protein Databases Return Fewer Results from the Gene Database
12.3 years ago
Michael Barton ★ 1.9k

Possibly a stupid question, but I'm not getting the same results for the same query on the NCBI Protein database as on the Gene database: the query on the Protein database returns a larger set than the same query on the Gene database. I assumed that each protein result would have a corresponding gene entry in the database. Any idea how to get the nucleotide sequences for the genes using my query? Here is the query string:

gyrB[Gene] OR (DNA gyrase subunit B[Protein]) AND Pseudomonas[Primary Organism] NOT partial

12.3 years ago
Neilfws 49k

First, you cannot use the same query on both databases, because they use different search field tags. PROT (Protein) and PORG (Primary Organism) are specific to the Protein database. The equivalent tags for the Gene database might be TITL (Gene/Protein Name) or PFN (Protein Full Name), and ORGN (Organism). See this list of Entrez databases and their search fields.
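
One way to act on this is to keep the search concepts fixed and render them with each database's own field tags. The sketch below is only an illustration: the `FIELDS` table and `build_query` helper are hypothetical names, and the tag choices simply follow the mapping suggested above, so they should be checked against NCBI's current field list before use. Explicit parentheses are used so the intended grouping does not depend on how Entrez orders the Boolean operators.

```python
# Field tags per database, following the mapping suggested in this answer.
# These are assumptions to verify against NCBI's current field list.
FIELDS = {
    "protein": {"gene": "GENE", "name": "PROT", "organism": "PORG"},
    "gene": {"gene": "GENE", "name": "PFN", "organism": "ORGN"},
}


def build_query(db, gene, name, organism, exclude="partial"):
    """Render the same search concepts with the field tags of one database.

    The grouping is made explicit: (gene OR name) AND organism NOT exclude.
    The quoted phrase keeps the multi-word protein name together.
    """
    tags = FIELDS[db]
    return (
        f'({gene}[{tags["gene"]}] OR "{name}"[{tags["name"]}]) '
        f'AND {organism}[{tags["organism"]}] NOT {exclude}'
    )


if __name__ == "__main__":
    for db in FIELDS:
        print(db, "->", build_query(db, "gyrB", "DNA gyrase subunit B", "Pseudomonas"))
```

Each string could then be passed as `term` to an ESearch request (for example Biopython's `Entrez.esearch(db=..., term=...)`), which needs network access and a contact address set in `Entrez.email`.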

If you run the Protein query (145 results), then go to "Find related data" on the right side of the page, choose "Gene" and click "Find items", 33 results are returned. This is almost the same as the number returned by your Gene query (34), so there is some kind of mapping between the two.
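
For scripting, the "Find related data" step roughly corresponds (as an assumption, not a documented equivalence) to an E-utilities ELink request from `protein` to `gene`. The sketch below only builds the URL; the helper name and the example UIDs are hypothetical. Passing the UIDs as one comma-separated `id` value asks ELink for a single combined link set, so several proteins that point at the same gene contribute one Gene UID, which is why the linked count can be much smaller than the protein count.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_protein_to_gene_url(protein_uids):
    """Build an ELink URL requesting Gene records linked to Protein UIDs.

    The UIDs are joined with commas so ELink returns one combined link set
    rather than one link set per input UID.
    """
    if not protein_uids:
        raise ValueError("need at least one protein UID")
    params = {
        "dbfrom": "protein",
        "db": "gene",
        "id": ",".join(str(uid) for uid in protein_uids),
    }
    return ELINK + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical UIDs, for illustration only.
    print(elink_protein_to_gene_url([11, 22, 33]))
```

Fetching the URL requires network access; the returned XML lists the linked Gene UIDs.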

I would not necessarily expect each protein to have a corresponding gene, or vice versa; it all depends on how each database is curated and maintained. It is probably best to read the documentation for each database to see whether it mentions potential causes of discrepancy.

Thanks, Neil. I resorted to writing a script that parses the gene out of the protein GenBank file and then fetches it from the database.
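
The script itself isn't shown, so the following is only a hypothetical sketch of that approach. It assumes the protein record's CDS feature carries a `coded_by` qualifier (in Biopython, read with `SeqIO.parse(handle, "genbank")`, this would be `feature.qualifiers["coded_by"][0]`), and it handles only a single interval, optionally wrapped in `complement(...)`. Multi-part locations such as `join(...)` are rejected rather than guessed at. The parsed interval is turned into an EFetch URL for the nucleotide sequence in FASTA format. `CodingRegion`, `parse_coded_by` and `efetch_fasta_url` are illustrative names.

```python
import re
from typing import NamedTuple
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


class CodingRegion(NamedTuple):
    accession: str
    start: int
    stop: int
    strand: int  # EFetch convention: 1 = plus strand, 2 = minus strand


# accession.version:start..stop, tolerating the < and > partial-end markers
_INTERVAL = re.compile(r"([A-Za-z0-9_]+\.\d+):<?(\d+)\.\.>?(\d+)")


def parse_coded_by(value: str) -> CodingRegion:
    """Parse a simple coded_by location such as 'complement(X_1.1:10..90)'."""
    text = value.strip()
    strand = 1
    if text.startswith("complement(") and text.endswith(")"):
        text = text[len("complement("):-1]
        strand = 2
    match = _INTERVAL.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported coded_by location: {value!r}")
    accession, start, stop = match.groups()
    return CodingRegion(accession, int(start), int(stop), strand)


def efetch_fasta_url(region: CodingRegion) -> str:
    """Build an EFetch URL for the nucleotide interval as FASTA text."""
    params = {
        "db": "nuccore",
        "id": region.accession,
        "seq_start": region.start,
        "seq_stop": region.stop,
        "strand": region.strand,
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder accession, for illustration only.
    region = parse_coded_by("complement(NC_000000.1:100..500)")
    print(region)
    print(efetch_fasta_url(region))
```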

Nice response! Where'd you get that (excellent) list? Is it updated regularly? Who maintains it? Where does the underlying data come from?

12.3 years ago
Will 4.6k

Probably because there are multiple protein entries for each gene, i.e. alternative splicing variants.
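
The effect of a many-to-one mapping on the counts can be illustrated with a small sketch. Given (protein, gene) link pairs, such as might be collected from protein-to-gene links, grouping by gene shows how several protein records collapse onto one gene, and keeps proteins with no gene link separate, since those only ever add to the Protein count. The function name and the example pairs are hypothetical.

```python
from collections import defaultdict


def proteins_per_gene(links):
    """Group (protein_id, gene_id) pairs by gene.

    A gene_id of None marks a protein with no gene link; such proteins are
    returned separately because they contribute to the Protein count only.
    """
    by_gene = defaultdict(set)
    unlinked = set()
    for protein_id, gene_id in links:
        if gene_id is None:
            unlinked.add(protein_id)
        else:
            by_gene[gene_id].add(protein_id)
    return dict(by_gene), unlinked


if __name__ == "__main__":
    # Hypothetical link pairs: four proteins, two genes, one unlinked protein.
    pairs = [("P1", "G1"), ("P2", "G1"), ("P3", "G2"), ("P4", None)]
    genes, orphans = proteins_per_gene(pairs)
    print(len(pairs), "proteins ->", len(genes), "genes;", len(orphans), "unlinked")
```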

Many species returned in the protein set are not present in the gene set.
