extract EC number from entrez esearch query
6.0 years ago
bioguy ▴ 50

Does anyone know how to use NCBI's Entrez command-line tools (https://www.ncbi.nlm.nih.gov/books/NBK25501/) to extract feature information for a specific protein query? Specifically, I need to find the EC numbers (EC IDs) for given queries.

For example, how can I programmatically retrieve the EC number for the following protein (citrate synthase, EC_number=2.3.3.16)?

https://www.ncbi.nlm.nih.gov/protein/RRJ88579.1

I need to do this for a large number of proteins, but for now just getting it for one would be great. I've been using queries like "esearch -db protein -query 'RRJ88579.1' | efetch -format docsum", but this does not return the EC number.

entrez ncbi protein genomics ECID • 1.5k views
6.0 years ago

Fetch the GenBank XML with efetch and pull out the qualifier with xmllint and an XPath expression:

$  wget -O - -q "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=RRJ88579.1&retmode=xml&rettype=gb"   |\
xmllint --xpath '//GBQualifier[GBQualifier_name="EC_number"]/GBQualifier_value/text()' -

2.3.3.16
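
Since the question mentions a large number of proteins, the same call can be wrapped in a loop. This is only a rough sketch, not a tested pipeline: the function names efetch_url and ec_numbers and the file accessions.txt are made up for illustration, and it assumes wget, xmllint, and a sleep that accepts fractional seconds. The pause is there because NCBI limits clients without an API key to about 3 requests per second.

```shell
# Build the efetch URL for one protein accession (same endpoint as above).
# efetch_url is a hypothetical helper name, not part of any NCBI tool.
efetch_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=%s&retmode=xml&rettype=gb\n' "$1"
}

# Read one accession per line from stdin and print "accession<TAB>EC number".
# string(...) returns the first matching EC_number value, or an empty string
# if the record has none, in which case NA is printed instead.
ec_numbers() {
    local acc ec
    while read -r acc; do
        [ -z "$acc" ] && continue          # skip blank lines
        ec=$(wget -O - -q "$(efetch_url "$acc")" |
             xmllint --xpath 'string(//GBQualifier[GBQualifier_name="EC_number"]/GBQualifier_value)' - 2>/dev/null)
        printf '%s\t%s\n' "$acc" "${ec:-NA}"
        sleep 0.4                          # assumption: stay under ~3 requests/second
    done
}
```

Usage would be something like: ec_numbers < accessions.txt > ec_table.tsv, with one accession per line in accessions.txt. Note that only the first EC_number per record is reported; a record with several would need the node-set form of the XPath instead.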
6.0 years ago
bioguy ▴ 50

Excellent, thank you.

Alternative method I just found:

esearch -db 'protein' -query 'RRJ88579.1' | efetch -format gpc | xtract -insd Protein EC_number
