How to check whether a list of protein IDs is present in the NCBI protein database?
3.2 years ago
mrj ▴ 180

I have a list of protein IDs; an example list is shown below. How can I use esearch (esearch -db protein -query <listids>) against the protein database to identify the IDs that are not present in it?

ADK95960.1
PQL24628.1
AVM48340.1
ADK95748.1
EFX40722.1
AKU69507.1
AVM47886.1
ADK96247.1
NOT9000.1
esearch protein database NCBI • 1.0k views
3.2 years ago

You can do it with esummary:

If your accessions are in a file called ids, then the following will fetch the JSON summary for each one:

URL='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=protein\&format=json\&id='

cat ids | parallel -j 1 curl -s  ${URL}{} > results.txt

where the results.txt file will contain lines such as:

{"header":{"type":"esummary","version":"0.3"},"error":"Invalid uid NOT9000.1 at position=0","result":{"uids":[]}}

for the invalid entries.
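
To turn that output into a plain list of the missing accessions, something along these lines should work. It is only a sketch: it assumes the error text has the "Invalid uid ... at position" form shown above, and the function name list_invalid_ids is just an illustrative choice.

```shell
# Print the accessions that esummary reported as invalid, one per line.
# Assumes the error message format "Invalid uid <ID> at position=..."
# as seen in the example output; adjust the pattern if NCBI changes it.
list_invalid_ids() {
  grep -o 'Invalid uid [^ ]* at position' "$1" | awk '{print $3}' | sort -u
}

# Example usage (results.txt produced by the parallel/curl command above):
#   list_invalid_ids results.txt > missing_ids.txt
```

For the example list in the question, this should report NOT9000.1 if that is the only accession the server rejects.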

Thank you. It worked
