Extract PMIDs from a gene or protein ID
9.6 years ago

Given a UniProt ID, I am trying to automatically retrieve the related PubMed IDs (PMIDs). I can map the UniProt ID to an identifier NCBI understands. For instance, UniProt ID O14733 maps to GI:6831583, and from http://www.ncbi.nlm.nih.gov/protein/O14733 you can follow a link to the associated PubMed articles at http://www.ncbi.nlm.nih.gov/pubmed?linkname=protein_pubmed_weighted&from_uid=6831583.

I have never used NCBI's E-utilities, so fetching these articles automatically may only take a very simple modification, but I can't figure it out. My best guess was http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&LinkName=protein_pubmed_weighted&from_uid=6831583, but this returns nothing.

Basically, given an ID such as 6831583, I want to return a list of PMIDs. I would rather do this in Python if possible. Any suggestions?

9.6 years ago
David W 4.9k

If you use the ELink E-utility with dbfrom="protein" and db="pubmed", you'll get a list of the PMIDs linked to that protein.

You can then use ESummary or EFetch on those PMIDs.
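A minimal sketch of that second step, using only the Python standard library rather than Biopython: it builds an ESummary request for a list of PMIDs and pulls each article title out of the returned XML. The function names are hypothetical, and the parser assumes the default (legacy 1.0) ESummary layout, where each DocSum holds an Id and Item elements distinguished by a Name attribute.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# https form of the ESummary endpoint
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(pmids):
    """Build an ESummary URL for a list of PMIDs (comma-joined in one id parameter)."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": ",".join(pmids)})
    return ESUMMARY_URL + "?" + query


def parse_titles(xml_text):
    """Map each PMID to its Title item (None if the DocSum has no Title)."""
    titles = {}
    for docsum in ET.fromstring(xml_text).iter("DocSum"):
        title = docsum.find("Item[@Name='Title']")
        titles[docsum.findtext("Id")] = title.text if title is not None else None
    return titles


def fetch_titles(pmids):
    """Hypothetical helper: perform the request and parse the response."""
    with urllib.request.urlopen(build_esummary_url(pmids)) as resp:
        return parse_titles(resp.read())
```

For long PMID lists, NCBI's documentation suggests sending the IDs with an HTTP POST instead of packing them all into the URL.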


Thanks, that's what I needed!

9.6 years ago

For completeness, here's what I worked out with David W's help. The URL is constructed like this:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=protein&db=pubmed&id=215274019&linkname=protein_pubmed_weighted
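For anyone not using Biopython, the same ELink request can be sketched with the standard library alone. This assumes the eLinkResult XML layout LinkSet → LinkSetDb → Link → Id; the helper names are hypothetical, and the sketch uses the https form of the endpoint.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(protein_id, linkname="protein_pubmed_weighted"):
    """Rebuild the ELink URL above for one protein GI."""
    params = {"dbfrom": "protein", "db": "pubmed", "id": protein_id, "linkname": linkname}
    return ELINK_URL + "?" + urllib.parse.urlencode(params)


def parse_elink_pmids(xml_text):
    """Collect the linked PMIDs; the query GI sits in IdList, not LinkSetDb, so it is skipped."""
    root = ET.fromstring(xml_text)
    return [
        id_el.text
        for linkset_db in root.iter("LinkSetDb")
        for id_el in linkset_db.findall("Link/Id")
    ]


def fetch_pmids(protein_id, linkname="protein_pubmed_weighted"):
    """Hypothetical helper: perform the request and return a list of PMID strings."""
    with urllib.request.urlopen(build_elink_url(protein_id, linkname)) as resp:
        return parse_elink_pmids(resp.read())
```

An empty list simply means the protein has no links under that link name.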

This is how I was able to retrieve the records in Biopython:

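from Bio import Entrez

Entrez.email = "your.name@example.com"  # placeholder; NCBI asks for a contact address
protein_id = "215274019"

# ELink: follow the weighted protein -> PubMed links for this GI
handle = Entrez.elink(db="pubmed", dbfrom="protein", id=protein_id, linkname="protein_pubmed_weighted")
record = Entrez.read(handle)
handle.close()

# Each Link entry holds one PMID (IndexError here if the protein has no linked articles)
for link in record[0]['LinkSetDb'][0]['Link']:
    print(link['Id'])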
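To see how the available link names differ in practice (protein_pubmed versus protein_pubmed_weighted, for example), one option is to leave out linkname, which as far as I know makes ELink return every protein-to-PubMed link set, and then group the PMIDs by each LinkSetDb's LinkName. This is a sketch: it assumes Biopython's parsed record has the same LinkSetDb/LinkName/Link/Id layout walked by the loop above, and the function names and placeholder email are hypothetical.

```python
def pmids_by_linkname(record):
    """Group PMIDs from a parsed ELink record by link name."""
    groups = {}
    for linkset in record:
        # A LinkSet may have no LinkSetDb at all when nothing is linked.
        for linkset_db in linkset.get("LinkSetDb", []):
            ids = [link["Id"] for link in linkset_db["Link"]]
            groups.setdefault(linkset_db["LinkName"], []).extend(ids)
    return groups


def fetch_protein_pubmed_links(protein_id, email):
    """Run ELink without linkname so all protein -> pubmed link sets come back."""
    # Imported here so pmids_by_linkname() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.elink(db="pubmed", dbfrom="protein", id=protein_id)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()
```

Usage would be along the lines of `groups = pmids_by_linkname(fetch_protein_pubmed_links("215274019", "your.name@example.com"))`, after which set differences such as `set(groups["protein_pubmed"]) - set(groups["protein_pubmed_weighted"])` show what each link name adds.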

Hey, I am basically trying to do the same thing. I have a question: do you know the differences between the link names, i.e., protein_pubmed_weighted, protein_pubmed, and protein_pubmed_refseq?

Best,
Nils

9.5 years ago

Depending on the scope of your project, you might want to download NCBI's lookup table between Entrez Gene IDs and PubMed IDs directly and integrate it into your workflow.

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz

I like this table because it enables easy (and computationally fast) filtering. For example, you can exclude papers that cover hundreds to thousands of different genes (and thus usually do not reveal gene-specific biology), or find genes that are mentioned only together with your genes of interest.
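A rough sketch of both filters, assuming gene2pubmed.gz is the gzipped, tab-separated file with a `#tax_id GeneID PubMed_ID` header line. The function names and the default `max_genes` cutoff of 100 are illustrative choices, not NCBI recommendations.

```python
import csv
import gzip
from collections import defaultdict


def load_gene2pubmed(path, tax_id=None):
    """Return (GeneID -> set of PMIDs, PMID -> set of GeneIDs), optionally for one taxon."""
    gene_to_pmids = defaultdict(set)
    pmid_to_genes = defaultdict(set)
    with gzip.open(path, "rt") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue  # skip the header line
            tax, gene_id, pmid = row[0], row[1], row[2]
            if tax_id is not None and tax != tax_id:
                continue
            gene_to_pmids[gene_id].add(pmid)
            pmid_to_genes[pmid].add(gene_id)
    return gene_to_pmids, pmid_to_genes


def gene_specific_pmids(gene_id, gene_to_pmids, pmid_to_genes, max_genes=100):
    """PMIDs for gene_id, dropping papers that mention more than max_genes genes."""
    return sorted(
        p for p in gene_to_pmids.get(gene_id, ()) if len(pmid_to_genes[p]) <= max_genes
    )


def co_mentioned_genes(genes_of_interest, gene_to_pmids):
    """Genes whose every paper also mentions at least one gene of interest."""
    interest = set(genes_of_interest)
    interest_pmids = set().union(*(gene_to_pmids.get(g, set()) for g in interest))
    return sorted(
        g for g, pmids in gene_to_pmids.items()
        if g not in interest and pmids <= interest_pmids
    )
```

Passing a taxonomy ID such as `"9606"` to `load_gene2pubmed` keeps memory use down when only one organism matters.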


This looks like a great resource, though I don't think I can easily map my UniProt IDs to genes, and I think it might change the coverage of the papers if I did. Still, I'm definitely going to bookmark this folder, thanks!
