PubMed ID or DOI? And which is easier to extract metadata from given that identifier?
Why choose, when there are pmid2doi look-up services?
One obvious answer is to use the DOI so that you can extract from more than PubMed entries.
Neither is perfect by any means.
Let's take the example of one of my papers, "An integrated dataset for in silico drug discovery".
Assuming we have the two identifiers (PMID 20375448 and DOI 10.2390/biecoll-jib-2010-116):
And we want the full text and metadata for the article.
Full text first, let's use the PMID, and visit the PubMed entry for the paper, here: http://pubmed.org/20375448.
Lo and behold, there is no link through to the article at all. We've hit a dead end.
So let's try the DOI: http://dx.doi.org/10.2390/biecoll-jib-2010-116.
Hurrah, at least there is a link to the full text, even if the DOI doesn't take us through to the actual article itself, but still... success!
OK, now metadata. The DOI gave us success with the article, so let's try that first. Search the CrossRef metadata here: http://api.labs.crossref.org/search?q=10.2390%2Fbiecoll-jib-2010-116.
Hmmm, nothing about my paper at all in that list (this is because the CrossRef database is curiously free of information about that paper, despite the DOI resolving to the paper).
So, back to the PMID, and eutils, here: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=20375448.
Yay, that looks like suitable metadata (even if it is in a completely different format to that we would get back from CrossRef, so any parser we write to consume CrossRef metadata will break). But why could we not retrieve this with the DOI?
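For anyone who wants to script the PMID route, here is a minimal Python sketch. It assumes the usual PubMed XML layout (a PubmedArticle element containing MedlineCitation and PubmedData); the function names efetch_url, parse_pubmed_xml and fetch_metadata are made up for illustration, and fetch_metadata needs network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the efetch URL quoted above.
EFETCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid):
    """Build the efetch URL for a single PubMed ID."""
    query = urlencode({"db": "pubmed", "retmode": "xml", "id": pmid})
    return EFETCH + "?" + query


def parse_pubmed_xml(xml_text):
    """Return a small dict for the first PubmedArticle, or None if there is none."""
    root = ET.fromstring(xml_text)
    article = next(root.iter("PubmedArticle"), None)
    if article is None:
        return None

    def text_of(path):
        # itertext() keeps titles that contain inline markup such as <i>.
        node = article.find(path)
        return "".join(node.itertext()).strip() if node is not None else None

    # The DOI is only present if the publisher deposited it with PubMed.
    doi = None
    for node in article.findall("PubmedData/ArticleIdList/ArticleId"):
        if node.get("IdType") == "doi":
            doi = (node.text or "").strip() or None
            break

    return {
        "pmid": text_of("MedlineCitation/PMID"),
        "title": text_of("MedlineCitation/Article/ArticleTitle"),
        "journal": text_of("MedlineCitation/Article/Journal/Title"),
        "doi": doi,
    }


def fetch_metadata(pmid):
    """Hypothetical convenience wrapper: download and parse one record."""
    with urlopen(efetch_url(pmid)) as response:
        return parse_pubmed_xml(response.read())
```

Calling fetch_metadata("20375448") should return the title, journal and (where deposited) the DOI for the record, so the same call can double as a crude PMID-to-DOI look-up. A parser for CrossRef's output would still have to be written separately, since the two formats differ.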
I'm not for a second suggesting that the situation is this poor for every article (mostly it depends on the publisher and what they submit to the various databases), but the proportion of articles like this is significant enough for it to be problematic.
Publishers should be providing good enough metadata on the articles themselves for this ad hoc system of arbitrary third party identifiers to be completely unnecessary.
I may be wrong, but I think that the reason that there's no metadata for your article in CrossRef is that the journal, "Journal of Integrative Bioinformatics", didn't submit that for you. It's the same reason the CrossRef URL is resolving to your institutional repository, instead of the journal page, http://journal.imbio.de/article.php?aid=116.
Or there is http://www.pmid2doi.org/
Based on the reply from Crossref (see above/below) it looks like this is the same project as http://labs.crossref.org/styled-3/pmid2doi.html
That looks good, but it seems the database has not been updated past 2010.
Thank you, Casey. I knew you'd have a good solution to this.
I asked CrossRef about updating this service and their reply was "@CrossRefNews: @caseybergman Actually this service appears to be brought to you by the Netherlands Bioinformatics Service." https://twitter.com/#!/CrossRefNews/status/157098241660436480