I'm trying to understand how to fetch recently published documents appearing in PubMed. It's not as straightforward as you would think.
Mostly I'm confused by how PDAT works. In the XML, PubDate only has month and year (no day) for 90+% of the results, yet a search like
"2012/10/11"[PDAT] : "2012/10/31"[PDAT]
gives 39178 results while
"2012/10/19"[PDAT] : "2012/10/31"[PDAT]
gives 21679 results
There's no Day field in the Article element of the PubMed XML for most articles. See here: qplot of PubDate/Day extracted from XML
If there is no day field in PubDate for the vast majority of journals, how come these searches differ so much? I would expect a search for papers published between October 29th and October 31st to also return papers published in October with no publication day, which seems to be the case. So if the day field is only there for 10% of articles, the number of results for any date-range search within October should differ by at most about 10%, right?
I could use created date, but created dates range from June to now for papers published this month, presumably because some publishers send in data well before the print comes out, so that's not really what I want.
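For anyone who wants to reproduce the counts above, here is a minimal sketch using the E-utilities ESearch endpoint with `rettype=count`. The helper names (`pdat_term`, `esearch_count_url`, `parse_count`, `fetch_count`) are made up for illustration, and the counts returned today will not match the 2012 numbers quoted in the question.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pdat_term(start, end):
    """Build a PDAT range term in the same form used in the question."""
    return f'"{start}"[PDAT] : "{end}"[PDAT]'


def esearch_count_url(start, end):
    """Build an ESearch URL that asks only for the number of hits."""
    params = {"db": "pubmed", "term": pdat_term(start, end), "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(xml_text):
    """Extract the integer in the <Count> element of an ESearch response."""
    return int(ET.fromstring(xml_text).findtext("Count"))


def fetch_count(start, end):
    """Hypothetical convenience wrapper: performs the network request."""
    with urlopen(esearch_count_url(start, end)) as resp:
        return parse_count(resp.read())
```

Calling `fetch_count("2012/10/11", "2012/10/31")` and `fetch_count("2012/10/19", "2012/10/31")` side by side is a quick way to see how much the range boundaries matter.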
Looks like [PDAT] references the ArticleDate XML node.
Thanks, where did you find this information?
Just by browsing the XML of the query results; there are a few different date fields for each article but that one seems to be the only one consistently fitting within the range specified by PDAT — I could be mistaken though.
Hmmm, maybe so. Such a bewildering array of dates...
This question continues to trouble me. Here is my latest examination of PDAT.
It does not map to ArticleDate. My best guess is that, using the DocSum XML output format, it maps to Item[@Name='PubDate'] or, if EPubDate exists and is earlier, Item[@Name='EPubDate'].
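A rough sketch of that guess, written against a single DocSum element. It only illustrates the hypothesis and is not a confirmed description of how PubMed indexes PDAT. The function names are hypothetical, and treating a missing month or day as 1 (and any unrecognised month token such as a season as January) is my own assumption.

```python
import xml.etree.ElementTree as ET
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_docsum_date(text):
    """Parse strings like '2012', '2012 Oct' or '2012 Oct 19'.

    Assumption: a missing or unrecognised month or day falls back to 1.
    Returns None when there is no usable year.
    """
    parts = (text or "").split()
    if not parts or not parts[0].isdigit():
        return None
    year = int(parts[0])
    month = MONTHS.get(parts[1][:3], 1) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else 1
    return date(year, month, day)


def guessed_pdat(docsum):
    """Return PubDate, or EPubDate when it exists and is earlier."""
    pub = parse_docsum_date(docsum.findtext("Item[@Name='PubDate']"))
    epub = parse_docsum_date(docsum.findtext("Item[@Name='EPubDate']"))
    if pub is None:
        return epub
    if epub is not None and epub < pub:
        return epub
    return pub
```

Running `guessed_pdat` over the DocSum elements of a date-range query and checking whether every result lands inside the requested range would be one way to test the guess.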