In light of the recent 10-year anniversaries of PubMed Central and the PLoS Open Letter, I've been thinking about whether the promise of Open Access (OA) publishing has actually paid off in terms of high-throughput text-mining on the entire OA corpus. Given Peter Binfield's predictions that by 2016 ~50% of the literature could be published in OA journals, the ability to turn this promise into a reality is not far off. Nevertheless, I find it hard to cite examples where text-mining has actually been applied to the entire PMC corpus, which is the basis of my question:
Can you suggest papers that use full-text mining on the entire Open Access subset of PubMed Central?
We've done some work in this area and I am aware of some relevant papers, e.g.:
- Annotating genes and genomes with DNA sequences extracted from biomedical articles. http://www.ncbi.nlm.nih.gov/pubmed/21325301
- Figure text extraction in biomedical literature. http://www.ncbi.nlm.nih.gov/pubmed/21249186
- LINNAEUS: a species name identification system for biomedical literature. http://www.ncbi.nlm.nih.gov/pubmed/20149233
- Systematic Characterizations of Text Similarity in Full Text Biomedical Publications. http://www.ncbi.nlm.nih.gov/pubmed/20856807
- Yale Image Finder (YIF): a new search engine for retrieving biomedical images. http://www.ncbi.nlm.nih.gov/pubmed/18614584
- BioLit: integrating biological literature with databases. http://www.ncbi.nlm.nih.gov/pubmed/18515836
- Figure mining for biomedical research. http://www.ncbi.nlm.nih.gov/pubmed/19439564
Other suggestions are most welcome to fill gaps in my knowledge in this area.
I'll accept the answer that provides the most additional examples, or one that finds an example (not listed above) that performs full-text mining on PMC OA supplemental files.
EDIT 23 SEP 2011
I've also received a few responses to this question by cross-posting on the bioNLP mailing list:
- Author keywords in biomedical journal articles. http://www.ncbi.nlm.nih.gov/pubmed/21347036 (credit to Aurélie Névéol)
- UKPMC: a full text article resource for the life sciences. http://www.ncbi.nlm.nih.gov/pubmed/21062818 (credit to C.J. Rupp)
EDIT 5 OCT 2011
One more from the conference proceedings literature:
- An exploration of mining gene expression mentions and their anatomical locations from biomedical text. http://dl.acm.org/citation.cfm?id=1869970 (credit to Martin Gerner)
EDIT 30 OCT 2011
- Extraction of data deposition statements from the literature: a method for automatically tracking research results. http://www.ncbi.nlm.nih.gov/pubmed/21998156 (credit to Google)
- BioNOT: A searchable database of biomedical negated sentences. http://www.ncbi.nlm.nih.gov/pubmed/22032181 (credit to BMC TOC)
EDIT 7 NOV 2011
- Intrinsic evaluation of text mining tools may not predict performance on realistic tasks. http://www.ncbi.nlm.nih.gov/pubmed/18229722 (credit to @maximilianh)
Hmm, this paper predates PMC OA: http://www.biomedcentral.com/1471-2105/4/20
@Michael - yes, there are a number of papers like this that use a selection of full-text articles from journal X, but not the entirety of PMC OA.