Searching For Organism Mentions In Pubmed
12.7 years ago
Will 4.6k

As a proxy for the database of host-pathogen pairs I asked about here, I was thinking of using organism co-mentions in PubMed articles. I can easily write a script that retrieves all the PMIDs returned by a PubMed search for every organism name in the NCBI taxonomy.
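The brute-force script described above might look roughly like this in Python. It runs one ESearch query per organism name against NCBI's E-utilities, then groups PMIDs so that each pair of names gets the set of articles mentioning both. This is a sketch, not the poster's script: the function names are hypothetical, the query term is used verbatim (no field tags such as `[Title/Abstract]`), and the network call is kept separate so the parsing and pairing can be checked offline.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict
from itertools import combinations

# Public NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, retmax: int = 10000) -> str:
    """Build a PubMed ESearch URL that returns JSON.

    ESearch caps retmax at 10,000 IDs per call; larger result sets need
    paging (retstart) or the History server, which this sketch skips.
    """
    params = {"db": "pubmed", "term": term, "retmax": str(retmax), "retmode": "json"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_pmids(payload: str) -> set[str]:
    """Extract the PMID list from an ESearch JSON response body."""
    return set(json.loads(payload)["esearchresult"]["idlist"])


def fetch_pmids(term: str) -> set[str]:
    """Run one live ESearch query (needs network access)."""
    with urllib.request.urlopen(build_esearch_url(term)) as resp:
        return parse_pmids(resp.read().decode("utf-8"))


def co_mentions(pmids_by_name: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of organism names to the PMIDs returned for both."""
    # Invert to PMID -> names, so only pairs that actually co-occur are visited.
    names_by_pmid = defaultdict(set)
    for name, pmids in pmids_by_name.items():
        for pmid in pmids:
            names_by_pmid[pmid].add(name)
    shared = defaultdict(set)
    for pmid, names in names_by_pmid.items():
        for pair in combinations(sorted(names), 2):
            shared[pair].add(pmid)
    return dict(shared)
```

Inverting to a PMID-to-names index avoids intersecting every pair of the tens of thousands of taxon names. NCBI limits unauthenticated E-utilities clients to about three requests per second, so for a whole-taxonomy run the per-name PMID sets would realistically be fetched slowly and cached locally.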

However, I was wondering whether anyone knew of a more 'elegant' way of doing this. Does anyone know of an NLP tool for finding organism mentions in PubMed, or perhaps a database that annotates them?

Thanks.

12.7 years ago
Nathan Harmston ★ 1.1k

So one way of doing this might be to download all of MEDLINE and then try running a species tagger on the abstracts.

Two recent species NER taggers

Both of these systems identify species mentions in text and then map them to NCBI Taxonomy identifiers. Each has its own issues and biases, and both essentially apply regular-expression matching over the text (OrganismTagger also uses an SVM to identify strain information). While performance is decent, you are obviously going to get a lot of false positives and false negatives. HTH
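To make the dictionary/regular-expression idea concrete, here is a toy Python sketch of a species tagger that maps matched names to NCBI Taxonomy IDs. It is not how LINNAEUS or OrganismTagger are implemented; the tiny name dictionary and the function names are hypothetical, whereas a real tagger would load every name and synonym from the NCBI taxonomy dump and add disambiguation on top.

```python
import re

# HYPOTHETICAL toy dictionary (lower-cased name variant -> NCBI Taxonomy ID).
# A real tagger builds this from the full NCBI taxonomy names list.
SPECIES_DICT = {
    "homo sapiens": 9606,
    "human": 9606,
    "escherichia coli": 562,
    "e. coli": 562,
    "plasmodium falciparum": 5833,
    "mycobacterium tuberculosis": 1773,
}


def build_pattern(names):
    """Compile one case-insensitive alternation, longest names first."""
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    # \b stops "human" from matching inside words such as "humanized".
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def tag_species(text, dictionary=SPECIES_DICT):
    """Return (start, end, matched text, taxon ID) for each dictionary hit."""
    pattern = build_pattern(dictionary)
    return [
        (m.start(), m.end(), m.group(0), dictionary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


text = "Plasmodium falciparum infects human erythrocytes; E. coli was used as a control."
for start, end, name, taxid in tag_species(text):
    print(start, end, name, taxid)
```

The same code also shows where errors come from: "Humans were sampled." yields no hits because the plural is not in the dictionary (a false negative), and any dictionary entry that is also an ordinary word would produce false positives.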

12.7 years ago
Hamish ★ 3.3k

Not sure if they cover exactly what you are looking for, but the Rebholz Group at EMBL-EBI provides a number of services that may be relevant:

  • Whatizit: a text processing system which features a range of pipelines (see http://www.ebi.ac.uk/webservices/whatizit/info.jsf ) for identifying a wide range of biologically relevant terms. In your case the "whatizitOrganisms", "whatizitOrganismsFilter" and "whatizitUkPmcSpecies" pipelines which mark-up NCBI Taxonomy related terms are likely to be of interest.
  • EBIMed: a text-mining-aware search engine for MEDLINE. Whatizit provides annotations for use in EBIMed, including the "whatizitOrganisms" pipeline results.

Text-mining results from Whatizit are also integrated into CiteXplore and UK PubMed Central (UKPMC). You may want to look at CiteXplore and UKPMC for additional coverage of the literature, since these contain more than just MEDLINE/PubMed, see http://www.ebi.ac.uk/citexplore/showStatistics.do and http://ukpmc.ac.uk/FAQ#searchon for details of the additional sources.


You'd think after using Whatizit daily for the past ~7 months I would know that it had an organism tagger! But I guess that's what happens when your 'work blinders' are on ;)


I would not recommend using Whatizit for organism tagging (see the issues and performance evaluation here: http://www.biomedcentral.com/1471-2105/11/85); @Nathan's suggestions show much better performance.

12.7 years ago

We have parsed MEDLINE 2011 baseline files for organism mentions with LINNAEUS. These data are available here: http://biocontext.smith.man.ac.uk/data/entities-species.csv.gz

For more information see http://biocontext.org/
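With a per-document species file like this one, host-pathogen candidates could be derived by counting, for each pair of taxa, how many documents mention both. The Python sketch below assumes the file is a CSV with one mention per row containing a document ID and an NCBI taxon ID; the column positions, delimiter and header handling are placeholders to adjust after inspecting the real file, and the function names are hypothetical.

```python
import csv
import gzip
from collections import defaultdict
from itertools import combinations


def load_species_by_doc(path, doc_col=0, taxid_col=1, delimiter=",", skip_header=False):
    """Group taxon IDs by document ID from a CSV or gzipped CSV.

    doc_col, taxid_col and delimiter are ASSUMED; check the real file first.
    """
    by_doc = defaultdict(set)
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if skip_header:
            next(reader, None)
        for row in reader:
            # Skip malformed rows that lack the expected columns.
            if len(row) > max(doc_col, taxid_col):
                by_doc[row[doc_col]].add(row[taxid_col])
    return by_doc


def count_co_mentions(by_doc):
    """Count, for each unordered pair of taxa, the documents mentioning both."""
    counts = defaultdict(int)
    for taxa in by_doc.values():
        for pair in combinations(sorted(taxa), 2):
            counts[pair] += 1
    return dict(counts)
```

Raw counts will be dominated by pairs involving very frequently mentioned taxa such as human (9606), so some normalisation or filtering would likely be needed before treating a co-mentioned pair as a host-pathogen candidate.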

12.7 years ago
boczniak767 ▴ 870

You could also try elise. It looks for co-occurrences of words/phrases in PubMed.

paper here
