Question

Text-Mining Clinicaltrials.Gov For Drug-Gene Interactions

5

13.1 years ago

Obi Griffith 20k

I have little direct experience with text-mining tools. Can anyone suggest a good tool or approach for text-mining drug-gene relationships from the clinical trials available at clinicaltrials.gov? They provide XML files for each clinical trial record, but unfortunately gene information is not a standard field (though it is often mentioned in free-form descriptive fields). I would have a list of genes and a list of drugs, and I want to know when they co-occur in a clinical trial record. However, it would be nice to get more than just simple co-occurrence. Is anyone aware of a tool that could rank co-occurrences in some reasonable way based on term incidence, proximity, natural language processing concepts, etc.? Here is an example record to give some context.

drug gene • 3.9k views

updated 13.1 years ago by Mary 11k • written 13.1 years ago by Obi Griffith 20k

Ram · Answer 1 · 2012-07-04

2

13.1 years ago

Arun 2.4k

There was a recent blog post on homolog.us that mentions some of the best resources available for text mining. Does this help at all, at least to get you started?

updated 5.6 years ago by Ram 45k • written 13.1 years ago by Arun 2.4k

score 1 · Answer 2 · 2012-07-04

What you are trying to do reminds me of the XplorMed tool: http://www.ogic.ca/projects/xplormed/

It used to have more features, but it might still work for you with your input data. It used to be able to start with a keyword, ID, or PubMed query and look for co-occurrence of terms, with ranking. Currently it asks for abstracts, but you might be able to fake it out with the ClinicalTrials.gov XML records instead. At least it's worth a try.
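
If the tool only accepts abstract-like plain text, one way to try the trick is to flatten each XML record into a single block of text first. This is only a rough sketch using Python's standard library; the function name `record_to_text` and the element names in the sample record are made up for illustration and are not taken from any particular record or from XplorMed itself.

```python
import xml.etree.ElementTree as ET


def record_to_text(xml_string):
    """Join every text node of one trial record into a single string.

    Hypothetical helper: it ignores tag names entirely, so it works
    regardless of which free-form fields a record happens to contain.
    """
    root = ET.fromstring(xml_string)
    # itertext() walks the whole tree and yields each piece of text in order.
    chunks = (chunk.strip() for chunk in root.itertext())
    # Collapse internal runs of whitespace and drop empty pieces.
    return " ".join(" ".join(chunk.split()) for chunk in chunks if chunk)


if __name__ == "__main__":
    # Invented sample record; real records will have many more fields.
    sample = (
        "<clinical_study>"
        "<brief_title>Erlotinib in EGFR-mutant NSCLC</brief_title>"
        "<brief_summary><textblock>Patients with EGFR mutations."
        "</textblock></brief_summary>"
        "</clinical_study>"
    )
    print(record_to_text(sample))
```

The output can then be pasted in wherever the tool expects an abstract. Dropping the tag names loses the field structure, which is probably acceptable for a first co-occurrence pass.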

Even if it doesn't work, it would probably help to read their publications about how they did it. You might also be able to get the software and tweak it yourself if the abstract trick fails.
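
If you end up rolling your own, a simple baseline for ranked co-occurrence might look like the sketch below. To be clear, this is not how XplorMed does it. The scoring formula (product of mention counts divided by one plus the smallest token distance) is an arbitrary heuristic chosen for illustration, the names `find_positions` and `score_pairs` are hypothetical, and it only matches single-word gene and drug names.

```python
import re
from itertools import product


def find_positions(tokens, term):
    """Return the token indices where a single-word term occurs."""
    term = term.lower()
    return [i for i, tok in enumerate(tokens) if tok == term]


def score_pairs(text, genes, drugs):
    """Rank gene-drug pairs that co-occur in one record's text.

    Each result is (gene, drug, gene_count, drug_count, min_distance, score),
    sorted from highest to lowest score.
    """
    # Crude tokenizer: lowercase alphanumeric runs only, so a hyphenated
    # name such as "HER-2" is split in two and will not match as one term.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    results = []
    for gene, drug in product(genes, drugs):
        g_pos = find_positions(tokens, gene)
        d_pos = find_positions(tokens, drug)
        if not g_pos or not d_pos:
            continue  # no co-occurrence in this record
        # Proximity: smallest distance, in tokens, between any two mentions.
        min_dist = min(abs(g - d) for g in g_pos for d in d_pos)
        # Illustrative heuristic: more mentions and closer mentions score higher.
        score = len(g_pos) * len(d_pos) / (1 + min_dist)
        results.append((gene, drug, len(g_pos), len(d_pos), min_dist, score))
    return sorted(results, key=lambda r: r[-1], reverse=True)


if __name__ == "__main__":
    text = ("Erlotinib targets EGFR. Patients received erlotinib. "
            "Unrelated mention of KRAS appears much later in the text.")
    for row in score_pairs(text, ["EGFR", "KRAS", "BRAF"], ["erlotinib"]):
        print(row)
```

Run it per record and then sum or average the scores across records to get a corpus-level ranking. Handling gene synonyms, multi-word drug names, and negation ("no EGFR mutation") is where a proper NLP tool would start to earn its keep.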