I have little direct experience with text-mining tools. Can anyone suggest a good tool or approach for text-mining drug-gene relationships from the clinical trials available at clinicaltrials.gov? They provide an XML file for each clinical trial record, but unfortunately gene information is not a standard field (although genes are often mentioned in the free-form descriptive fields). I have a list of genes and a list of drugs, and I want to know when they co-occur in a clinical trial record. However, it would be nice to get more than just simple co-occurrence. Is anyone aware of a tool that could rank co-occurrences in some reasonable way, based on term frequency, proximity, natural language processing concepts, etc.? Here is an example record to give some context.
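
To make the question more concrete, this is roughly the naive baseline I am hoping to improve on. It is only a sketch under several assumptions: all free text in a record is pooled regardless of which XML field it came from, gene and drug names are single words matched case-insensitively, and the score is a simple sum of inverse token distances. The function names (`record_text`, `term_positions`, `cooccurrence_scores`) and the scoring formula are my own illustration, not taken from any existing tool.

```python
import re
import xml.etree.ElementTree as ET
from itertools import product


def record_text(xml_string):
    """Pool every text node of one trial record into a single string."""
    root = ET.fromstring(xml_string)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def term_positions(tokens, term):
    """Token indices where a single-word term occurs (tokens are lowercase)."""
    term = term.lower()
    return [i for i, tok in enumerate(tokens) if tok == term]


def cooccurrence_scores(xml_string, genes, drugs):
    """Rank (gene, drug) pairs that co-occur in one record.

    Assumed scoring: each gene mention / drug mention pair contributes
    1 / (1 + token distance), so pairs that are mentioned often and close
    together score higher. Pairs that do not co-occur are omitted.
    """
    # Crude tokenisation: lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", record_text(xml_string).lower())
    scores = {}
    for gene, drug in product(genes, drugs):
        gene_pos = term_positions(tokens, gene)
        drug_pos = term_positions(tokens, drug)
        if not gene_pos or not drug_pos:
            continue
        scores[(gene, drug)] = sum(
            1.0 / (1 + abs(g - d)) for g in gene_pos for d in drug_pos
        )
    # Highest-scoring pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `cooccurrence_scores(xml_string, genes, drugs)` on the contents of one record returns a list of `((gene, drug), score)` tuples sorted by score. This ignores synonyms, multi-word names, negation, and sentence boundaries, which is exactly where I suspect a proper text-mining tool would do better.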