I have several hundred diseases, defined by ICD-9-like strings such as those shown below. Many of these diseases show association with a number of copy number variants, mostly deletions. For each variant I know the chr/start/end position, and therefore whether it overlaps a gene and which genes it overlaps. In the past, we have checked whether a single disease was associated with a number of SNPs or their nearby genes, but now I have many disease/variant combinations I'd like to test. Does anyone have a good pipeline for testing a batch of chromosomal regions or genes for association with a number of different keywords defined by the ICD-9 strings? For instance:
Disease x gene combinations:
Benign neoplasm of eye PCAT19
Non-melanoma skin cancer MC1R
Malignant neoplasm of ovary PALB2
Breast cancer APOBEC3B
Benign neoplasm of other parts of digestive system RTN3
Squamous cell carcinoma RAD51C
I have used tools like Chilibot and GLAD4U for individual searches, but can anyone think of a way to scale this up to combination searches on pairs like those above? Even better if it can take just the chr/start/stop without the gene name and be run from the Unix prompt.
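To make the kind of batch search I mean concrete, here is a minimal sketch in Python. It is not an existing pipeline, just an illustration under some assumptions: each input line looks like the pairs above (disease string followed by a gene symbol as the last whitespace-separated token), and co-mention counts from the NCBI E-utilities `esearch` endpoint for PubMed stand in for "association". The function names (`parse_pairs`, `build_url`, `count_hits`) are made up for this example, and it does not handle the chr/start/stop case.

```python
import json
import sys
import time
import urllib.parse
import urllib.request

# NCBI E-utilities search endpoint (PubMed is selected via the db parameter).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def parse_pairs(lines):
    """Split each line into (disease, gene).

    Assumption: the gene symbol is the last whitespace-separated token
    and everything before it is the disease string.
    """
    pairs = []
    for line in lines:
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or single-token lines
        pairs.append((parts[0], parts[1]))
    return pairs


def build_url(disease, gene):
    """Build an esearch URL that looks for the disease phrase and the gene
    symbol together in PubMed titles/abstracts."""
    term = f'"{disease}"[Title/Abstract] AND {gene}[Title/Abstract]'
    query = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json"}
    )
    return f"{ESEARCH}?{query}"


def count_hits(disease, gene):
    """Return the number of PubMed records matching the pair (needs network)."""
    with urllib.request.urlopen(build_url(disease, gene)) as resp:
        data = json.load(resp)
    return int(data["esearchresult"]["count"])


if __name__ == "__main__":
    # Read pairs from stdin, print tab-separated disease, gene, hit count.
    for disease, gene in parse_pairs(sys.stdin):
        print(disease, gene, count_hits(disease, gene), sep="\t")
        time.sleep(0.34)  # stay under NCBI's ~3 requests/second limit without an API key
```

Saved under a hypothetical name such as `pair_counts.py`, this could be run from the prompt as `python pair_counts.py < pairs.txt`. Searching the raw ICD-9-style strings as exact phrases will probably miss a lot for the longer descriptions, so some mapping to literature vocabulary would still be needed.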
UPDATE: A partial answer may be covered here and involves making ICD-9 to MeSH linkages. The problem is that linking ICD-9 strings to MeSH terms does not appear to be very clean: Gene2MeSH linkages seem simple enough, but the ICD-9 to MeSH step is currently the sticking point.
Thanks, Ryan D