Question

How to cluster drugs by up/down gene lists?

0

6.1 years ago

Antony ▴ 10

I'm hoping for some help on drug clustering by gene list. The data I have has not come through any particular workflow; it's all held in text files.

I have 100 drugs, and for each drug I have a list of up/down genes. Whether a gene is up- or down-regulated is indicated purely by its presence in a text file (in one of two columns, "up" and "down"). There is no numerical expression data at all.

I have very little clustering experience. I was hoping to learn whether it would be possible to cluster the 100 drugs by the similarity of their up/down gene lists. As for a clustering cut-off, I'm not entirely sure, and I am happy for that to be exploratory for the time being, provided it finds some separation.

Anything available in R would be very helpful.

Thanks

gene drug clustering • 2.1k views

ADD COMMENT • link updated 6.1 years ago by Jean-Karim Heriche 27k • written 6.1 years ago by Antony ▴ 10

2

It may be worth exploring the literature for some practical examples, if you have not done so already. Some random examples are below:

On the drug front, have a look at the Open Targets Platform and its batch search. It may be useful to try it out on your list of genes to find which diseases are associated with them, any pathway (and GO) enrichment in that list, and an overview of possible protein interactions among them.

ADD REPLY • link 6.1 years ago by Denise CS ★ 5.2k

0

Thanks for all of the information. I'm working to a tight deadline, but I'll reply once I've looked through this wealth of information.

ADD REPLY • link 6.1 years ago by Antony ▴ 10

2

Another idea may be to just perform a simple gene enrichment and then plot the results, as I do here: Clustering of DAVID gene enrichment results from gene expression studies

Not 100% what you want, though.

ADD REPLY • link 6.1 years ago by Kevin Blighe 89k

score 2 · Answer 1 · 2019-03-19

2

6.1 years ago

Jean-Karim Heriche 27k

Represent each drug by a binary vector of genes, where 1 is up and 0 is down, then use a measure of similarity appropriate for binary vectors (for a selection, check the R package proxy). Start with hierarchical clustering with complete linkage to get an idea of the structure of the data. If there is any strong clustering structure, you should see it there.
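To make the idea concrete, here is a minimal sketch in plain Python (chosen only so it runs without extra packages; in R, the equivalent clustering step would be hclust with method = "complete"). It rests on assumptions not stated in the thread: the drug names and gene sets are made-up toy data, only the "up" lists are used, the binary vector is taken as membership in that list over the union of all genes, and Jaccard is picked as one of several possible binary measures.

```python
from itertools import combinations

# Hypothetical toy data: each drug maps to its set of up-regulated genes.
up_genes = {
    "drugA": {"TP53", "MYC", "EGFR"},
    "drugB": {"TP53", "MYC", "KRAS"},
    "drugC": {"BRCA1", "PTEN"},
    "drugD": {"BRCA1", "PTEN", "ATM"},
}

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def complete_linkage(items, dist, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters
    whose farthest pair of members is closest (complete linkage)."""
    clusters = [[name] for name in sorted(items)]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(dist(items[x], items[y])
                    for x in clusters[i] for y in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters

clusters = complete_linkage(up_genes, jaccard_distance, n_clusters=2)
print(clusters)
```

Working with sets sidesteps the need to align vector positions by hand: two drugs are compared only through the genes they share versus the genes either of them lists. The number of clusters is fixed at 2 here purely for illustration; with real data, inspecting the full dendrogram is the more exploratory route.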

ADD COMMENT • link 6.1 years ago by Jean-Karim Heriche 27k

0

Thanks, Jean-Karim. I'll give it a try. Just an added thought: I'm struggling to visualise how this is going to work if the genes affected differ between drugs. For example, just looking at up genes, how would two vectors be comparable if they had a "1" in the same element position but for two completely different genes? What I do not have is the complete space of all genes, just those that are up/down-regulated for each drug.

ADD REPLY • link 6.1 years ago by Antony ▴ 10

1

You could use only the genes that are common to all the drugs, treat absent genes as missing values, or combine both approaches. For example, genes that are missing in a large fraction of the drugs (e.g. 60-70%) could be dropped, and missing information could be treated as a third category (e.g. 1: up, -1: down and 0: missing; this is a form of imputation). In the latter case, the data is no longer binary and you would need to look for a suitable measure of similarity/distance.
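A rough sketch of that encoding, in plain Python for illustration only. Everything specific here is an assumption: the toy drugs and genes are invented, the function name encode and the 0.6 threshold are hypothetical choices, and a gene that appears in both lists of one drug is arbitrarily treated as up.

```python
# Hypothetical toy input: per drug, the sets of up- and down-regulated genes.
drugs = {
    "drugA": {"up": {"TP53", "MYC"}, "down": {"EGFR"}},
    "drugB": {"up": {"TP53"}, "down": {"EGFR", "KRAS"}},
    "drugC": {"up": {"EGFR"}, "down": {"TP53", "ATM"}},
}

def encode(drugs, max_missing_fraction=0.6):
    """Return (genes, vectors): the genes kept after filtering, and one
    +1/0/-1 vector per drug aligned to that (sorted) gene order."""
    # The shared gene space is the union of every gene seen in any list.
    all_genes = set()
    for lists in drugs.values():
        all_genes |= lists["up"] | lists["down"]

    # Drop genes that are absent from too large a fraction of the drugs.
    kept = []
    for gene in sorted(all_genes):
        n_missing = sum(
            1 for lists in drugs.values()
            if gene not in lists["up"] and gene not in lists["down"]
        )
        if n_missing / len(drugs) <= max_missing_fraction:
            kept.append(gene)

    # +1 = up, -1 = down, 0 = missing (a gene in both lists counts as up).
    vectors = {
        name: [1 if g in lists["up"] else -1 if g in lists["down"] else 0
               for g in kept]
        for name, lists in drugs.items()
    }
    return kept, vectors

genes, vectors = encode(drugs)
print(genes)
print(vectors)
```

Because every vector is built against the same sorted gene list, position k always refers to the same gene in every drug, which is what makes the vectors comparable.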

ADD REPLY • link 6.1 years ago by Jean-Karim Heriche 27k

0

I now have the complete gene list and I am taking the approach of +1 (up), 0 (missing), -1 (down). When I find a means of calculating a similarity/distance, I'll edit my post. Thanks.
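For reference, one candidate measure for +1/0/-1 vectors is cosine distance; this is only a sketch of one option, not the measure actually settled on in this thread. With this encoding, genes regulated in the same direction raise the similarity, genes regulated in opposite directions lower it, and missing genes contribute nothing. The example vectors below are made up.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    if sq_u == 0 or sq_v == 0:
        return 1.0
    return 1.0 - dot / math.sqrt(sq_u * sq_v)

# Hypothetical +1/0/-1 vectors defined over the same ordered gene list.
drug_a = [1, 1, -1, 0]
drug_b = [1, 1, -1, 0]    # same direction as drug_a for every gene
drug_c = [-1, -1, 1, 0]   # opposite direction for every gene
drug_d = [0, 0, 0, 1]     # no regulated genes in common with drug_a

print(cosine_distance(drug_a, drug_b))
print(cosine_distance(drug_a, drug_c))
print(cosine_distance(drug_a, drug_d))
```

The distance ranges from 0 (identical profiles) through 1 (nothing in common) to 2 (fully opposite profiles), so drugs with reversed signatures end up farthest apart rather than being lumped together with unrelated ones.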

ADD REPLY • link 6.1 years ago by Antony ▴ 10