I wish to compare the proteins in each cluster and assign a similarity score to every pair of clusters. For example, Cluster 1 compared with Cluster 1 would have a similarity of 1, Cluster 1 compared with Cluster 2 a similarity of 0.7, and so on. The number of proteins in each cluster is different, so the score should be based on each individual cluster's total number of proteins. The output should preferably be a similarity matrix, something like this:
Input:
Cluster 1 CSF2,NRAS,GSK3A,GSK3B
Cluster 2 MAP3K7,HLA-DRA,NFKBIA,ZAP70
Cluster 3 CSF2,NRAS,GRIN1,CDKN1A
Cluster 4 GSK3A,GSK3B,NRAS,CSF2
Output:
           Cluster 1  Cluster 2  Cluster 3  Cluster 4
Cluster 1  1          0          0.33       1
Cluster 2  0          1          0          0
Cluster 3  0.33       0          1          0.33
Cluster 4  1          0          0.33       1
Any help or advice would be greatly appreciated, thank you.
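For reference, the expected output above is consistent with the Jaccard index, that is, the number of shared proteins divided by the number of distinct proteins across the two clusters. Cluster 1 and Cluster 3 share 2 of 6 distinct proteins, which gives 0.33. Below is a minimal sketch in Python, assuming that this is the intended score. The names `clusters`, `jaccard`, and `matrix` are illustrative only, and the thread itself discusses R rather than Python.

```python
from itertools import product

# Cluster membership copied from the example input in the question.
clusters = {
    "Cluster 1": {"CSF2", "NRAS", "GSK3A", "GSK3B"},
    "Cluster 2": {"MAP3K7", "HLA-DRA", "NFKBIA", "ZAP70"},
    "Cluster 3": {"CSF2", "NRAS", "GRIN1", "CDKN1A"},
    "Cluster 4": {"GSK3A", "GSK3B", "NRAS", "CSF2"},
}


def jaccard(a, b):
    """Shared proteins divided by all distinct proteins in either cluster."""
    union = a | b
    # Two empty clusters have no defined score; returning 0.0 is an assumption.
    return len(a & b) / len(union) if union else 0.0


names = list(clusters)

# Score every ordered pair of clusters, including each cluster with itself.
matrix = {
    (row, col): jaccard(clusters[row], clusters[col])
    for row, col in product(names, repeat=2)
}

# Print a tab-separated similarity matrix with a header row.
print("\t".join([""] + names))
for row in names:
    print("\t".join([row] + [f"{matrix[row, col]:.2f}" for col in names]))
```

Because the denominator is the union of the two clusters, the score accounts for clusters of different sizes and the matrix is symmetric. If the score should instead be relative to one cluster's own size, the denominator would be `len(a)` and the matrix would no longer be symmetric.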
What is your bioinformatics level? Do you know how to use a double for loop in R, for instance? Do you know how to use %in% in R? Or are you a novice?

It sounds a bit like a function I made for the R package gogadget 2.0 (gogadget: an R package for GO analysis visualization and interpretation), the function gogadget.overlap. In this function I count the number of genes that overlap between GO terms, then I calculate the overlap index from that and visualize it in a heatmap.

Take a look at the R package at https://sourceforge.net/projects/gogadget/ if you have some bioinformatics skills. If you are a novice, I advise you to try to get help from a bioinformatician in your neighborhood...
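The double for loop and %in% idea mentioned above can be sketched as follows. It is written in Python rather than R (Python's `in` plays the role of `%in%`), and it is not the gogadget.overlap code. The thread does not give gogadget's formula, so dividing the shared count by the size of the smaller cluster (the overlap coefficient) is an assumption, and all names are hypothetical.

```python
# Example clusters taken from the question; gene lists are assumed to be unique.
example = {
    "Cluster 1": ["CSF2", "NRAS", "GSK3A", "GSK3B"],
    "Cluster 2": ["MAP3K7", "HLA-DRA", "NFKBIA", "ZAP70"],
    "Cluster 3": ["CSF2", "NRAS", "GRIN1", "CDKN1A"],
    "Cluster 4": ["GSK3A", "GSK3B", "NRAS", "CSF2"],
}


def overlap_coefficient(a, b):
    """Count members of a that are also in b, then divide by the smaller size."""
    shared = 0
    for protein in a:
        if protein in b:  # membership test, comparable to %in% in R
            shared += 1
    smaller = min(len(a), len(b))
    # An empty cluster has no defined score; returning 0.0 is an assumption.
    return shared / smaller if smaller else 0.0


names = list(example)
scores = []

# Double loop: the outer loop walks the rows, the inner loop walks the columns.
for row in names:
    row_scores = []
    for col in names:
        row_scores.append(overlap_coefficient(example[row], example[col]))
    scores.append(row_scores)

for name, row_scores in zip(names, scores):
    print(name, row_scores)
```

With this definition Cluster 1 and Cluster 3 score 0.5 (2 shared out of 4) rather than the 0.33 shown in the question, so the choice of denominator matters.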
I am a student and only recently started delving into bioinformatics, so I am still a novice.
Okay, in that case I would suggest you learn some coding skills first. I don't think it would be helpful to write the code for you (you'll learn nothing from that). Good luck.
Was this not already answered in a recent thread, "Protein name alignment for comparison and similarity score"?
Hi, yes it was, but the output was a bit different. The person who provided the original answer suggested I open a new question where he could provide a solution. Thanks.