GOSemSim - difference between mgeneSim and mgoSim
16 months ago
Giovanni ▴ 10

Hi everyone, I'm trying to use GOSemSim but I'm struggling with its results. I started with the mgeneSim function and passed it a vector of 11,000+ Entrez gene IDs. It gave me a similarity matrix in which some rows and columns contain only the value 1. I think this happens because, after mapping the Entrez IDs to GO, I noticed that the GO ID sets of some pairs of different genes have a GO ID in common.

To solve this problem, I tried to build the similarity matrix without mgeneSim, filling each entry with the output of mgoSim for the corresponding pair of genes. I estimated that building the matrix this way would take about 30 days, while mgeneSim needs only a couple of hours.

Given the same parameters (measure="Wang", combine="BMA"), mgeneSim and mgoSim return different results. Do you know why?

How can I get consistent results from mgeneSim? Is it possible to ignore the GO IDs that two genes share?

A little example:

mgeneSim(c("3613", "83541", "5651", "23492", "157310"), semData=hsGO, measure="Wang", combine = "BMA", verbose = TRUE)

outputs

[Image: output]

16 months ago
DareDevil ★ 4.3k

For Entrez IDs, suppose your gene set is stored in "gene_list.txt":

Gene ID
A2M   2
TNF   7124
....    .....

Then run the following code:

library(GOSemSim)
library(org.Hs.eg.db)
library(data.table) 

hsGO2 = godata('org.Hs.eg.db',  ont="BP", computeIC=FALSE) #ont = "MF" or "CC"

# read the gene ID file (columns: Gene, ID)
data = read.table(file = "gene_list.txt", header = TRUE)

# store it as a data.table
data = as.data.table(data)

# compute pairwise semantic similarity (Entrez IDs passed as character strings)
result <- mgeneSim(as.character(data$ID), semData=hsGO2, measure="Wang", combine="BMA", verbose=FALSE)

# convert the similarity matrix to a long-format data frame
res <- as.data.frame(as.table(result))

# write the result to a file
write.table(res, "result_bp_entrez.txt", quote = FALSE, sep="\t")
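
To make the reshaping step concrete, here is a small sketch that uses a made-up 3 x 3 similarity matrix in place of the mgeneSim result. The values and the column names gene1, gene2 and similarity are illustrative assumptions, not GOSemSim output. as.table() followed by as.data.frame() turns the square matrix into one row per gene pair:

```r
# Toy stand-in for the matrix returned by mgeneSim(); values are invented.
genes <- c("2", "7124", "3613")
result <- matrix(c(1.0, 0.4, 0.2,
                   0.4, 1.0, 0.6,
                   0.2, 0.6, 1.0),
                 nrow = 3, dimnames = list(genes, genes))

# Long format: one row per (row gene, column gene) pair.
res <- as.data.frame(as.table(result), stringsAsFactors = FALSE)
names(res) <- c("gene1", "gene2", "similarity")

# The matrix is symmetric, so keep each unordered pair once and drop self-pairs.
# Both as.table() and upper.tri() follow column-major order, so the mask lines up.
keep <- upper.tri(result)
pairs <- res[as.vector(keep), ]
print(pairs)
```

Filtering to the upper triangle is optional; with 11,000+ genes it roughly halves the size of the file that write.table produces.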

For gene symbols, use the same "gene_list.txt" and the Gene column:

# build the GO data with gene symbols as keys
hsGO2 <- godata('org.Hs.eg.db', keytype = "SYMBOL", ont="BP", computeIC=FALSE)

# compute pairwise semantic similarity
result <- mgeneSim(as.character(data$Gene), semData=hsGO2, measure="Wang", combine="BMA", verbose=FALSE)

# convert the similarity matrix to a long-format data frame
res <- as.data.frame(as.table(result))

# write the result to a file
write.table(res, "result_bp_genes.txt", quote = FALSE, sep="\t")

Now it works! I used the "MF" ontology, and that was the cause of my problems. I don't know why I didn't try it before asking here. Now I'll try to understand the differences between the ontologies and why they produce different values. Thank you so much!


It will work whether you use "MF", "BP" or "CC". To run it with gene symbols, you need to set keytype = "SYMBOL" in godata.

16 months ago
DareDevil ★ 4.3k

mgeneSim: mgeneSim calculates the pairwise semantic similarity among a list of genes based on their functional annotations and returns a similarity matrix. It relies on Gene Ontology (GO) annotations, which provide structured information about gene functions. Each gene is mapped to its set of GO terms, the term-to-term similarities are computed with the chosen measure (for example Wang or Resnik), and they are then merged into one gene-level score with the combine method (for example BMA).

mgoSim: On the other hand, mgoSim calculates the semantic similarity between two sets of GO terms that you supply yourself. It scores the terms based on their relationships in the GO hierarchy and merges the scores with the same combine methods. With measure="Wang", the similarity is derived from the topology of the GO graph: each term is described by the semantic contributions of its ancestor terms, and no information content is needed (which is why computeIC=FALSE is sufficient). Information-content-based measures such as Resnik instead rely on the information content of the most informative common ancestor of the two terms.

mgeneSim calculates the similarity between genes, while mgoSim calculates the similarity between GO terms.

16 months ago
Giovanni ▴ 10

So I think I need mgeneSim for my analysis.

How can I handle the rows and columns full of 1s? I think this happens because mgeneSim maps the Entrez IDs to GO, and the GO ID sets of some pairs of different genes have a GO ID in common. Am I right, or do you have a better explanation?

Thank you so much
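
One way to reason about this question is to look at how the BMA (best-match average) combination behaves. The sketch below is a plain-R illustration, not GOSemSim's own code: bma() is a hypothetical helper and the term-to-term similarity values are invented. BMA averages, over every term in both sets, the best score that term achieves against the other set. Under that definition a single shared GO ID contributes a 1 for that term only; the combined score reaches 1 when every term in each set has a perfect match in the other, for example when two genes carry identical GO term sets.

```r
# Hypothetical helper: best-match average over a term-by-term similarity matrix
# (rows = GO terms of gene A, columns = GO terms of gene B).
bma <- function(sim) {
  row_best <- apply(sim, 1, max)  # best match in B for each term of A
  col_best <- apply(sim, 2, max)  # best match in A for each term of B
  (sum(row_best) + sum(col_best)) / (nrow(sim) + ncol(sim))
}

# Case 1: the genes share one term (the 1.0 entry); the other terms match poorly.
shared_one <- matrix(c(1.0, 0.2, 0.4,
                       0.3, 0.5, 0.1),
                     nrow = 2, byrow = TRUE)

# Case 2: both genes have the same two terms, so the diagonal is 1.
identical_sets <- matrix(c(1.0, 0.3,
                           0.3, 1.0),
                         nrow = 2, byrow = TRUE)

score_shared <- bma(shared_one)
score_identical <- bma(identical_sets)
print(c(shared = score_shared, identical = score_identical))
```

If this reasoning carries over to the real data, gene pairs that score exactly 1 may simply have identical annotation sets in the chosen ontology; comparing their GO term lists directly would be a way to check.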
