Hi everyone, I'm trying to use GOSemSim but I'm struggling with its results. I started with the mgeneSim function and passed it a vector of 11,000+ Entrez gene IDs. It gave me a similarity matrix in which some rows and columns contain only the value "1". I think this is because, after mapping the Entrez IDs to GO, I noticed that the GO ID sets of some pairs of different genes have a GO ID in common.
To solve this problem, I tried to build the similarity matrix without mgeneSim, filling each entry with the output of mgoSim for the corresponding pair of genes. I estimate that building the matrix this way would take about 30 days, while mgeneSim needs only a couple of hours.
When I give mgeneSim and mgoSim the same parameters (measure="Wang", combine="BMA"), the results are different. Do you know why?
How can I get consistent results from mgeneSim? Is it possible to ignore the GO IDs that two genes share?
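For context on combine = "BMA": as far as I understand it, the best-match average takes, for every GO term of each gene, its best score against the other gene's terms, and averages those best scores over both directions. The base R sketch below uses invented scores and a hypothetical helper named bma (it is not the package's own function) to show that one shared GO term (a score of 1 in the matrix) does not by itself push the gene-level similarity to 1.

```r
# Toy best-match average (BMA) over a term-by-term similarity matrix.
# `bma` is a hypothetical helper written for illustration only.
bma <- function(sim) {
  row_best <- apply(sim, 1, max)  # best match for each term of gene A
  col_best <- apply(sim, 2, max)  # best match for each term of gene B
  (sum(row_best) + sum(col_best)) / (nrow(sim) + ncol(sim))
}

# Invented scores: gene A has 3 GO terms (rows), gene B has 2 (columns).
# The 1.0 entry stands for a GO term that both genes share.
sim <- matrix(c(1.0, 0.2,
                0.3, 0.4,
                0.1, 0.5),
              nrow = 3, byrow = TRUE)

shared_one_term <- bma(sim)      # below 1 despite the shared term
identical_sets  <- bma(diag(2))  # identical term sets give exactly 1
```

Under this definition, a gene-level score of exactly 1 needs every term of both genes to have a perfect match on the other side, which is what happens when two genes carry the same annotation set.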
A little example:
mgeneSim(c("3613", "83541", "5651", "23492", "157310"), semData=hsGO, measure="Wang", combine = "BMA", verbose = TRUE)
outputs
Now it works! I used ontology "MF", and that was the cause of my problems. I don't know why I didn't try it before asking here. Now I'll try to understand the differences between the ontologies and why they output different values. Thank you so much!
Even if you use "MF", "BP" or "CC", it will work. To run with gene symbols you need to use keytype = "SYMBOL".
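A minimal sketch of the gene-symbol route, assuming the GOSemSim and org.Hs.eg.db packages are installed. The wrapper name symbol_similarity and the symbols in the usage comment are made up for illustration; how hsGO was built in the original post is not shown, so the godata call here is an assumption about the setup.

```r
# Hypothetical wrapper: build the GO data keyed by gene symbol for one
# ontology, then compute the pairwise gene similarity matrix.
symbol_similarity <- function(symbols, ont = "MF") {
  # keytype = "SYMBOL" makes the annotation table use gene symbols
  # instead of the default Entrez IDs.
  hs_go <- GOSemSim::godata("org.Hs.eg.db", keytype = "SYMBOL", ont = ont)
  GOSemSim::mgeneSim(symbols, semData = hs_go,
                     measure = "Wang", combine = "BMA", verbose = FALSE)
}

# Usage (illustrative symbols, not taken from the post):
# symbol_similarity(c("TP53", "BRCA1", "EGFR"), ont = "BP")
```

Changing ont to "BP" or "CC" switches the ontology, so the same genes can get different scores depending on which annotations are compared.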