Jaccard similarity in R
8.6 years ago
Nitin ▴ 170

Hello,

I have the following two text files containing gene symbols:

Text file one: Cd5l Mcm6 Wdhd1 Serpina4-ps1 Nop58 Ugt2b38 Prim1 Rrm1 Mcm2 Fgl1

Text file two: Serpina4-ps1 Trib3 Alas1 Tsku Tnfaip2 Fgl1 Nop58 Socs2 Ppargc1b Per1 Inhba Nrep Irf1 Map3k5 Osgin1 Ugt2b37 Yod1

I want to compute the Jaccard similarity between them in R. For this purpose I used the sets package:

md1<-read.csv("T1.csv",sep=",",header = FALSE)
M1<-set(md1)

md2<-read.csv("T2.csv",sep=",",header = FALSE)
M2<-set(md2)

Sim1<-set_similarity(M1,M2, method="Jaccard")

But it gives a Jaccard coefficient of 0 (meaning no similarity), even though I know there is some overlap between the two text files. I can't figure out what the problem is. Can anybody suggest a solution, or is there another way to compute the Jaccard coefficient between two text files of gene symbols?

Thanks,

8.6 years ago

In the example you give, the file contents look whitespace-separated, but you're reading them as comma-separated, so first check that md1 and md2 contain what you expect. Second, you can check the intersection with the base R function intersect, e.g.:

intersect(md1$V1,md2$V1)
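
To make the first two checks concrete, here is a minimal sketch using the gene symbols pasted in the question. It assumes the files really are whitespace-separated (I can't see the actual T1.csv and T2.csv), and it uses scan() instead of read.csv, which is my substitution rather than anything from the original code. The names txt1, txt2, genes1, genes2, as_csv and shared are just for illustration.

```r
# Gene symbols as posted in the question (assumed to match the file contents).
txt1 <- "Cd5l Mcm6 Wdhd1 Serpina4-ps1 Nop58 Ugt2b38 Prim1 Rrm1 Mcm2 Fgl1"
txt2 <- paste("Serpina4-ps1 Trib3 Alas1 Tsku Tnfaip2 Fgl1 Nop58 Socs2",
              "Ppargc1b Per1 Inhba Nrep Irf1 Map3k5 Osgin1 Ugt2b37 Yod1")

# Reading whitespace-separated text as comma-separated yields a single cell
# holding the whole line, so no individual gene can ever match.
as_csv <- read.csv(text = txt1, sep = ",", header = FALSE)
nrow(as_csv)

# scan() splits on any whitespace (spaces, tabs, newlines) by default.
# For a real file, pass the path instead of text =.
genes1 <- scan(text = txt1, what = character(), quiet = TRUE)
genes2 <- scan(text = txt2, what = character(), quiet = TRUE)

length(genes1)
length(genes2)

# Shared gene symbols between the two lists.
shared <- intersect(genes1, genes2)
shared
```

If the real files turn out to have one gene per line, scan() reads them the same way, so the rest of the check does not change.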

Third, I suppose you're using the sets package. This package deals with sets of R objects, so set(md1) creates a set containing a single R object, md1. What you probably meant was to create a set of the gene names in md1, e.g.:

M1 <- as.set(md1$V1)
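
The difference between set() and as.set() can be seen in a small sketch like the one below. It assumes the sets package is installed and uses plain character vectors (genes1, genes2, my own names) in place of the md1 and md2 data frames from the question.

```r
library(sets)  # assumption: install.packages("sets") has been run

genes1 <- c("Cd5l", "Mcm6", "Wdhd1", "Serpina4-ps1", "Nop58",
            "Ugt2b38", "Prim1", "Rrm1", "Mcm2", "Fgl1")
genes2 <- c("Serpina4-ps1", "Trib3", "Alas1", "Tsku", "Tnfaip2", "Fgl1",
            "Nop58", "Socs2", "Ppargc1b", "Per1", "Inhba", "Nrep", "Irf1",
            "Map3k5", "Osgin1", "Ugt2b37", "Yod1")

# set() treats each argument as one element, so the whole vector
# becomes a single member of the set.
wrapped <- set(genes1)
length(wrapped)

# as.set() converts the vector itself, giving one element per gene symbol.
M1 <- as.set(genes1)
M2 <- as.set(genes2)
length(M1)

# With proper sets of gene symbols the similarity is no longer 0.
sim <- set_similarity(M1, M2, method = "Jaccard")
sim
```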

I think using a package here is overkill; you can easily compute the Jaccard index yourself from its definition (the size of the intersection divided by the size of the union):

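# Number of gene symbols present in both files
I <- length(intersect(md1$V1,md2$V1))
# Union size = |A| + |B| - |A and B|; assumes no gene is repeated within a file
S <- I/(length(md1$V1)+length(md2$V1)-I)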
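
The same calculation can be wrapped in a small helper. This is only a sketch: the function name jaccard and the vectors genes1 and genes2 are mine, and the gene lists are the ones pasted in the question rather than read from the actual files. It deduplicates the inputs first, so a repeated gene symbol does not inflate the union.

```r
# Jaccard index of two character vectors: |A and B| / |A or B|.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_union <- length(union(a, b))
  # Two empty inputs have no defined similarity.
  if (n_union == 0) return(NA_real_)
  length(intersect(a, b)) / n_union
}

genes1 <- c("Cd5l", "Mcm6", "Wdhd1", "Serpina4-ps1", "Nop58",
            "Ugt2b38", "Prim1", "Rrm1", "Mcm2", "Fgl1")
genes2 <- c("Serpina4-ps1", "Trib3", "Alas1", "Tsku", "Tnfaip2", "Fgl1",
            "Nop58", "Socs2", "Ppargc1b", "Per1", "Inhba", "Nrep", "Irf1",
            "Map3k5", "Osgin1", "Ugt2b37", "Yod1")

# 3 shared genes out of 24 distinct genes in total.
result <- jaccard(genes1, genes2)
result  # 0.125
```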

This looks good; can we promote it to an answer?


Sure, no problem.


Hi Jean,

Thanks for the nice solution, it worked :)

Nitin
