Question

Filtering genes by Pearson correlation

0

6.9 years ago

mannoulag1 ▴ 130

Hi biostars,

I computed Pearson correlations on my data (an expression matrix) and kept only the correlations with absolute value > 0.8. How can I obtain the sub-matrix of the expression matrix that contains only these highly correlated genes? Thank you.

data<-t(matrix)
cor = cor(data, use="pairwise.complete.obs", method="pearson")
cor<-cor[abs(cor)>0.8]

correlation RNA-Seq R • 2.3k views

updated 10 months ago by Ram 45k • written 6.9 years ago by mannoulag1 ▴ 130

score 6 · Answer 1 · 2018-04-03

6

6.9 years ago

Jean-Karim Heriche 27k

Extract the indices of the genes of interest with which() and the arr.ind option, e.g.

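# idx is a two-column matrix of (row, col) positions in the correlation matrix
idx <- which(abs(cor) > 0.8, arr.ind = TRUE)
# genes are the rows of the original (untransposed) expression matrix
correlated.genes <- matrix[idx, ]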
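
To make the shape of that result concrete, here is a small sketch on a made-up 3 x 3 correlation matrix. The values and the gene names (g1, g2, g3) are hypothetical and only serve to show what which() returns with arr.ind = TRUE.

```r
# Toy symmetric correlation matrix; values and gene names are made up
genes <- c("g1", "g2", "g3")
cm <- matrix(c(1.0, 0.9, 0.1,
               0.9, 1.0, 0.2,
               0.1, 0.2, 1.0),
             nrow = 3, byrow = TRUE,
             dimnames = list(genes, genes))

# arr.ind = TRUE returns one (row, col) pair per cell that passes the filter
hits <- which(abs(cm) > 0.8, arr.ind = TRUE)
print(hits)
```

Note that the diagonal always passes the filter, because every gene correlates perfectly with itself, and that each off-diagonal pair shows up twice because the matrix is symmetric.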


0

Thank you Jean-Karim, I did this:

# cor is symmetric, so we only need to keep half of the index pairs
idx<-which( (abs(cor) > 0.8) & (upper.tri(cor)), arr.ind=TRUE)
correlated.genes <- matrix[idx, ]

Do I then have to remove the duplicated genes from 'correlated.genes'?

6.9 years ago by mannoulag1 ▴ 130

1

That was just meant as a quick pointer. What I think you want is the set of unique gene indices. Something like:

idx <- which( (abs(cor) > 0.8) & (upper.tri(cor)), arr.ind=TRUE)
idx <- unique(c(idx[, 1], idx[, 2]))
correlated.genes <- matrix[idx, ]
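
For anyone who wants to try this end to end, below is a minimal sketch on simulated data. The expression matrix `expr`, the gene names and the noise levels are all assumptions made for illustration. The variables are named `expr` and `cmat` rather than `matrix` and `cor` so that they do not shadow the base R functions of the same name.

```r
set.seed(1)  # reproducible toy data

# Hypothetical expression matrix: 5 genes (rows) x 20 samples (columns)
base <- rnorm(20)
expr <- rbind(
  geneA = base,
  geneB = base + rnorm(20, sd = 0.1),   # built to track geneA closely
  geneC = -base + rnorm(20, sd = 0.1),  # built to mirror geneA
  geneD = rnorm(20),                    # independent noise
  geneE = rnorm(20)                     # independent noise
)

# cor() correlates columns, so transpose to put genes in columns
cmat <- cor(t(expr), use = "pairwise.complete.obs", method = "pearson")

# upper.tri() keeps each pair once and drops the diagonal
pairs <- which(abs(cmat) > 0.8 & upper.tri(cmat), arr.ind = TRUE)

# Unique gene (row) indices that take part in at least one pair
keep <- sort(unique(as.vector(pairs)))

# drop = FALSE keeps a matrix even if only one gene is selected
correlated.genes <- expr[keep, , drop = FALSE]
print(rownames(correlated.genes))
```

Because the filter uses abs(), strongly anti-correlated genes such as geneC are kept as well; drop the abs() if only positive correlations are of interest.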