In R Programming Language:
1, create random data of 300 genes (rows) x 20 samples (columns)
mat <- matrix(rexp(6000, rate=0.1), ncol=20)
rownames(mat) <- paste0("gene", 1:nrow(mat))
dim (mat)
[1] 300 20
mat[1:5,1:5]
[,1] [,2] [,3] [,4] [,5]
gene1 4.055854 5.427153 12.9471235 1.220788 8.0725705
gene2 10.805221 1.679573 2.0239176 3.754633 10.5629855
gene3 10.528569 3.867029 12.3135247 9.175522 4.5774353
gene4 6.893934 5.824185 0.3714950 4.295631 0.8289908
gene5 2.422062 14.197895 0.4122351 9.827562 0.8281283
2, create a gene correlation matrix
cormat <- cor(t(mat), method = "pearson")
dim(cormat)
[1] 300 300
3, how many correlations are > Pearson 0.5?
table(cormat>0.5)
FALSE TRUE
87528 2472
4, summarise all correlations > 0.5
corlist <- apply(cormat, 2, function(x) names(which(x>0.5)))
corlist <- do.call(rbind, lapply(corlist, function(x) paste(x, collapse = ", ")))
head(corlist)
[,1]
gene1 "gene1, gene107, gene144, gene225, gene255, gene272"
gene2 "gene2, gene159, gene280"
gene3 "gene3, gene110, gene113, gene165, gene178, gene244"
gene4 "gene4, gene14, gene35, gene41, gene54, gene61, gene73, gene91, gene107, gene161"
gene5 "gene5, gene84, gene103, gene126, gene153, gene154, gene163, gene247, gene267, gene292"
gene6 "gene6, gene68, gene81, gene123, gene166, gene189, gene297"
# a different way, retaining name-value pairs:
corlist <- apply(cormat, 2, function(x) x[x>0.5])
head(corlist)
$gene1
gene1 gene27 gene82 gene86 gene124 gene134 gene238 gene298
1.0000000 0.6933233 0.6195321 0.5372766 0.5718707 0.6206825 0.6603721 0.5414058
$gene2
gene2 gene38
1.0000000 0.6048402
$gene3
gene3 gene136 gene145 gene255
1.0000000 0.5972609 0.5366873 0.6380525
$gene4
gene4 gene12 gene146 gene239 gene270
1.0000000 0.5542958 0.7369206 0.5382795 0.5211835
$gene5
gene5 gene85 gene97 gene102 gene181 gene182 gene202 gene203
1.0000000 0.8111300 0.6299341 0.6213753 0.5584764 0.5344177 0.5881075 0.5067301
gene229
0.5336602
$gene6
gene6 gene34 gene102 gene152 gene177 gene181 gene190 gene202
1.0000000 0.5356967 0.5579895 0.6305093 0.5663306 0.6597329 0.5897418 0.6123978
gene229 gene241
0.5903611 0.5054317
Kevin
@Kevin Blighe
Hi, Kevin I already have the samples too, list of 300 genes and many samples Can I apply the same code? I want the output like edges list
gene1 gene2 PCC value between them
Every correlation value that you require will be in the
cormat
object. You can manipulate that in order to get the data in whatever final format you require.I want to keep the PCC values larger than 0.5 or smaller than -0.5? Which command do i use in R?
Sorry, I will not be doing everything. One needs to learn how to do these simple tasks on their own. I have already given substantial information in my answer and comments. Take a look at my answer again, particularly, part 4. Good luck. [hint: to also get those < -0.5, use the
abs()
function]you have made a parallel version to calculate large correlation matrix i think along with the clusgap function...can you post the link here? found it link you have removed the correaltion matrix..part
Yes, I was not 100% content with it. I believe there is another package / function called 'bigcor', which can do the same. A lot of the 'parallel' functionality that I developed was also 'absorbed' into my RegParallel Bioconductor package, while clusGapKB was absorbed into another future pipeline / package, cytofNet (https://github.com/kevinblighe/cytofNet)
you can subset isn't it based on your desired threshold !!