How to calculate the Pearson correlation coefficients between a list of genes?
1
4
Entering edit mode
5.9 years ago
Chaimaa ▴ 260

Hello guys,

I have a list of around 300 genes and, I want to calculate gene co-expression value via the Pearson correlation coefficient (PCC) between these genes and return only the ones have larger than or less than 0.5.

Any code available for this, plz share?

Appreciate any help plz!!

PCC genes list • 7.5k views
ADD COMMENT
6
Entering edit mode
5.9 years ago

In R Programming Language:

1, create random data of 300 genes (rows) x 20 samples (columns)

mat <- matrix(rexp(6000, rate=0.1), ncol=20)
rownames(mat) <- paste0("gene", 1:nrow(mat))

dim (mat)
[1] 300  20

mat[1:5,1:5]
           [,1]      [,2]       [,3]     [,4]       [,5]
gene1  4.055854  5.427153 12.9471235 1.220788  8.0725705
gene2 10.805221  1.679573  2.0239176 3.754633 10.5629855
gene3 10.528569  3.867029 12.3135247 9.175522  4.5774353
gene4  6.893934  5.824185  0.3714950 4.295631  0.8289908
gene5  2.422062 14.197895  0.4122351 9.827562  0.8281283

2, create a gene correlation matrix

cormat <- cor(t(mat), method = "pearson")

dim(cormat)
[1] 300 300

3, how many correlations are > Pearson 0.5?

table(cormat>0.5)
FALSE  TRUE 
87528  2472

4, summarise all correlations > 0.5

corlist <- apply(cormat, 2, function(x) names(which(x>0.5)))
corlist <- do.call(rbind, lapply(corlist, function(x) paste(x, collapse = ", ")))

head(corlist)
  [,1]                                                                                   
gene1 "gene1, gene107, gene144, gene225, gene255, gene272"                                   
gene2 "gene2, gene159, gene280"                                                              
gene3 "gene3, gene110, gene113, gene165, gene178, gene244"                                   
gene4 "gene4, gene14, gene35, gene41, gene54, gene61, gene73, gene91, gene107, gene161"      
gene5 "gene5, gene84, gene103, gene126, gene153, gene154, gene163, gene247, gene267, gene292"
gene6 "gene6, gene68, gene81, gene123, gene166, gene189, gene297"                            

# a different way, retaining name-value pairs:
corlist <- apply(cormat, 2, function(x) x[x>0.5])
head(corlist)
$gene1
    gene1    gene27    gene82    gene86   gene124   gene134   gene238   gene298 
1.0000000 0.6933233 0.6195321 0.5372766 0.5718707 0.6206825 0.6603721 0.5414058 

$gene2
    gene2    gene38 
1.0000000 0.6048402 

$gene3
    gene3   gene136   gene145   gene255 
1.0000000 0.5972609 0.5366873 0.6380525 

$gene4
    gene4    gene12   gene146   gene239   gene270 
1.0000000 0.5542958 0.7369206 0.5382795 0.5211835 

$gene5
    gene5    gene85    gene97   gene102   gene181   gene182   gene202   gene203 
1.0000000 0.8111300 0.6299341 0.6213753 0.5584764 0.5344177 0.5881075 0.5067301 
  gene229 
0.5336602 

$gene6
    gene6    gene34   gene102   gene152   gene177   gene181   gene190   gene202 
1.0000000 0.5356967 0.5579895 0.6305093 0.5663306 0.6597329 0.5897418 0.6123978 
  gene229   gene241 
0.5903611 0.5054317

Kevin

ADD COMMENT
0
Entering edit mode

@Kevin Blighe

Hi, Kevin I already have the samples too, list of 300 genes and many samples Can I apply the same code? I want the output like edges list gene1 gene2 PCC value between them

ADD REPLY
1
Entering edit mode

Every correlation value that you require will be in the cormat object. You can manipulate that in order to get the data in whatever final format you require.

cormat[1:5,1:10]
            gene1       gene2      gene3       gene4       gene5      gene6
gene1  1.00000000 -0.21283479 -0.1810563  0.26833246 -0.09807815  0.2732797
gene2 -0.21283479  1.00000000 -0.3612355  0.07721188  0.04814679 -0.2055238
gene3 -0.18105633 -0.36123549  1.0000000 -0.32038329 -0.39066175 -0.2079907
gene4  0.26833246  0.07721188 -0.3203833  1.00000000  0.02838447  0.2693119
gene5 -0.09807815  0.04814679 -0.3906617  0.02838447  1.00000000 -0.2314281
            gene7       gene8       gene9      gene10
gene1  0.04491062  0.20639200  0.01804317 -0.09494333
gene2 -0.12443723  0.10425807 -0.02924639  0.33801704
gene3 -0.11999162 -0.29285677  0.05683996 -0.08382801
gene4  0.18194632 -0.02893122 -0.07697578 -0.22989306
gene5  0.01205250  0.33458971 -0.32346308 -0.29038923
ADD REPLY
0
Entering edit mode

I want to keep the PCC values larger than 0.5 or smaller than -0.5? Which command do i use in R?

ADD REPLY
5
Entering edit mode

Sorry, I will not be doing everything. One needs to learn how to do these simple tasks on their own. I have already given substantial information in my answer and comments. Take a look at my answer again, particularly, part 4. Good luck. [hint: to also get those < -0.5, use the abs() function]

ADD REPLY
0
Entering edit mode

you have made a parallel version to calculate large correlation matrix i think along with the clusgap function...can you post the link here? found it link you have removed the correaltion matrix..part

ADD REPLY
0
Entering edit mode

Yes, I was not 100% content with it. I believe there is another package / function called 'bigcor', which can do the same. A lot of the 'parallel' functionality that I developed was also 'absorbed' into my RegParallel Bioconductor package, while clusGapKB was absorbed into another future pipeline / package, cytofNet (https://github.com/kevinblighe/cytofNet)

ADD REPLY
0
Entering edit mode

you can subset isn't it based on your desired threshold !!

ADD REPLY

Login before adding your answer.

Traffic: 2527 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6