Correlated/Coexpressed genes , given the name of a gene.
1
4
Entering edit mode
9.8 years ago
Deepak Tanwar ★ 4.2k

Hi,

Given,I have a gene expression dataset. I want to find out, all the genes that are highly correlated with the given gene/ have the same expression pattern.

Example, if there would be an R package, it should take one input as expression data matrix and another input as gene of interest and should provide an output of a list of genes, that could have same expression pattern or are highly correlated.

I know that I could write a function in R regarding this and filter the genes on the basis of their correlation values and take into account the cor value more than 0.5. But, it takes so much of time, when you have a data with more than 20,000 rows and 1,000 columns.

correlation R • 3.9k views
ADD COMMENT
2
Entering edit mode

Could you share your code to see why it is very slow? I gave it a try on my computer with a toy example (20'000 rows and 1'000 columns) and it only took few seconds. You should be able to do this without any problem in R.

ADD REPLY
2
Entering edit mode
high_correlation_pval_table<- NULL
# iterate through all genes
for(i in 1:nrow(log2_BRCA)){
  # correlation
  cor <- cor.test(as.numeric(log2_BRCA[which(substr(rownames(log2_BRCA), start = 1, stop = 6) == "SLC5A5"),]), 
                  as.numeric(log2_BRCA[i,]), method = "spearman", alternative = "two.sided")
  # conditions
  if(abs(as.numeric(cor$estimate)) < 0.5 | abs(as.numeric(cor$estimate)) == 1 | is.na(abs(as.numeric(cor$estimate)))){
    next  
  }
  else{
    high_correlation_pval_table <- rbind(
                                     high_correlation_pval_table,
                                     data.frame(
                                           Gene = "SLC5A5",
                                           Cor_Gene = rownames(log2_BRCA[i,]), 
                                           Correlation_Value = as.numeric(cor$estimate), 
                                           p_value = as.numeric(cor$p.value), 
                                           stringsAsFactors = F,
                                           check.names = F))
  }

}
ADD REPLY
0
Entering edit mode

Deepak, may you help me as already

Suppose that I have already downloaded GSE63706 and normalized that and I have a normalized text file now. and I have also a list of probesets (a text file of my interest probesets) from this array, I want to have a heat map showing the expression pattern of my interest probesets in this array, for example in this array I have 4 varieties and different tissues (rind and flesh) and phases (0,10,20,30,40 and 50 days after harvesting).

ADD REPLY
2
Entering edit mode

Heatmap is not a problem at all. There is a R package called pheatmap. There is a very easy way to show the above mentioned groups with the heatmap. These are called as the annotations of a heatmap. Check this out: How Do I Draw A Heatmap In R With Both A Color Key And Multiple Color Side Bars?

There are codes as well. You could use any heatmap packages, there a many. pheatmap, heatmap, heatmap3, heatmap.3, heatmap.2, whatever you prefer. Read the above mentioned Biostars link properly. Tryout the example codes properly and then edit them according to your data.

ADD REPLY
1
Entering edit mode

Thank you very much

ADD REPLY
3
Entering edit mode
9.8 years ago
TriS ★ 4.7k

Using the function below is pretty fast using 1500 columns and 30k rows

mat <- replicate(1500, rnorm(30000)) 
gene <- mat[sample(1:30000,1),]

system.time({
resp <- c()
est <- c()
for (i in 1:30000){
  x <- cor.test(gene, mat[i,])
  if (abs(x$estimate[[1]]) > 0.5){
    resp <- c(resp, x$p.value)
    est <- c(est, x$estimate[[1]])
  }
}
})

 user  system elapsed 
   7.69    0.02    7.71
ADD COMMENT

Login before adding your answer.

Traffic: 2020 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6