Hi, I have gene expression data from two tissues of the same mouse model(Tissue A and Tissue B). I want to do a correlation test and need to work only on the significantly correlated genes between Tissue A and Tissue B. Finally, I would like to show a heatmap and correlation network graph. To give you an idea of how my data look like, here I stimulated with the following line of codes in R.
llibrary(Hmisc)
library(multtest)
library(reshape2)
library(dplyr)
data(golub)
mat = golub
rownames(mat) = paste0("G",1:nrow(mat))
#split the samples up randomly
m1 = mat[,seq(1,ncol(mat),2)]
m2 = mat[,seq(2,ncol(mat),2)]
test = rcorr(t(rbind(m1,m2)))
n = nrow(m1)
## Now to see correlation between the two tissues:
results = melt(test$r[1:n,(n+1):(2*n)],value.name ="pcc") %>%
left_join(melt(test$P[1:n,(n+1):(2*n)],value.name ="p"),by=c("Var1","Var2"))
## to get the signficant(FDR)
sig = droplevels(results[p.adjust(results$p,"BH")<0.05,])
new.max <- results %>% filter(Var1 %in% sig$Var1 & Var2 %in% sig$Var2) %>%
recast(Var1 ~ Var2,data=.,measure.var="pcc")
Now I have the significant genes which I would like to show by heatmap or network graph. However, the new matrix( new.max) is not anymore square, and I am not sure if I can show co-expressed genes in this form of a matrix. Would it be great to advise me on approaching this big matrix in a more meaningful way? I could show the whole matrix as a heatmap, but it is vast, and my computer is not handling it. That is why I restrict to significant genes only! Best, AD
In my experience gplots::heatmap2 function does not require an advanced computer. By default it will also cluster your genes, highlighting co-expression.
@Alex, Thank you so much! I will plot the full matrix then using gplots::heatmap2(). However, do you recommend converting the similarity matrix which I am interested in (test$r[1:n,(n+1):(2*n)]) to be converted to the adjacency matrix before plotting as heatmap or in the graph?
If so ...actually I have tried using the function adjacency.fromSimilarity() in the WGCNA package and got error like Error in checkAdjMat(similarity, min, max) : adjacency is not numeric.
here is the line of codes I used.
Best, Amare
I've seen adjacency heatmaps only for visualization of the whole WGCNA network when it is used with gene dendrograms, so I can't advise here. I can imagine that adjacency heatmap may consist almost solely of two bright colors since adjacency is an exponential function. Concerning the adjacency calculation, similarity should be a symmetric numerical matrix with values from -1 to 1. Of these requirements, heatmap_indices does not match only symmetry, but this is quite a problem for similarity matrix) Please, check heatmap_indices generation. By the way, how did you choose to set power=12?
@Alex, Thanks again for your feedback! I don't have informatics and R knowledge. I am just learning by doing using my data. I chose power 12 by mistake; the default in WGCNA is 6. The heatmap_indices object(heatmap_indices <- test$r[1:n,(n+1):(2*n)]) represents the top-right block of the similarity matrix(test = rcorr(t(rbind(m1,m2)))). I chose this block to show the possible correlation between tissue A(m1) and tissue(m2). The other blocks, for example, top left is showing the self-correlation of m1. I am interested in the block where I can see the correlation between m1(Tissue A) and m2(Tissue B). My objective is exploring correlated genes of this block and show in a plot. Adjacency calculation seems impossible for this block as the block's max and min value is between -0.9 and + 0.9. I am not sure maybe I need to change the structure of the data from test = rcorr(t(rbind(m1,m2))) to test = rcorr(t(cbind(m1,m2))) Anyways if I am doing nonsense, please excuse me! Best, AD
Use
ADD REPLY/ADD COMMENT
when responding to existing posts to keep threads logically organized.Similarity matrix: I see in heatmap_indices that G1 to G2 correlation is 0.21425986 in first row and 0.11289293 in the first column. I think that the similarity matrix should be symmetrical, i.e. G1 to G2 correlation/similarity in two cells heatmap_indices[G1, G2] and heatmap_indices[G2, G1] should be the same value. WGCNA soft power: as far as I know, one can not choose this value arbitrarily or use a default value. Please check Consensus WGCNA: soft-threshold power selection https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/faq.html (see section 6). Also, I think that (test = rcorr(t(rbind(m1,m2)))) does not actually produce the correlation of gene expression values between two tissues A and B, if your actual data are similar to your generated example (genes in rows, samples in columns in two matrices for tissues A and B). In your example, binding two matrices of gene expression just concatenes two vectors of expression values for a given gene in two tissues (i.e., (c(G1_expression in_tissueA, G2_expression in_tissueB)) and these values for all genes are passed to rcorr. I think that it actually should be rcorr(t(m1),t(m2)). Then you can adjust p-values for a half of this matrix (adjusting p-values for the whole matrix will deflate p-values, since it increases the number of comparisons twice) and filter out the non-significant values.
@Alex, Thank you again for your answers! However, rcorr(t(m1),t(m2)) and rcorr(t(rbind(m1,m2)))) would give you the same result. Value differences you noticed in the heatmap_indices object is TRUE. This is because the heatmap_indices object does not represent the full correlation matrix rather the top right quadrate of the full matrix. Here simulated small size data for you to observe the difference.
I am interested in the TOP RIGHT side of the full matrix because I believe the true correlation between Tissue A (m1) and Tissue B(m2) lies here(or in the BOTTOM LIFT).
Yes, I see, I missed the difference between rcorr and cor behaviors.
If the issue is just producing a large correlation matrix, then use bigcor: https://www.rdocumentation.org/packages/propagate/versions/1.0-6/topics/bigcor
Thank you so much Kevin! However, I may ask some tutorial on this correlation? Incase my question wasn't clear I might re-post it again or modifying it.