I have two datasets: matrix A with 6 columns and 300 rows, and matrix B with 12 columns and 300 rows.
I want to find the correlation between each column of matrix A and all the columns of matrix B.
Basically, I want to show which column of matrix B has an expression pattern similar to matrix A. Can anyone help me with R code?
So, you are aiming to do some sort of multi-omics analysis? In that case, the correlation may be one of the easiest options. You can also use the cor.test function in place of cor() in order to derive a p-value for the correlation.
Another option may be to perform a regression between the datasets based on the genes.
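As a minimal sketch of the cor() approach, assuming both tables are numeric matrices with the 300 genes as rows in the same order (the names A, B, cmat and best, and the random data, are made up for illustration):

```r
set.seed(1)
# Hypothetical stand-ins for the two datasets: 300 genes x 6 and 300 genes x 12
A <- matrix(rnorm(300 * 6), nrow = 300, dimnames = list(NULL, paste0("A", 1:6)))
B <- matrix(rnorm(300 * 12), nrow = 300, dimnames = list(NULL, paste0("B", 1:12)))

# cor() on two matrices correlates every column of A with every column of B
cmat <- cor(A, B, method = "spearman")
dim(cmat)  # 6 rows (columns of A) by 12 columns (columns of B)

# For each column of A, the column of B with the highest correlation
best <- colnames(B)[apply(cmat, 1, which.max)]
names(best) <- colnames(A)
best

# cor.test() gives the estimate plus a p-value for a single pair of columns
ct <- cor.test(A[, 1], B[, 1], method = "spearman")
ct$estimate
ct$p.value
```

With random data the correlations will all be close to zero; with real expression data, the rows of cmat show which column of B tracks each column of A most closely.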
How can I make sure that cor.test() correlates the two matrices with the genes in the same order? (They are two different data frames, so I'm afraid the order may be wrong!)
My code is below; x is one matrix, y is the other.
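# Preallocate a numeric matrix: rows = columns of y, columns = columns of x
output <- matrix(NA_real_, nrow = ncol(y), ncol = ncol(x), dimnames = list(colnames(y), colnames(x)))
for (j in seq_len(ncol(x))) {      # ncol(), not length(): length() of a matrix counts every cell
  for (i in seq_len(ncol(y))) {
    a <- cor.test(x[, j], y[, i], method = "spearman")
    output[i, j] <- a$estimate     # store the number itself; paste() would turn it into a character string
  }
}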
Thanks. How can I show whether it has a "similar expression pattern"?
Here is the limitation of correlation:
It does not take into account the magnitude of the underlying values.
To actually check expression patterns, I would merge the datasets and then do, for example:
scale(x) (for column scaling) or t(scale(t(x))) (for row scaling)
Thanks for enlightening me!
But the problem is that they are from different platforms: one is single-cell, the other is RiboTag-seq.
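The two scaling calls mentioned above can be sketched as follows, assuming x is a numeric matrix with genes as rows and samples as columns (the toy matrix and the names col_scaled and row_scaled are invented for illustration):

```r
set.seed(1)
# Toy matrix: 5 genes (rows) x 4 samples (columns)
x <- matrix(rnorm(20, mean = 10, sd = 3), nrow = 5,
            dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))

# Column scaling: each sample is centred to mean 0 and scaled to sd 1
col_scaled <- scale(x)

# Row scaling: each gene is centred to mean 0 and scaled to sd 1
# (scale() works on columns, hence the two transposes)
row_scaled <- t(scale(t(x)))

round(colMeans(col_scaled), 10)
round(rowMeans(row_scaled), 10)
```

Row scaling puts each gene on a common scale, which may help when the two datasets come from platforms with very different value ranges.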
Here's my explanation (link).
Okay, thanks a lot!
Hey, to create the matrix of p-values for the correlation test, it is indeed a bit more difficult and requires a loop of some sort. To build on the code that you have written, you just need to do three things: (1) create the random datasets again; (2) create an empty data frame with rownames as the colnames of testmatrix1; (3) run a double loop to calculate the p-values.
Now, check some values to ensure that it is correct.
Note that, for a larger data matrix, we could parallelise this with the foreach function and the %dopar% operator.
Thanks for the beautiful explanation, but my doubt is that the row names (genes) of those two matrices are not in the same order.
My intention is to find whether the gene expression pattern is the same or not. I used this, and I'm not sure whether it will stay in the same order while using cor.test().
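The original code for the p-value steps is not shown in this thread. A rough sketch of the same idea, assuming two random numeric matrices (testmatrix1 is the name used above; testmatrix2 and pvals are names made up for this example), might look like:

```r
set.seed(1)
# create the random datasets again (300 rows; 6 and 12 columns)
testmatrix1 <- matrix(rnorm(300 * 6), nrow = 300, dimnames = list(NULL, paste0("A", 1:6)))
testmatrix2 <- matrix(rnorm(300 * 12), nrow = 300, dimnames = list(NULL, paste0("B", 1:12)))

# create empty data frame with rownames as the colnames of testmatrix1
pvals <- data.frame(matrix(NA_real_, nrow = ncol(testmatrix1), ncol = ncol(testmatrix2),
                           dimnames = list(colnames(testmatrix1), colnames(testmatrix2))))

# double loop to calculate p-values
for (i in seq_len(ncol(testmatrix1))) {
  for (j in seq_len(ncol(testmatrix2))) {
    pvals[i, j] <- cor.test(testmatrix1[, i], testmatrix2[, j], method = "spearman")$p.value
  }
}

# check one value against a direct call to make sure the loop is correct
pvals["A1", "B1"]
cor.test(testmatrix1[, "A1"], testmatrix2[, "B1"], method = "spearman")$p.value
```

With 6 x 12 = 72 tests, some form of multiple-testing adjustment such as p.adjust() is probably worth considering before interpreting individual p-values.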
Both matrices have 300 genes, right? You want to ensure that the order of the genes is the same?
After you use the order() function, a simple check can be performed. If you see just TRUE, then the gene names are the same and in the same order. If you see a FALSE, then there is a discrepancy. You can use which() and match() to harmonise them.
That's TRUE, thanks....
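A minimal sketch of such a check, assuming the gene names are stored as row names (x and y here are small invented matrices, and identical() is just one way to do the comparison, since the original check is not shown):

```r
# Two toy matrices with the same genes in a different row order
x <- matrix(1:8, nrow = 4, dimnames = list(c("g1", "g2", "g3", "g4"), c("a", "b")))
y <- matrix(11:18, nrow = 4, dimnames = list(c("g3", "g1", "g4", "g2"), c("c", "d")))

# Same genes, different order: this comparison fails
identical(rownames(x), rownames(y))

# Genes present in x but missing from y would be listed here (none in this toy case)
which(is.na(match(rownames(x), rownames(y))))

# Reorder y so that its rows follow the row order of x
y <- y[match(rownames(x), rownames(y)), , drop = FALSE]

# After reordering, row k of x and row k of y refer to the same gene
identical(rownames(x), rownames(y))
```

cor.test() pairs values purely by position, not by row name, so this alignment has to happen before the correlation loop.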
I'm not sure why I'm getting a heatmap like this. This code fails only for this data.
z <- output
zz <- melt(as.matrix(z))
ggplot(data = zz, aes(x = Var2, y = Var1, fill = value)) + geom_tile()
(image: heatmap)
Here is the correlation matrix (image).
The output that you get (in the figure) tells me that your numerical data is encoded as categorical, for whatever reason. Please check each step and use the function str() to see where this automatic conversion is taking place. Sometimes it may happen if there is a 'rogue' space or comma (or other non-numeric character) embedded in your numerical data somewhere.
Looking at your code, I suspect that as.matrix(z) is not doing what you think. Try data.matrix(), or just have melt(x).
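One way this can happen, shown as a sketch with made-up values: filling a matrix with the result of paste() turns every entry into a character string, which plotting code then treats as categorical. str() reveals the problem, and converting the storage mode back to numeric fixes it. This is an assumption about the cause, not a confirmed diagnosis of the data above.

```r
# A small matrix whose cells are filled via paste(), so they become character strings
output <- matrix(NA, nrow = 2, ncol = 2)
output[1, 1] <- paste(0.52)
output[2, 1] <- paste(-0.13)
output[1, 2] <- paste(0.08)
output[2, 2] <- paste(0.91)

str(output)         # reports a chr matrix rather than num
is.numeric(output)  # FALSE

# Convert in place while keeping the matrix shape
storage.mode(output) <- "double"
is.numeric(output)  # TRUE
```

Storing the estimate directly (without paste()) in the loop avoids the conversion in the first place.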