6.2 years ago
Vasu
I have RNA-Seq data for 300 samples, of which 250 are tumor and 50 are normal. I have a matrix with genes as rows and samples as columns.
The matrix has almost 56k genes (rows), including lncRNAs.
I would like to check the correlation between a specific lncRNA and all protein-coding genes, and I want the value of R (the correlation coefficient).
How can I do this for one lncRNA versus all protein-coding genes in the genome?
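A minimal base-R sketch of one lncRNA versus many genes, assuming `expr` is a numeric genes-by-samples matrix of log-scale expression values, `lnc_id` is the row name of the lncRNA, and `pc_ids` is a character vector of protein-coding gene IDs taken from an annotation (all three names are hypothetical, and the data are made up). Because `cor()` correlates columns, the matrix is transposed so that each gene becomes a column and all coefficients are computed in a single call.

```r
# Toy data: 5 genes x 6 samples (stand-in for the real logCPM matrix)
set.seed(1)
expr <- matrix(rnorm(30), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:6)))

lnc_id <- "gene1"                                  # hypothetical lncRNA ID
pc_ids <- c("gene2", "gene3", "gene4", "gene5")    # hypothetical protein-coding IDs

# Expression of the lncRNA across samples
lnc <- expr[lnc_id, ]

# cor() works column-wise, so transpose to samples x genes.
# The result is a 4 x 1 matrix; drop() turns it into a named vector.
r <- drop(cor(t(expr[pc_ids, , drop = FALSE]), lnc, method = "spearman"))

# Genes sorted from most positive to most negative correlation
r_sorted <- sort(r, decreasing = TRUE)
print(r_sorted)
```

Restricting the matrix to `pc_ids` first also keeps the lncRNA from being correlated with itself.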
So, with `cor`, how do I proceed further? I'm interested in Spearman correlation.
Set the `method` argument. See here: https://www.rdocumentation.org/packages/stats/versions/3.5.1/topics/cor
Sorry, I'm a bit confused. Let's say I have a matrix
`A` like below, with Ensembl IDs as rows and samples as columns. Starting from raw counts, I used the `cpm` function to convert them to logCPM values.
Now I want to check the correlation of ENSG00000000005.5 with all other Ensembl IDs. This is just example data; in my real data I have a single lncRNA and around 19k protein-coding genes with logCPM values. How do I apply the above function to this? And how do I plot the correlation coefficient values (R) in R?
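For readers without edgeR at hand, the logCPM step can be approximated in base R. A minimal sketch on a made-up count matrix; it uses the offsets from limma's `voom` (0.5 added to each count, 1 to each library size), so the numbers will differ slightly from `edgeR::cpm(counts, log = TRUE)`, which adds a prior count (2 by default) scaled by library size.

```r
# Toy raw count matrix: 3 genes x 4 samples (made-up numbers)
counts <- matrix(c(10, 0, 250,
                   12, 3, 300,
                    8, 1, 180,
                   20, 5, 400),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 paste0("s", 1:4)))

lib_size <- colSums(counts)   # total counts per sample

# log2 counts-per-million; the transposes make the division act per sample (column)
log_cpm <- t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))
round(log_cpm, 2)
```

Because Spearman's rho only uses the ranks of each gene's values across samples, the log itself does not change the coefficients; the per-sample library-size scaling, however, can change those ranks compared with raw counts.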
lnc <- my_cpm["ENSG00000000005.5", ]  # logCPM values of the lncRNA of interest
my_cor <- apply(my_cpm, 1, function(x) cor(x, lnc, method = "spearman"))
I don't think plotting the correlation coefficients would be particularly revealing, but you can do it if you want.
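If you do want a plot, a histogram shows the overall distribution of the coefficients, and sorting by absolute value lists the most strongly correlated genes. A minimal sketch, assuming `my_cor` is the named vector returned by the `apply()` call above; here it is filled with made-up values so the snippet runs on its own.

```r
# Stand-in for the named vector of Spearman coefficients
set.seed(42)
my_cor <- setNames(runif(200, min = -1, max = 1), paste0("gene", 1:200))

# Distribution of all coefficients
hist(my_cor, breaks = 40, xlim = c(-1, 1),
     main = "Correlation with the lncRNA",
     xlab = "Spearman correlation coefficient")

# Ten genes with the largest absolute correlation, keeping the sign
top10 <- my_cor[order(abs(my_cor), decreasing = TRUE)][1:10]
print(round(top10, 3))
```

With the `apply()` call above, the lncRNA's own row is included and has a coefficient of exactly 1; drop it first, e.g. with `my_cor[names(my_cor) != "ENSG00000000005.5"]`.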
Thanks a lot. I got the correlation coefficient values (R), which tell me whether the lncRNA has a strong, moderate, or weak correlation with each protein-coding gene. One small question: what is R squared in correlation, and what does it tell us?
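Labels such as "strong", "moderate", and "weak" depend on cut-offs that are conventions rather than fixed rules. A sketch using hypothetical thresholds on |r| (below 0.3 weak, 0.3 to below 0.7 moderate, 0.7 and above strong) on made-up coefficients; adjust the breaks to whatever convention your field uses.

```r
r <- c(geneA = 0.82, geneB = -0.45, geneC = 0.10, geneD = -0.71, geneE = 1)

# Hypothetical cut-offs on |r|: [0, 0.3) weak, [0.3, 0.7) moderate, [0.7, 1] strong
strength <- cut(abs(r), breaks = c(0, 0.3, 0.7, 1),
                labels = c("weak", "moderate", "strong"),
                right = FALSE, include.lowest = TRUE)

# The sign gives the direction of the association
direction <- ifelse(r > 0, "positive", ifelse(r < 0, "negative", "none"))
data.frame(r, strength, direction)
```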
In short: for a single pair of variables (X and Y), the R^2 of a simple linear regression of Y on X is the same as r^2, the square of the Pearson correlation returned by cor. (For Spearman's rho, rho^2 is instead the R^2 of a regression on the ranks.) R^2 becomes more useful in multiple linear regression, where several predictors are used simultaneously to predict the response; there it measures the proportion of the variance in the response explained by the model.
Reference: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, "An Introduction to Statistical Learning".
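A quick numerical check of the statement above on simulated data (made up for illustration): the R^2 reported by `summary(lm(y ~ x))` equals the squared Pearson coefficient from `cor()`, and the squared Spearman coefficient equals the R^2 of the same regression fitted on ranks.

```r
set.seed(123)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

r2_lm      <- summary(lm(y ~ x))$r.squared   # R^2 of simple linear regression
r2_pearson <- cor(x, y)^2                    # squared Pearson r

rho2     <- cor(x, y, method = "spearman")^2           # squared Spearman rho
r2_ranks <- summary(lm(rank(y) ~ rank(x)))$r.squared    # R^2 on ranks

all.equal(r2_lm, r2_pearson)   # TRUE
all.equal(rho2, r2_ranks)      # TRUE
```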
You should maybe start by reading about statistics before going further with your analysis.