Correlation between lncRNA and protein coding genes
6.0 years ago
Vasu ▴ 790

I have RNA-Seq data for 300 samples, of which 250 are tumor and 50 are normal. I have a matrix with genes as rows and samples as columns.

There are almost 56k genes as rows, and these include lncRNAs.

I would like to check the correlation between a specific lncRNA and all protein-coding genes. I want the value of R (the correlation coefficient).

How can I do this for one lncRNA vs. all protein-coding genes in the genome?

RNA-Seq R correlation lncRNA genes • 2.7k views
6.0 years ago

If your lncRNA is in the i-th row of the matrix and count is your matrix, then in R:

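# Correlate every row (gene) with row i; stored under a name that does not shadow cor()
cor_values <- apply(count, 1, function(x) {cor(x, count[i, ])})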

You can choose Pearson or Spearman correlation through the method argument of cor.
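As a minimal sketch of this approach, the example below runs the one-vs-all correlation on a small simulated matrix. The object names (expr, lnc_id, other) and the gene IDs are made up for illustration; with real data, the protein-coding rows would have to be selected from a gene annotation, which is not shown here.

```r
# Toy log-expression matrix: 6 genes (rows) x 10 samples (columns).
# All names and values are simulated for illustration only.
set.seed(1)
expr <- matrix(rnorm(60), nrow = 6,
               dimnames = list(c("lncRNA_A", paste0("gene", 1:5)),
                               paste0("Sample", 1:10)))

lnc_id <- "lncRNA_A"                       # hypothetical ID of the lncRNA row
other  <- setdiff(rownames(expr), lnc_id)  # stand-in for the protein-coding rows

# One coefficient per gene: each row is compared against the lncRNA row.
cor_values <- apply(expr[other, , drop = FALSE], 1, function(x) {
  cor(x, expr[lnc_id, ], method = "spearman")
})

# Equivalent vectorised form: cor() correlates the columns of its first
# argument with y, so the matrix is transposed to put genes in columns.
cor_fast <- cor(t(expr[other, , drop = FALSE]), expr[lnc_id, ],
                method = "spearman")[, 1]
```

Both forms give a named vector with one coefficient per gene; the vectorised call tends to be faster on tens of thousands of rows.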


So, with this result, how do I proceed further? I'm interested in Spearman correlation.


Sorry, I'm a bit confused. Let's say I have a matrix A like the one below, with Ensembl IDs as rows and samples as columns. Starting from raw counts, I used the cpm function to convert them to logCPM values:

                       Sample1            Sample2           Sample3          Sample4          Sample5
ENSG00000000003.14        17.146506        16.822596        16.781746        16.932891        16.263722
ENSG00000000005.5          6.782761         7.941372         8.520003         8.241359         7.797734
ENSG00000000419.12        16.279996        16.663848        15.908999        14.737590        15.665799
ENSG00000000457.13        15.347626        15.454124        15.211375        15.686339        16.339990
ENSG00000000460.16        15.546598        15.720200        15.331334        15.262918        15.766690

Now I want to check the correlation of ENSG00000000005.5 with all other Ensembl IDs.

This is just example data. I have a single lncRNA and around 19k protein-coding genes with logCPM values. How do I apply the above function to this? And how do I plot the result with the R value (correlation coefficient)?


# my_cpm is the logCPM matrix; the lncRNA row is taken from the same matrix
my_cor <- apply(my_cpm, 1, function(x){cor(x, my_cpm["ENSG00000000005.5",], method = "spearman")})

I don't think plotting the correlation coefficients would be particularly revealing, but you can do it if you want.
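If a plot is still wanted, one possible sketch is to rank the genes by absolute correlation and draw a histogram of all coefficients. The my_cor vector below is simulated so that the example runs on its own; in practice it would be the named vector returned by apply.

```r
# Simulated stand-in for the named vector of Spearman coefficients.
set.seed(2)
my_cor <- setNames(runif(1000, min = -1, max = 1), paste0("gene", 1:1000))

# Rank genes by absolute correlation and keep the ten strongest,
# then look up their signed coefficients.
top_ids   <- names(sort(abs(my_cor), decreasing = TRUE))[1:10]
top_genes <- my_cor[top_ids]

# Distribution of all coefficients.
hist(my_cor, breaks = 50,
     xlab = "Spearman correlation with the lncRNA",
     main = "lncRNA vs. protein-coding genes")
```

The ranked list is often more useful than the plot itself, since it names the genes worth following up.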


Thanks a lot. I got the correlation coefficient values (R), which tell me whether the lncRNA has a strong, moderate, or weak correlation with each protein-coding gene. But I have a small question: what is R squared in correlation, and what does it tell me?


In short: if you have a single pair of variables (X and Y), then R^2 from a simple linear regression and r^2 (the square of the Pearson coefficient returned by cor) are the same. R^2 becomes more informative in multiple linear regression, where several variables are used simultaneously to predict the response.

Reference:
Excerpt from: Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani. "An Introduction to Statistical Learning." iBooks.
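The relationship can be checked numerically. The sketch below uses simulated vectors (x, y and z are made up for illustration) and the Pearson coefficient; the equality does not carry over to Spearman coefficients computed on the raw values.

```r
# Simulated data: y depends linearly on x plus noise; z is unrelated noise.
set.seed(3)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
z <- rnorm(50)

r         <- cor(x, y)                       # Pearson correlation coefficient
r2_simple <- summary(lm(y ~ x))$r.squared    # R^2 from simple linear regression

# With a single predictor, R^2 equals the squared Pearson coefficient.
same <- isTRUE(all.equal(r^2, r2_simple))

# With several predictors, R^2 summarises the whole model and cannot be
# lower than the R^2 of the nested single-predictor model.
r2_multi <- summary(lm(y ~ x + z))$r.squared
```

Here R^2 reads as the fraction of variance in y explained by the fitted model, which is why it remains meaningful when more than one predictor is involved.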


You should maybe start by reading about statistics before going further into your analysis 😉
