Hello,
I have found many techniques and posts on calculating correlation coefficients for a given data matrix in R, e.g. rcorr, cor, etc. Most of them were not useful when the data set is huge. In my case, for example, I have microarray data with 40000 rows (genes) and 3000 columns (samples).
Among the posts I tried was the following:
Check For Co-Expressed Genes In Microarray Experiments
When the data is huge, these approaches either fail (e.g. with errors) or hang.
I would like to calculate the correlation and p-value for each pair of genes and then rank the pairs. Is there a useful approach? How can I group similar genes?
It looks like there are too many questions in one.
What precisely is your goal here?
@toni Thanks for this comment. You are right, that is a lot of small questions at once!
Let's imagine I have a big matrix and I want to rank the genes based on their expression. I don't have any phenotype and I don't have any reference matrix; all I have is a matrix in which each row corresponds to a gene and each column corresponds to a sample.
Define "not working".
I imagine that you're running into memory issues, since the full gene-by-gene correlation matrix for 40000 genes needs 1.6 billion floating point values and R isn't known for being terribly memory efficient.
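For anyone who wants to check that figure, here is a rough back-of-the-envelope calculation in R. It assumes the correlations are stored as ordinary 8-byte doubles in a dense matrix, which is what cor() returns; the variable names are only illustrative.

```r
n_genes <- 40000

# Entries in the full square gene-by-gene correlation matrix
n_entries_full <- n_genes^2               # 1.6e9

# Distinct gene pairs (upper triangle, no diagonal)
n_pairs_unique <- choose(n_genes, 2)      # 799,980,000

# R stores numeric values as 8-byte doubles
bytes_full <- n_entries_full * 8
gib_full <- bytes_full / 1024^3           # roughly 11.9 GiB

cat(sprintf("full matrix: %.0f entries, about %.1f GiB\n", n_entries_full, gib_full))
```

That is for the correlation matrix alone. A matching matrix of p-values would need the same amount again, and any intermediate copies come on top of that.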
Yes, for sure. Definition of "not working" here = bloody freezing computer!
= Not being able to click or work with your computer forever
= Not being able to know whether it is working or just looping around :-D
It's likely swapping and thereby grinding the computer to a halt. Either use a computer with more memory (I wouldn't use anything with less than 16 GB for this if you're using R) or implement this in C or another lower-level language where you can control memory usage.
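A third option, not suggested above but in the same spirit of controlling memory usage, is to stay in R and never build the full matrix: correlate the genes block by block and keep only the strong pairs. The following is a minimal sketch under several assumptions: blockwise_cor is a hypothetical helper (not from any package), the block size and the |r| cutoff are arbitrary, the matrix has no missing values, and the p-value is the usual two-sided t-test for a Pearson correlation.

```r
# Hypothetical helper: correlate genes block by block, keep only strong pairs.
# expr: numeric matrix, rows = genes, columns = samples (no NAs assumed).
blockwise_cor <- function(expr, block_size = 1000, min_abs_r = 0.8) {
  n_genes <- nrow(expr)
  n_samples <- ncol(expr)
  starts <- seq(1, n_genes, by = block_size)
  hits <- list()
  for (i in seq_along(starts)) {
    rows_i <- starts[i]:min(starts[i] + block_size - 1, n_genes)
    for (j in i:length(starts)) {
      rows_j <- starts[j]:min(starts[j] + block_size - 1, n_genes)
      # cor() correlates columns, so transpose to correlate genes
      r <- cor(t(expr[rows_i, , drop = FALSE]), t(expr[rows_j, , drop = FALSE]))
      keep <- which(abs(r) >= min_abs_r, arr.ind = TRUE)
      if (nrow(keep) == 0) next
      g1 <- rows_i[keep[, 1]]
      g2 <- rows_j[keep[, 2]]
      rv <- r[keep]
      ok <- g1 < g2  # drop self-pairs and mirrored duplicates in diagonal blocks
      if (!any(ok)) next
      g1 <- g1[ok]; g2 <- g2[ok]; rv <- rv[ok]
      # Two-sided p-value from the t statistic with n_samples - 2 degrees of freedom
      t_stat <- rv * sqrt((n_samples - 2) / (1 - rv^2))
      p <- 2 * pt(-abs(t_stat), df = n_samples - 2)
      hits[[length(hits) + 1]] <- data.frame(gene1 = g1, gene2 = g2, r = rv, p = p)
    }
  }
  if (length(hits) == 0) {
    return(data.frame(gene1 = integer(0), gene2 = integer(0),
                      r = numeric(0), p = numeric(0)))
  }
  res <- do.call(rbind, hits)
  res <- res[order(-abs(res$r)), ]  # rank pairs by strength of correlation
  rownames(res) <- NULL
  res
}

# Small usage example on simulated data (20 genes, 10 samples)
set.seed(1)
m <- matrix(rnorm(20 * 10), nrow = 20)
m[2, ] <- m[1, ] * 2 + rnorm(10, sd = 0.01)  # make genes 1 and 2 co-expressed
res <- blockwise_cor(m, block_size = 7, min_abs_r = 0.5)
head(res)
```

With a block size of 1000, each block of correlations is a 1000 by 1000 matrix (about 8 MB), so peak memory is dominated by the expression matrix itself plus the retained pairs. If the cutoff is too lenient, the list of retained pairs can still grow very large, so it would need tuning for real data.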