R: Error In Pvclust Function While Clustering
11.9 years ago
Diana ▴ 930

Hi all,

I'm trying to cluster RNA-seq data using the pvclust function from the pvclust package, but it gives me this error: "cannot allocate vector of length 1623767616". Is this because I have 40296 genes and that is too much data?

My code is this:

test2<-read.csv("RNAseq_to_cluster.csv", sep=",")
test3<-test2[,2:4]  #columns contain samples
row.names(test3)<-test2$gene
matrix<-data.matrix(test3)
transpose= t(matrix)
pv <- pvclust(transpose, method.dist="correlation", method.hclust="average", nboot=1000)

Error in cor(x, method = "pearson", use = use.cor) : 
  cannot allocate vector of length 1623767616

EDIT: first few lines of the input file:

gene    sample1    sample2    sample3
Mar-01    4.19504    3.9006    4.15683
Mar-02    3.0554    3.4261    3.76675
Sep-02    77.1536    65.1284    76.4927
Mar-03    1.01555    1.28626    0.461987

Please help.

Thanks!

r clustering • 5.6k views

Yeah, there isn't enough memory to make a vector of that size. But I don't see why it would need a vector that large for what you are doing. Can you post the first few lines of the CSV input file?


I've posted a few lines of the input file.


Try repeating with fewer genes to see whether you get an answer. I assume you have reached the R memory limit of 4 GB. Check this post and post for possible workarounds.
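One way to try this with fewer genes is to keep only the most variable ones before transposing. The sketch below is not from the original thread: the simulated matrix, the cutoff `n_keep`, and the variable names are illustrative assumptions, and the pvclust call is left commented out so the snippet runs with base R alone.

```r
# Simulated stand-in for the real data: 2000 genes x 3 samples (assumption)
set.seed(1)
expr <- matrix(rexp(2000 * 3, rate = 0.1), ncol = 3,
               dimnames = list(paste0("gene", 1:2000),
                               c("sample1", "sample2", "sample3")))

# Rank genes by variance across samples and keep the top n_keep
n_keep <- 500                      # arbitrary cutoff, tune to available memory
gene_var <- apply(expr, 1, var)
top <- expr[order(gene_var, decreasing = TRUE)[seq_len(n_keep)], ]

# pvclust clusters columns, so genes must end up as columns
top_t <- t(top)
dim(top_t)                         # 3 rows (samples), 500 columns (genes)

# The gene-by-gene correlation matrix is now 500 x 500
# library(pvclust)
# pv <- pvclust(top_t, method.dist = "correlation",
#               method.hclust = "average", nboot = 1000)
```

Genes with zero variance across samples would give undefined correlations, so a filter like this may also help avoid NA values, though the right cutoff depends on the data.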


Statistically, it's not a great idea to blow up a 40k × 3 dataset into a 40k × 40k correlation matrix.
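The length in the error message is exactly the number of cells in a 40296 × 40296 gene-by-gene correlation matrix, which is what `cor` is asked to build once the genes are the columns. A quick check in R (the 8 bytes per element assumes the matrix is stored as doubles, which is what `cor` returns):

```r
n_genes <- 40296
n_cells <- n_genes^2          # cells in a 40296 x 40296 correlation matrix
n_cells == 1623767616         # TRUE: matches the length in the error message

# Each cell is a double (8 bytes), so one copy of the matrix needs about:
gib_needed <- n_cells * 8 / 1024^3
gib_needed                    # roughly 12.1 GiB
```

That is for a single copy of the matrix, before any working copies made during the computation.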

11.9 years ago

I don't think you need to do much to your input data to run the pvclust function. The transposition of the data matrix might be the problem. Instead of computing pairwise correlations for just 3 sets of data (sample1, sample2, sample3), the transposition might be telling pvclust to do it for 40,000 sets of data (genes).

Try just this:

data <- as.matrix(read.csv("RNAseq_to_cluster.csv", sep = ",", header = TRUE, row.names = 1))
pv <- pvclust(data, method.dist = "correlation", method.hclust = "average", nboot = 1000)

pvclust clusters columns; that's why I was using the transpose function. Otherwise it just clusters the samples, whereas I want to cluster the genes according to their expression profiles in the 3 samples.


I see. I skimmed through the pvclust description and thought you just wanted to cluster by sample. Perhaps the package just wasn't designed to cluster that many columns? Are you specifically interested in the p-values pvclust generates? If not, there are plenty of generic hierarchical clustering scripts out there that will handle large numbers of genes and run faster. Clustering using Python's SciPy is pretty fast. You might want to look at this also: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm
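If the bootstrap p-values are not needed, plain hierarchical clustering is also available in base R. The following is a minimal sketch on simulated data, not the poster's file: the matrix size, the seed, and `k = 4` are arbitrary assumptions. The distance `1 - cor` and average linkage are chosen to mirror the `method.dist = "correlation"` and `method.hclust = "average"` settings from the question.

```r
# Simulated stand-in: 300 genes x 3 samples (assumption, not the real data)
set.seed(42)
expr <- matrix(rnorm(300 * 3, mean = 5), ncol = 3,
               dimnames = list(paste0("gene", 1:300), paste0("sample", 1:3)))

# Correlation distance between genes (rows), so transpose before cor()
gene_dist <- as.dist(1 - cor(t(expr)))

# Average-linkage hierarchical clustering, no bootstrap resampling
hc <- hclust(gene_dist, method = "average")

# Cut the tree into a fixed number of clusters (k = 4 is arbitrary)
clusters <- cutree(hc, k = 4)
table(clusters)
```

Note that this still builds a gene-by-gene distance matrix, so it skips the cost of the bootstrap replicates but does not by itself remove the memory problem for 40296 genes; it would need to be combined with a reduced gene set.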
