Hi,
I am running the 'Preprocessing of Gene Expression data (IlluminaHiSeq_RNASeqV2)' and 'TCGAanalyze_SurvivalKM: Correlating gene expression and Survival Analysis' R-commands as-is from the Bioconductor page for TCGAbiolinks (http://bioconductor.org/packages/release/bioc/vignettes/TCGAbiolinks/inst/doc/analysis.html#tcgaanalyze_survivalkm:_correlating_gene_expression_and_survival_analysis)
However, I run into the following error when running this command (as-is, from the manual) in RStudio.
for (i in 1:round(nrow(dataBRCAcomplete) / 100)) {
    message(paste(i, "of ", round(nrow(dataBRCAcomplete) / 100)))
    tokenStart <- tokenStop
    tokenStop <- 100 * i
    tabSurvKM <- TCGAanalyze_SurvivalKM(clinical_patient_Cancer,
                                        dataBRCAcomplete,
                                        Genelist = rownames(dataBRCAcomplete)[tokenStart:tokenStop],
                                        Survresult = FALSE,
                                        ThreshTop = 0.67,
                                        ThreshDown = 0.33)
    tabSurvKMcomplete <- rbind(tabSurvKMcomplete, tabSurvKM)
}
Error: Error in 1:lastelementTOP : result would be too long a vector
Since I am using the example provided by Bioconductor, I am not sure what the problem is.
Any help would be much appreciated!
Have you additionally executed the following before the for loop:
Yes, I executed their example script as is.
Okay, how much free RAM do you have? Is it a 32- or 64-bit machine? Which R version, and which operating system and version?
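Most of these details can be read off from within R itself. A small sketch using only base R follows; the variable names are just for illustration, and note that base R has no portable call for free system RAM, so that figure still has to come from the operating system (e.g. Activity Monitor on a Mac).

```r
# Report the R version string, e.g. "R version 3.4.3 (...)".
rVersion <- R.version.string

# Pointer size in bytes times 8 gives 32 or 64 for the running R build.
pointerBits <- .Machine$sizeof.pointer * 8

# Operating system name, release and hardware type as seen by R.
osInfo <- Sys.info()[c("sysname", "release", "machine")]

message("R:    ", rVersion)
message("Bits: ", pointerBits)
message("OS:   ", paste(osInfo, collapse = " "))

# sessionInfo() additionally lists attached packages and their versions,
# which is useful to paste into a question like this one.
print(sessionInfo())
```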
Hi Kevin, sorry for the late response -- did not see your message. I'm running RStudio on a Mac (Sierra), R version 3.4.3. 64-bit.
Maybe 2GB of free RAM?
That may not be enough. I have 16GB RAM on my personal laptop. Can you try to reduce the size of the data and at least see if the code runs to completion?
Is there a way to split a matrix by nrows and write to n new matrices?
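For what it is worth, one generic way to do this in base R is to split the row indices into groups and subset the matrix once per group. The sketch below uses a made-up toy matrix (`mat`) and a hypothetical `chunkSize`; neither comes from the TCGAbiolinks example.

```r
# Toy matrix standing in for an expression matrix: 1050 "genes" x 3 "samples".
mat <- matrix(seq_len(1050 * 3), nrow = 1050,
              dimnames = list(paste0("gene", seq_len(1050)),
                              c("s1", "s2", "s3")))

chunkSize <- 100  # rows per piece (illustrative value)

# Assign each row to a chunk: rows 1-100 -> 1, rows 101-200 -> 2, and so on.
chunkId <- ceiling(seq_len(nrow(mat)) / chunkSize)

# Split the row indices by chunk, then subset the matrix for each group.
# drop = FALSE keeps every piece a matrix even if it has a single row.
rowIdx <- split(seq_len(nrow(mat)), chunkId)
chunks <- lapply(rowIdx, function(idx) mat[idx, , drop = FALSE])

length(chunks)          # number of pieces
sapply(chunks, nrow)    # rows in each piece; the last one may be shorter
```

Each element of `chunks` is then an ordinary matrix with its row names intact, so it can be passed on or written to disk separately.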
You could just take the first 500 rows as a test, like this:
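A sketch of that kind of subsetting, shown on a made-up stand-in matrix so that it runs on its own (`exprToy` and `exprTest` are hypothetical names; on the real data the same `head()` call would be applied to `dataBRCAcomplete`):

```r
# Stand-in for the real expression matrix: 2000 "genes" x 4 "samples".
set.seed(1)
exprToy <- matrix(rnorm(2000 * 4), nrow = 2000,
                  dimnames = list(paste0("gene", seq_len(2000)),
                                  paste0("sample", seq_len(4))))

# Keep only the first 500 rows; head() on a matrix preserves row and column names.
exprTest <- head(exprToy, 500)

dim(exprTest)
# With 500 rows, a loop over blocks of 100 genes runs this many times:
round(nrow(exprTest) / 100)
```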
This is the output on the test.
It results in empty tabSurvKM and tabSurvKMcomplete tables.
I would contact the developers of the packages. In many situations, packages are not updated in new versions of R, and/or other dependency issues arise as new packages are released on Bioconductor without adequate testing. To further compound the problem, the TCGA consortium has been shifting their data around and one finds that links on Government-hosted websites (hosting the data) are broken.
I believe that the contact for TCGAbiolinks is Tiago Silva in São Paulo, Brazil, where I frequently pass through.
Oh, just one thing: please try it outside RStudio (in 'regular' R). I never use RStudio because it adds that one little extra thing to my analyses that could cause problems.
This is a good idea. Thanks for your input -- I will update progress here.
How did it go?