I have been working with oncoPredict and was able to reproduce the calcPhenotype() results using the example data, but I am a bit confused about the input data. What are the rows and columns in each of the datasets used:
trainingExprData, trainingPtype and testExprData?
I have data downloaded from GDSC, and I have my own expression data (DEGs) obtained from DESeq2. How do I prepare the input data for calcPhenotype()?
Once you download the GDSC data you will have a file named DataFiles.zip.
Extract the zip file.
Convert your DESeq2 expression data to log scale (if it is not already) and save it as a text file, e.g. your_data_log_transformed.txt.
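If your DESeq2 output is still on the count scale, the log transform could look something like the sketch below. It uses a small made-up matrix, `norm_counts`, in place of your real normalized counts (for example those returned by DESeq2's `counts(dds, normalized = TRUE)`). The gene names, sample names, pseudocount of 1 and the temporary output path are all assumptions for illustration, not something oncoPredict requires.

```r
# Toy stand-in for DESeq2-normalized counts: genes in rows, samples in columns.
# All values and names here are invented for illustration.
norm_counts <- matrix(
  c(0, 15, 1023, 7, 255, 3),
  nrow = 3,
  dimnames = list(c("TP53", "EGFR", "KRAS"), c("sample1", "sample2"))
)

# log2 with a pseudocount of 1 so that zero counts stay finite.
log_expr <- log2(norm_counts + 1)

# Write a tab-delimited file with gene names in the first column.
# tempdir() is used only to keep the sketch self-contained.
out_file <- file.path(tempdir(), "your_data_log_transformed.txt")
write.table(log_expr, file = out_file, sep = "\t", quote = FALSE, col.names = NA)

# Read it back the same way the test data is read for calcPhenotype().
check <- as.matrix(read.table(out_file, header = TRUE, row.names = 1))
```

If you already have variance-stabilized or rlog values from DESeq2, those are on a log2-like scale already and should not be log transformed a second time.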
Then run:
library(oncoPredict)
setwd("DataFiles/")
#Read GDSC2 response data. rownames() are samples, colnames() are drugs.
trainingPtype = readRDS(file = "Training Data/GDSC2_Res.rds")
trainingPtype<-exp(trainingPtype)
#Read GDSC2 expression data. rownames() are genes, colnames() are samples (cell lines).
trainingExprData=readRDS(file='Training Data/GDSC2_Expr (RMA Normalized and Log Transformed).rds')
#Read testing data as a matrix with rownames() as genes and colnames() as samples.
testExprData=as.matrix(read.table('your_data_log_transformed.txt', header=TRUE, row.names=1))
#Additional parameters.
batchCorrect <- "eb"
powerTransformPhenotype <- TRUE
removeLowVaryingGenes <- 0.2
removeLowVaringGenesFrom <- "homogenizeData"
minNumSamples <- 10
selection <- 1
printOutput <- TRUE
pcr <- FALSE
report_pc <- FALSE
cc <- FALSE
rsq <- FALSE
percent <- 80
#Run the calcPhenotype() function using the parameters you specified above.
calcPhenotype(trainingExprData=trainingExprData,
trainingPtype=trainingPtype,
testExprData=testExprData,
batchCorrect=batchCorrect,
powerTransformPhenotype=powerTransformPhenotype,
removeLowVaryingGenes=removeLowVaryingGenes,
minNumSamples=minNumSamples,
selection=selection,
printOutput=printOutput,
pcr=pcr,
removeLowVaringGenesFrom=removeLowVaringGenesFrom,
report_pc=report_pc,
cc=cc,
percent=percent,
rsq=rsq)
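To make the row/column layout concrete, here is a small sanity check you could run on your own objects before calling calcPhenotype(). The matrices below are toy stand-ins with invented gene, cell line, drug and patient names; only the orientation is meant to mirror the real inputs (genes x samples for both expression matrices, samples x drugs for trainingPtype). The variable names `sharedGenes` and `sharedSamples` are just illustrative.

```r
# Toy stand-ins for the three inputs (all names and values are made up).
set.seed(1)
trainingExprData <- matrix(
  rnorm(12), nrow = 4,
  dimnames = list(c("TP53", "EGFR", "KRAS", "MYC"),
                  c("COSMIC_1", "COSMIC_2", "COSMIC_3"))
)
trainingPtype <- matrix(
  runif(6), nrow = 3,
  dimnames = list(c("COSMIC_1", "COSMIC_2", "COSMIC_3"),
                  c("DrugA", "DrugB"))
)
testExprData <- matrix(
  rnorm(6), nrow = 3,
  dimnames = list(c("TP53", "KRAS", "BRCA1"),
                  c("patient1", "patient2"))
)

# Genes present in both expression matrices (row names must use the same IDs).
sharedGenes <- intersect(rownames(trainingExprData), rownames(testExprData))

# Training samples that have both expression data and drug response data.
sharedSamples <- intersect(colnames(trainingExprData), rownames(trainingPtype))

cat("Shared genes:", length(sharedGenes), "\n")
cat("Shared training samples:", length(sharedSamples), "\n")
```

If the number of shared genes is very small, check that your test data uses the same kind of gene identifier as the training data. Also note that a file containing only the DEGs limits the overlap to those genes, so it may be worth passing the full log-transformed expression matrix instead.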
All of these steps are well documented in the calcPhenotype() documentation.