I have been working on oncoPredict. I was able to reproduce the results of calcPhenotype() using the example data, but I am a bit confused about the input data. What do the rows and columns represent in each of the input datasets: trainingExprData, trainingPtype and testExprData?
I have downloaded the GDSC data, and I have my expression data (DEGs) obtained from DESeq2. How do I prepare the input data for calcPhenotype()?
Once you download the GDSC data, you will have a file named DataFiles.zip.
Extract the zip file.
Convert your DESeq2 expression data to log scale if it is not already, and save it as your_data_log_transformed.txt with genes as rows and samples as columns.
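A minimal sketch of that conversion step, assuming you start from a DESeq2 count matrix (for example the normalized counts from counts(dds, normalized=TRUE)) with genes as rows and samples as columns. The toy matrix and gene names below are hypothetical stand-ins for your data, and the file is written to a temporary directory only for illustration. It applies log2(x + 1), writes a tab-separated file, and reads it back with the same read.table() call used for testExprData:

```r
# Hypothetical toy count matrix standing in for DESeq2 counts:
# rows are genes, columns are samples.
counts <- matrix(c(0, 15, 1023, 7,
                   3, 255, 0, 31),
                 nrow = 4,
                 dimnames = list(c("TP53", "EGFR", "GAPDH", "MYC"),
                                 c("sample1", "sample2")))

# log2(x + 1): the pseudocount of 1 keeps zero counts finite (log2(1) = 0).
log_expr <- log2(counts + 1)

# Write gene IDs as the first column and one column per sample.
# col.names = NA leaves an empty header cell above the gene IDs.
out_file <- file.path(tempdir(), "your_data_log_transformed.txt")
write.table(log_expr, file = out_file, sep = "\t", quote = FALSE,
            col.names = NA)

# Read it back the same way testExprData is read: genes become rownames().
testExprData <- as.matrix(read.table(out_file, header = TRUE, row.names = 1))
print(testExprData)
```

In your own run, replace out_file with the path you pass to read.table() and counts with your DESeq2 matrix.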
Then run:
library(oncoPredict)
setwd("DataFiles/")
#Read GDSC2 response data. rownames() are samples, colnames() are drugs.
trainingPtype = readRDS(file = "Training Data/GDSC2_Res.rds")
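#GDSC2 IC50s are stored as natural-log values; exp() undoes that, because calcPhenotype() applies its own power transformation when powerTransformPhenotype=TRUE.
trainingPtype<-exp(trainingPtype)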
#Read GDSC2 expression data. rownames() are genes, colnames() are samples (cell lines).
trainingExprData=readRDS(file='Training Data/GDSC2_Expr (RMA Normalized and Log Transformed).rds')
#Read testing data as a matrix with rownames() as genes and colnames() as samples.
testExprData=as.matrix(read.table('your_data_log_transformed.txt', header=TRUE, row.names=1))
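#Additional parameters.
#"eb" = ComBat empirical Bayes batch correction between training and test data.
batchCorrect <- "eb"
powerTransformPhenotype <- TRUE
#Remove the 20% of genes with the lowest variance.
removeLowVaryingGenes <- 0.2
#The parameter name is spelled "Varing" in oncoPredict; keep it as is.
removeLowVaringGenesFrom <- "homogenizeData"
minNumSamples <- 10
#1 = summarize duplicated gene IDs by their mean.
selection <- 1
printOutput <- TRUE
pcr <- FALSE
report_pc <- FALSE
cc <- FALSE
rsq <- FALSE
percent <- 80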
#Run the calcPhenotype() function using the parameters you specified above.
calcPhenotype(trainingExprData=trainingExprData,
trainingPtype=trainingPtype,
testExprData=testExprData,
batchCorrect=batchCorrect,
powerTransformPhenotype=powerTransformPhenotype,
removeLowVaryingGenes=removeLowVaryingGenes,
minNumSamples=minNumSamples,
selection=selection,
printOutput=printOutput,
pcr=pcr,
removeLowVaringGenesFrom=removeLowVaringGenesFrom,
report_pc=report_pc,
cc=cc,
percent=percent,
rsq=rsq)
All of these steps are well documented in the calcPhenotype() help page (?calcPhenotype).
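To sanity-check the orientation of the three inputs before calling calcPhenotype(), a small helper like the one below can help. It is a hypothetical sketch, not part of oncoPredict, and it assumes the layout described in the code comments: trainingExprData is genes x samples (cell lines), trainingPtype is samples x drugs, and testExprData is genes x samples. If either overlap count comes back as zero or very small, a matrix is probably transposed, or the two expression matrices use different gene identifiers (e.g. Ensembl IDs in one and gene symbols in the other).

```r
# Hypothetical helper (not part of oncoPredict): reports how well the three
# calcPhenotype() inputs line up, assuming
#   trainingExprData: genes x samples (cell lines)
#   trainingPtype:    samples x drugs
#   testExprData:     genes x samples
check_calcPhenotype_inputs <- function(trainingExprData, trainingPtype, testExprData) {
  stopifnot(is.matrix(trainingExprData), is.matrix(trainingPtype),
            is.matrix(testExprData))
  # Training cell lines: columns of the expression matrix, rows of the response matrix.
  shared_samples <- intersect(colnames(trainingExprData), rownames(trainingPtype))
  # Genes are matched by row name, so both expression matrices need the same ID type.
  shared_genes <- intersect(rownames(trainingExprData), rownames(testExprData))
  list(n_shared_samples = length(shared_samples),
       n_shared_genes = length(shared_genes),
       n_drugs = ncol(trainingPtype))
}

# Toy inputs with hypothetical names.
train_expr <- matrix(1:6, nrow = 3,
                     dimnames = list(c("TP53", "EGFR", "MYC"), c("cellA", "cellB")))
train_ptype <- matrix(c(0.5, 1.2, 3.1, 0.8), nrow = 2,
                      dimnames = list(c("cellA", "cellB"), c("Drug1", "Drug2")))
test_expr <- matrix(1:4, nrow = 2,
                    dimnames = list(c("TP53", "MYC"), c("patient1", "patient2")))

# Correct orientation: 2 shared samples, 2 shared genes, 2 drugs.
str(check_calcPhenotype_inputs(train_expr, train_ptype, test_expr))
# Transposed response matrix: no shared samples, which flags the mistake.
str(check_calcPhenotype_inputs(train_expr, t(train_ptype), test_expr))
```

With the real GDSC2 files, you would pass trainingExprData, trainingPtype and testExprData from the script instead of the toy matrices.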
Hi, I'm working with oncoPredict. Do you have any information on how counts should be prepared as input to the algorithm? Do I need to log-transform them, and if so, can I use log2(data + 1), or do I need another normalization such as vst?
The input can be data converted to log2 scale or vst-transformed data.
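If you are unsure whether a matrix has already been log-transformed, a rough check is to look at its maximum value. The sketch below is a hypothetical heuristic, not part of oncoPredict or DESeq2: it assumes that raw counts commonly reach into the thousands while log2-scale values (including DESeq2 vst() output) rarely exceed about 30, and applies log2(x + 1) only when the data look unlogged. The threshold of 30 is an assumption; a low-coverage count matrix could slip under it, so confirm with summary() on your own data.

```r
# Hypothetical heuristic: treat a matrix as already log-scaled if its
# maximum value is at or below max_log_value (assumed threshold).
looks_log_scaled <- function(x, max_log_value = 30) {
  max(x, na.rm = TRUE) <= max_log_value
}

# Apply log2(x + 1) only when the data do not already look log-scaled.
to_log2_scale <- function(x, max_log_value = 30) {
  if (looks_log_scaled(x, max_log_value)) x else log2(x + 1)
}

counts <- matrix(c(0, 15, 1023, 7), nrow = 2,
                 dimnames = list(c("TP53", "GAPDH"), c("s1", "s2")))
to_log2_scale(counts)              # max is 1023, so log2(x + 1) is applied
to_log2_scale(log2(counts + 1))    # max is 10, so returned unchanged
```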