random forest advice
4.1 years ago
marion.ryan ▴ 50

I am using the randomForest package to classify 'norm' versus 'chol' samples with the code below. The output gives a nice ranking of the genes in my panel by their importance for classifying diseased tissue. However, I have been reading up on the method and am wondering whether I need separate training and test data sets. I have 11 normal and 18 diseased samples. I am very happy with the intuitive output, but I want to make sure it is right.

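library(randomForest)
# Read the expression table; sample names are in the first column.
# stringsAsFactors = TRUE makes sure Pathology is a factor, so a classification forest is grown.
clus2 <- read.csv("PCA_NvC_SVM_sig.csv", header = TRUE, row.names = 1, stringsAsFactors = TRUE)
# attach(clus2) is not needed because the data frame is passed via data = clus2.
set.seed(71)  # make the forest reproducible
clus2.rf <- randomForest(Pathology ~ ., data = clus2, importance = TRUE, proximity = TRUE)
print(clus2.rf)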

Result:

Call:
 randomForest(formula = Pathology ~ ., data = clus2, importance = TRUE, proximity = TRUE)
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 4

        OOB estimate of  error rate: 10.34%
Confusion matrix:
     Chol Norm class.error
Chol   17    1  0.05555556
Norm    2    9  0.18181818
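
To make the printout easier to read, here is a minimal base-R sketch that recomputes the reported numbers from the confusion matrix alone. It assumes the usual randomForest layout, with rows as the true class and columns as the out-of-bag prediction. The names `conf`, `oob_error`, and `class_error` are illustrative only and are not part of the package.

```r
# Confusion matrix copied from the printout above
# (rows = true class, columns = out-of-bag prediction).
conf <- matrix(c(17, 1,
                  2, 9),
               nrow = 2, byrow = TRUE,
               dimnames = list(truth = c("Chol", "Norm"),
                               predicted = c("Chol", "Norm")))

# Overall OOB error: misclassified samples divided by all samples, (1 + 2) / 29.
oob_error <- 1 - sum(diag(conf)) / sum(conf)

# Per-class error: misclassified samples in each row divided by the row total.
class_error <- 1 - diag(conf) / rowSums(conf)

print(round(100 * oob_error, 2))  # 10.34
print(class_error)                # about 0.0556 (Chol) and 0.1818 (Norm)
```

So the 10.34% corresponds to 3 of the 29 samples being misclassified when each one is predicted only by the trees that did not see it during training.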

Look at variable importance:

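# Importance measures per gene, rounded to 2 decimal places
Imp <- round(importance(clus2.rf), 2)
# write.csv keeps the header aligned with the row-name column
# (write.table with sep = "," would need col.names = NA for that)
write.csv(Imp, "Importance.csv")
# Dot chart of variable importance
varImpPlot(clus2.rf)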

