Hello,
I'm using scikit-learn for machine learning. I have 800 samples with 2048 features, so I want to reduce the number of features in the hope of getting better accuracy.
It is a multiclass problem (classes 0-5), and the features consist of 1s and 0s: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0....,0]
I'm using the Random Forest Classifier.
Should I perform feature selection only on the training data? And is it enough if I'm using this code:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = RandomForestClassifier(n_estimators=200, warm_start=True, criterion='gini', max_depth=13)
# note: transform() on the classifier is deprecated (removed in newer scikit-learn),
# and I never use its return value afterwards
X_train_selected = clf.fit(X_train, y_train).transform(X_train)
predicted = clf.predict(X_test)
expected = y_test
confusionMatrix = metrics.confusion_matrix(expected, predicted)
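If it helps, here is a runnable sketch of what I think I'm trying to achieve: fitting a selector (SelectFromModel) on the training split only, then applying the same reduction to both splits before retraining. The random X and y below are just placeholders for my real data, and the hyperparameters are the ones from my code above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Placeholder data standing in for my real X and y:
# 800 samples, 2048 binary features, classes 0-5
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(800, 2048))
y = rng.randint(0, 6, size=800)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit the selector on the training data only, so the test set
# never influences which features are kept
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0))
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)   # same columns as the train split

# Retrain the classifier on the reduced feature set
clf = RandomForestClassifier(n_estimators=200, criterion='gini',
                             max_depth=13, random_state=0)
clf.fit(X_train_sel, y_train)
predicted = clf.predict(X_test_sel)
confusionMatrix = metrics.confusion_matrix(y_test, predicted)
```

Is this the right way to do it, i.e. select on the training data and only transform the test data?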
I ask because the accuracy didn't improve. Is everything OK in the code, or am I doing something wrong?
I'll be very grateful for your help.