I'm using WEKA for a project on disease gene prediction and have ~70 attributes to train a classifier on. My entire data set has ~20k instances, of which ~400 are positive.
I've created 75 training sets, each with a 1:1 ratio of positive to negative instances. I would like to optimize the classifiers (J48, RandomForest, SVM) on each training set, build a model per set, and finally combine these models to make predictions on the test data. At the moment I pick classifier parameters (number of trees, gamma, cost, etc.) by trial and error, but this neither improves the results significantly nor saves me any time.
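To make the setup above concrete, here is a minimal sketch in plain Python (not WEKA code). It assumes each balanced set reuses all positives and draws an equal number of negatives at random, that parameters are searched exhaustively over a small grid instead of by trial and error, and that the per-set models are combined by majority vote. The function names and the `score_fn` callback are made up for illustration.

```python
import itertools
import random


def make_balanced_sets(positives, negatives, n_sets, seed=0):
    """Build n_sets training sets, each holding every positive instance
    plus an equally sized random sample of negatives (1:1 ratio)."""
    rng = random.Random(seed)
    sets = []
    for _ in range(n_sets):
        # Sample without replacement within a set; sets may overlap.
        sampled_negatives = rng.sample(negatives, len(positives))
        sets.append(positives + sampled_negatives)
    return sets


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best-scoring one.
    score_fn is a hypothetical callback, e.g. a cross-validated score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def majority_vote(predictions):
    """Combine 0/1 predictions from several models for one test instance.
    Ties are resolved as negative (0)."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0
```

In this sketch, `grid_search` would be called once per balanced set with a `score_fn` that trains and cross-validates a classifier for the given parameters, and `majority_vote` would be applied to the predictions of the resulting models for each test instance.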
Bootstrapping improves performance significantly, but I can't get precision above 75% or recall above 80%. I would also like to use attribute selection.
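For clarity, precision and recall here are meant in the usual sense, computed for the positive class. A minimal sketch in plain Python, with a hypothetical function name and 0/1 labels assumed:

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision depends on the class ratio of the data it is measured on, so figures from a 1:1 set are not directly comparable to figures from data with the original ~2% positive rate.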
Any suggestions for additional training attributes are also appreciated (excluding AA composition, Physicochemical properties, Sub-cellular Localization and Network Topologies)!
Can anyone help?