I'm a novice in machine learning-based classification techniques. Please help.
- What is the difference between the SMO (Weka) and LibSVM algorithms? Which is better? I ask because the parameters the two require are very different.
- Feature reduction (e.g. PCA) and feature selection (e.g. InfoGain) are two different techniques for reducing the number of features. Which one should I rely on? Under which conditions should each be used?
- In InfoGain evaluation, the ranking algorithm ranks the features, and the threshold parameter can remove the unwanted features according to the entropy measure. Can we optimize both? Or, alternatively, do we optimize only one of them? What should I be looking for: accuracy?
- Is accuracy the only thing that I should be looking for? Of course there is overfitting, but can I quantify the predictive power of the model other than just CV accuracy? Some other measure or technique?
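On the last point, a minimal sketch of measures other than accuracy that can be computed from cross-validated predictions. It assumes binary labels and plain Python lists of true and predicted labels; the function names (`confusion_counts`, `binary_metrics`) and the toy data are hypothetical and not taken from Weka or LibSVM.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a ratio is undefined."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus several measures that are less sensitive to class imbalance."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
        # Matthews correlation coefficient: 1 is perfect, 0 is chance level.
        "mcc": safe_div(tp * tn - fp * fn, mcc_denominator),
    }


# Toy imbalanced data (made up): 9 negatives, 1 positive,
# and a classifier that always predicts the negative class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
metrics = binary_metrics(y_true, y_pred)
# Accuracy is 0.9 here, yet recall and MCC are both 0.0.
print(metrics)
```

The toy case shows why accuracy alone can mislead on imbalanced classes: a model that never predicts the positive class still scores 90% accuracy, while recall and MCC expose that it has no predictive power for that class.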
http://stats.stackexchange.com/questions/tagged/machine-learning
This looks like a class assignment. Regarding 1), the question doesn't make sense: SMO is an algorithm for solving optimization problems. LibSVM is a software library that implements the SMO algorithm.
No, this is not a class assignment. The SMO used by Weka is a different algorithm from the solvers used by LibSVM, so their parameters are quite different. But the question is: which one is better? And what are the principal differences?
This isn't a bioinformatics question, though; it is a machine learning question, and this isn't really the forum for it. I'd try posting your question under the machine-learning tag of the stats Stack Exchange.
Both Weka and LibSVM use the SMO algorithm. They have different implementations (and they reference different papers), so your question is actually: is Weka's SMO implementation better (by what criteria?) than the LibSVM implementation?
Edit: LibSVM actually uses an SMO-like algorithm, hence the different papers that Weka and LibSVM reference.
Also given that there are 4 different questions, it would be better if they were posted separately so that answers could address them separately. This would improve clarity of the posts.