Question

Machine learning based classification techniques

1

9.1 years ago

tigeradab ▴ 70

I'm a novice in machine learning-based classification techniques and would appreciate some help with the following questions.

1. What is the difference between SMO (Weka) and the LibSVM algorithms? Which is the best? I ask because the parameter requirements of the two are very different.
2. Feature reduction (e.g. PCA) and feature selection (e.g. InfoGain) are two different techniques for reducing the number of features. Which one should I rely on? Under which conditions should each be used?
3. In InfoGain evaluation, the ranking algorithm ranks the features and the threshold parameter removes the unwanted features with respect to the entropy measure. Can we optimize both, or do we optimize one of them at a time? What should I be looking for: accuracy?
4. Is accuracy the only thing that I should be looking for? Of course there is overfitting, but can I quantify the predictive power of the model with something other than just CV accuracy? Is there some other measure or technique?

machine-learning • 2.6k views

ADD COMMENT • link updated 2.3 years ago by Ram 44k • written 9.1 years ago by tigeradab ▴ 70

3

http://stats.stackexchange.com/questions/tagged/machine-learning

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by andrew.j.skelton73 6.6k

2

This looks like a class assignment. Regarding 1), the question doesn't make sense: SMO is an algorithm for solving the optimization problem that arises when training an SVM. LibSVM is a software library that implements the SMO algorithm.

ADD REPLY • link 9.1 years ago by Jean-Karim Heriche 27k

1

No, this is not a class assignment. The SMO used by Weka is a different algorithm from the solvers used by LibSVM, hence their parameters are quite different. But the question is: which one is better, and what are the main differences?

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by tigeradab ▴ 70

2

This isn't a bioinformatics question, though; it's a machine learning question, and this isn't really the forum for it. I'd try posting your question under the machine-learning tag on the Stats Stack Exchange.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by andrew.j.skelton73 6.6k

1

Both Weka and LibSVM use the SMO algorithm. They have different implementations (and they reference different papers), so your question is actually: is Weka's SMO implementation better (by what criteria?) than the LibSVM implementation?

Edit: LibSVM actually uses an SMO-like algorithm, hence the different papers that Weka and LibSVM reference.

ADD REPLY • link updated 5.1 years ago by Ram 44k • written 9.1 years ago by Jean-Karim Heriche 27k

1

Also, given that there are four different questions, it would be better to post them separately so that answers can address each one on its own. This would make the posts clearer.

ADD REPLY • link 9.1 years ago by Jean-Karim Heriche 27k

Ram · Answer 1 · 2016-05-18

What is the difference between SMO (weka) and the LibSVM algorithms? Which is the best? Because the parameter requirements of the two are very different.

SVM is the classifier, and training it means solving an optimization problem. SMO is one of the common optimization algorithms for solving that problem, and LibSVM is a library that implements an SMO-type solver.
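
For reference, here is a sketch of the problem these solvers work on, written in standard textbook notation that is not taken from the posts above: training points $x_i$ with labels $y_i \in \{-1, +1\}$, a kernel $K$, a penalty parameter $C$, and one multiplier $\alpha_i$ per training point. The soft-margin SVM dual problem is:

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C \quad (i = 1, \dots, n), \qquad
\sum_{i=1}^{n} \alpha_i y_i = 0 .
```

SMO-type solvers repeatedly pick two multipliers and optimize over that pair analytically while the others stay fixed, which keeps the equality constraint satisfied. Parameters such as $C$ and the kernel settings belong to the problem itself, while stopping tolerances and similar settings belong to a particular solver, which is one plausible reason the parameter lists of two implementations differ.
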
Feature reduction (e.g. PCA) and feature selection (e.g. InfoGain) are two different techniques for reducing features. Which one to rely on? In which conditions are they to be used?

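Feature reduction transforms the features into new ones (PCA, for example, builds linear combinations of the originals), while feature selection keeps a subset of the original features unchanged.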
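
To make the contrast concrete, here is a minimal pure-Python sketch on made-up toy data. The helper names `select_columns` and `first_principal_component` are hypothetical, and the power iteration is only a simple stand-in for a real PCA routine:

```python
import math

# Toy data (made up): 5 samples, 3 features. The first two features are
# strongly correlated; the third barely varies.
X = [
    [2.0, 1.9, 0.3],
    [4.0, 4.1, 0.1],
    [6.0, 5.8, 0.4],
    [8.0, 8.2, 0.2],
    [10.0, 9.9, 0.3],
]

def select_columns(rows, keep):
    """Feature selection: keep a subset of the original columns unchanged."""
    return [[row[j] for j in keep] for row in rows]

def first_principal_component(rows, iterations=200):
    """Feature reduction: first principal component via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    w = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * w[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Each sample gets one new value: a weighted mix of all original features.
    scores = [sum(c[j] * w[j] for j in range(d)) for c in centered]
    return w, scores

selected = select_columns(X, keep=[0, 2])
weights, pc1 = first_principal_component(X)
print("selected columns of first sample:", selected[0])
print("PC1 weights:", [round(v, 3) for v in weights])
```

The selected columns are still the original measurements and stay interpretable, whereas the PC1 score mixes the two correlated features into one new variable that no longer corresponds to any single input.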
In InfoGain eval, the ranking algorithm ranks the features and the threshold parameter can remove the unwanted features with respect to entropy measure. Can we optimize both? Or do we optimize one of them alternatively? What should I be looking for - accuracy?

This question is not clear: what does "both" stand for? Entropy and what? Optimization methods are very flexible; you can change the objective function if you want.
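
Under one possible reading of the question, the ranking is fully determined by the data and the information-gain score, so the only thing left to tune is the cut-off. The sketch below is a plain-Python illustration of that reading, not Weka's implementation; the toy data, the feature names and the 0.05-bit threshold are assumptions made for the example:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional

# Hypothetical toy data: three discrete features and a binary class.
features = {
    "perfect": ["a", "a", "b", "b", "a", "b", "a", "b"],
    "partial": ["x", "x", "x", "y", "x", "y", "y", "y"],
    "noise":   ["p", "q", "p", "q", "p", "p", "q", "q"],
}
labels = [1, 1, 0, 0, 1, 0, 1, 0]

# Rank features by information gain, highest first.
ranking = sorted(
    ((information_gain(values, labels), name)
     for name, values in features.items()),
    reverse=True,
)

threshold = 0.05  # assumed cut-off in bits; this is the tunable part
kept = [name for gain, name in ranking if gain > threshold]

for gain, name in ranking:
    print(f"{name}: {gain:.3f} bits")
print("kept:", kept)
```

In practice the threshold (or the number of features kept) would be chosen by comparing cross-validated performance of the downstream classifier across several candidate values, with the selection step repeated inside each training fold.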
Is accuracy the only thing that I should be looking for? Of course there is overfitting, but can I quantify the predictive power of the model other than just CV accuracy? Some other measure or technique?

Accuracy is not the only concern. You should validate carefully (using techniques such as k-fold cross-validation) to test how well your model generalizes. Also note that, for the same accuracy, a model that uses fewer features is generally preferable.
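
As a rough illustration of why accuracy alone can mislead, the sketch below scores a trivial "always negative" predictor on made-up imbalanced labels and also shows a bare-bones k-fold split. The function names are hypothetical, and a real workflow would normally shuffle or stratify the folds:

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn

def summarize(y_true, y_pred):
    """Accuracy plus precision, recall, F1 and Matthews correlation."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }

def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10
always_negative = [0] * len(y_true)

report = summarize(y_true, always_negative)
folds = list(k_fold_indices(len(y_true), k=5))
print(report)
print("fold sizes:", [len(test) for _, test in folds])
```

The predictor never finds a positive case, yet its accuracy is 0.9; recall, F1 and the Matthews correlation coefficient are all 0, which exposes the problem. Reporting such measures per fold, together with their spread across folds, gives a fuller picture than a single CV accuracy figure.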