Question

One-Class Libsvm Classifiers

1

14.5 years ago

Panos ★ 1.8k

I'm trying to classify short reads into a number of bins (usually no more than 5). After looking for a while in the LIBSVM FAQ as well as in relevant papers, I think that one-class SVM classifiers may be what I'm looking for; I just need to know whether a read belongs to a bin or not. This is what a one-class classifier will tell me, right?

The problem is that after preparing the training set (I have tried 500 and 1000 vectors) and running the tests, classification accuracy can't get above 32% (lots of false negatives).

I noticed that there's a one-class-specific parameter named "nu" (-n switch in svm-train). I wrote a Perl script and tried different values for it (from 0.001 to 1 in 0.001 steps), but I still can't get decent accuracy...

Does anyone have more experience with such classifiers and could give me some hints, please?

metagenomics short • 7.8k views

ADD COMMENT • link updated 14.5 years ago by Casbon ★ 3.3k • written 14.5 years ago by Panos ★ 1.8k

4

Are you trying to predict whether a given short read is part of any of the 5 bins, or do you already have different models based on the different bins? In that case, I think this could be a multi-class SVM problem rather than a one-class LIBSVM problem. A single-class SVM can be used only for problems with two outcomes (a or b).

ADD REPLY • link 14.5 years ago by Khader Shameer 18k

3

Panos, what prevents you from creating a 6th bin of "unclassified"? Also, is it possible for a read to belong to two bins? That can happen if you go with 5 single-class SVMs.

ADD REPLY • link 14.5 years ago by Haibao Tang 3.0k

2

Can you clarify what the "bins" are, and maybe show a preview of what the data look like (features, classes, ...)? In some problem instances, the data are not easily separable by hyperplanes.

ADD REPLY • link 14.5 years ago by Haibao Tang 3.0k

0

I'm trying to predict whether a read belongs to any of 5 predefined bins. I think that it would be better to go with 5 single-class SVMs rather than one 5-class because I have the impression that multi-class SVMs would only classify a read to one of the 5 bins; it wouldn't "consider" the possibility that a read could not be a member of any of the 5 bins (i.e. leave it unclassified). Am I right?

I think, though, that in the case of multi-class SVMs, I can calculate probabilities for predictions. Is this a way to tell whether a given read cannot be classified into any of the specified bins?

ADD REPLY • link 14.5 years ago by Panos ★ 1.8k

0

I can't create such a 6th bin because my other 5 bins would be representing the dominant bacteria in my sample. This would mean that this 6th class would have to represent EVERY other bacterium... No, no read can belong to two (or more) bins. I hadn't thought about it! Good point! Do you think that taking the probability of the prediction (-b switch in svm-predict) into account could help me decide whether the assignment of a given read to some bin is significant?

ADD REPLY • link 14.5 years ago by Panos ★ 1.8k

0

Often with multi-class SVM classifiers the class with the highest score is picked as the output class. It sounds like what you want is to only allow classifications where the highest score is also greater than some threshold. All other data would be classified as "unclassifiable". That may not be available in existing packages, but it would alleviate some of the difficulties of using 5 single-class SVMs.
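
A minimal sketch of that thresholding idea, assuming you already have one score per bin for each read (for example, the per-class probability estimates that svm-predict can output with the -b switch). The function name, the 0.8 cutoff, and the "unclassified" label are made up for illustration; the cutoff would have to be tuned on held-out reads.

```python
from typing import Dict


def classify_with_reject(scores: Dict[str, float],
                         threshold: float = 0.8,
                         reject_label: str = "unclassified") -> str:
    """Return the top-scoring bin, or reject_label if that score is too low.

    `scores` maps bin name -> score (e.g. a probability estimate).
    `threshold` is a hypothetical cutoff, not a recommended value.
    """
    if not scores:
        return reject_label
    best_bin = max(scores, key=scores.get)
    return best_bin if scores[best_bin] >= threshold else reject_label


if __name__ == "__main__":
    # One read with a clear winner, one where no bin is convincing.
    print(classify_with_reject({"bin1": 0.91, "bin2": 0.09}))
    print(classify_with_reject({"bin1": 0.40, "bin2": 0.35, "bin3": 0.25}))
```

This keeps a single multi-class model and adds the "none of the above" decision as a post-processing step, so a read can never land in two bins at once.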

ADD REPLY • link 14.2 years ago by Mrawlins ▴ 430

score 1 · Answer 1 · 2010-12-09

I've messed around with SVMs, but need more info on how you are encoding the features. You have short reads, so you could be using k-mer counts to encode your reads, or you could be using some kind of string kernel. Either way, you also need to choose a kernel that is appropriate for your features.
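
To make the k-mer option concrete, here is a rough sketch of one way to turn a read into a feature vector and write it in LIBSVM's sparse `label index:value` format. The helper names, the choice of relative frequencies instead of raw counts, and the handling of ambiguous bases are all assumptions for illustration, not a description of your pipeline.

```python
from collections import Counter
from itertools import product


def kmer_index(k, alphabet="ACGT"):
    """Map every possible k-mer to a 1-based feature index."""
    return {"".join(p): i
            for i, p in enumerate(product(alphabet, repeat=k), start=1)}


def kmer_features(read, k, index):
    """Return {feature_index: relative frequency} for the k-mers in a read.

    Windows containing characters outside the alphabet (e.g. N) are skipped.
    """
    read = read.upper()
    raw = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    counts = {index[kmer]: n for kmer, n in raw.items() if kmer in index}
    total = sum(counts.values())
    return {idx: n / total for idx, n in counts.items()} if total else {}


def to_libsvm_line(label, features):
    """Format one sparse vector as 'label idx:value ...', indices ascending."""
    parts = [f"{idx}:{val:.6g}" for idx, val in sorted(features.items())]
    return " ".join([str(label)] + parts)


if __name__ == "__main__":
    index = kmer_index(4)  # 256 features for tetranucleotides
    features = kmer_features("ACGTTGCAACGTAGCTAGCTAGGATCC", 4, index)
    print(to_libsvm_line(1, features))
```

Normalising by the number of k-mers keeps reads of different lengths on a comparable scale, which tends to matter for distance-based kernels such as RBF.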

You can use one-class classifiers to decide class membership, and assign some probability based on distance from the separating hyperplane.
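
As a rough illustration of the "probability from distance" idea, the sketch below squashes a decision value through a Platt-style sigmoid. It assumes you can get a signed decision value per read from your classifier; the parameters `a` and `b` are placeholders that would have to be fitted on labelled validation reads, and the function name is made up.

```python
import math


def decision_to_probability(decision_value, a=-1.0, b=0.0):
    """Map a signed decision value f to (0, 1) via 1 / (1 + exp(a*f + b)).

    With a < 0, larger (more "inside the class") values give higher scores.
    a and b are hypothetical defaults, not fitted parameters.
    """
    z = a * decision_value + b
    # Evaluate the sigmoid in a form that cannot overflow for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    for f in (-2.0, 0.0, 2.0):
        print(f, round(decision_to_probability(f), 3))
```

Without fitted parameters the output is only a monotone rescaling of the distance, so it is useful for ranking reads but should not be read as a calibrated probability.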

We really need to know more about the problem, though. How is class membership defined? It may be better to use edit distance to the cluster center, or something similar.
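
For the edit-distance alternative, something along these lines might be a starting point. It assumes each bin can be summarised by a single representative sequence (a hypothetical "cluster center") and that a read farther than `max_distance` from every center is left unassigned; both the names and the cutoff are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance using a two-row dynamic programming table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def assign_to_bin(read, centers, max_distance):
    """Return the name of the nearest center, or None if it is too far away."""
    best_bin, best_dist = None, None
    for name, center in centers.items():
        dist = edit_distance(read, center)
        if best_dist is None or dist < best_dist:
            best_bin, best_dist = name, dist
    if best_dist is None or best_dist > max_distance:
        return None
    return best_bin


if __name__ == "__main__":
    centers = {"bin1": "ACGTACGA", "bin2": "TTTTTTTT"}  # toy sequences
    print(assign_to_bin("ACGTACGT", centers, max_distance=2))
    print(assign_to_bin("GGGGGGGG", centers, max_distance=2))
```

The distance cutoff plays the same role as the "unclassified" option discussed in the comments: each read goes to at most one bin, or to none.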