Question

Classification with SVM

0

3.0 years ago

ali ▴ 20

Guys, I'm trying to replicate a study as an exercise. In it I obtained significant genes and samples (patients). For the classification I use the count matrix, with the genes as features and the patients as samples, each to be classified as Tumor or Non-Tumor. But I'm really confused. The book says:

For the multi-class SVM classification algorithm, a One-Versus-One (OVO) approach was used. To cross validate the algorithm for all samples in the training cohort, the SVM algorithm was trained by all samples in the training cohort minus one, while the remaining sample was used for (blind) classification. This process was repeated for all samples until each sample was predicted once (leave-one-out cross-validation [LOOCV] procedure).

I don't really see how I can use OVO with LOOCV. I know how to use LOOCV in Python on top of an SVM, and I know how to use OVO in Python on top of an SVM, but I don't see how to use both. I might be ignorant, which is why I ask here: does anyone know what they mean?

SVM Python Classification LOOCV • 930 views

score 2 · Answer 1 · 2022-06-11

2

3.0 years ago

Mensur Dlakic ★ 29k

For the multi-class SVM classification algorithm, a One-Versus-One (OVO) approach was used.

SVM is a binary classifier, so it doesn't natively classify multiple classes. It has to split a multiclass problem into multiple independent binary classification tasks. There is a one-versus-rest (OVR) method which is commonly used, but I don't know what OVO is. It really shouldn't matter for solving your problem because you have a common binary classification (tumor vs. non-tumor).
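For reference, the two decompositions differ in how many binary SVMs they fit: one-versus-rest fits one per class, while one-versus-one fits one per pair of classes. A minimal sketch, assuming scikit-learn (the question mentions Python) and synthetic data rather than the poster's count matrix; the class count of 4 is only an illustrative choice:

```python
# Sketch on synthetic data (not the poster's data): count the binary SVMs
# that each multiclass decomposition fits.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

n_classes = 4  # illustrative choice, not taken from the thread
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=10,
    n_classes=n_classes, n_clusters_per_class=1, random_state=0,
)

ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)

n_ovr = len(ovr.estimators_)  # one classifier per class
n_ovo = len(ovo.estimators_)  # one classifier per pair of classes
print(n_ovr, n_ovo)
```

With k classes this gives k classifiers for OVR and k(k-1)/2 for OVO. With only two classes (tumor vs. non-tumor) both reduce to a single binary SVM.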

LOOCV is used when the number of samples (patients, in your case) is small; yours is unknown to us. There is nothing special about it. Let's say you have 25 samples, in which case this becomes the same as k-fold cross-validation where k=25 and each sample is drawn into validation exactly once. Most SVM implementations I know about do not natively support LOOCV, so you may have to set your folds manually.
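To illustrate the equivalence with k-fold, here is a small sketch assuming scikit-learn's splitters and a made-up array of 25 rows: `LeaveOneOut` produces the same held-out folds as `KFold` with `n_splits=25`.

```python
# Sketch: LOOCV is k-fold cross-validation with k equal to the number of rows.
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(50).reshape(25, 2)  # 25 hypothetical rows, 2 features

# Collect the held-out indices of every fold for both splitters.
loo_test = [test.tolist() for _, test in LeaveOneOut().split(X)]
kfold_test = [test.tolist() for _, test in KFold(n_splits=len(X)).split(X)]

same_folds = loo_test == kfold_test
print(len(loo_test), same_folds)
```

Either splitter can then be passed wherever scikit-learn accepts a `cv` argument, so the folds do not have to be built by hand in that library.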

0

Nice answer. An OVO approach is a One-vs-One approach: it trains one classifier for each pair of classes, so it looks at the differences between classes pair by pair. I actually forgot that I need this approach for the multi-class case too, for instance:

Tumor classes: GBM, BrCa, PAAD, Lung, HBC, CRC
Healthy class: HC

Given these 7 classes, I have to predict which one each sample belongs to. I have 285 samples and 2519 genes. In that case, would an approach like the one below do the job?

Training :

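import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

def loocv(train_X, train_y):
  # convert to NumPy arrays so the integer-index selection below works
  X = np.asarray(train_X)
  y = np.asarray(train_y)
  # define LOOCV: one split per sample
  loo = LeaveOneOut()
  # collect the true and predicted label of each held-out sample
  y_true, y_pred = [], []
  for train_index, test_index in loo.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    # linear SVM wrapped in an explicit one-vs-one scheme
    model = SVC(kernel='linear', random_state=0)
    ovo_classifier = OneVsOneClassifier(model)
    ovo_classifier.fit(X_train, y_train)
    yhat = ovo_classifier.predict(X_test)
    y_true.append(y_test[0])
    y_pred.append(yhat[0])
  # note: the returned classifier is the one fitted in the last fold
  return y_true, y_pred, ovo_classifier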
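The same OVO-inside-LOOCV loop can be written more compactly. This is only a sketch, assuming scikit-learn's `cross_val_predict` and a small synthetic matrix standing in for the real count matrix (rows = samples, columns = genes):

```python
# Sketch: one-vs-one linear SVM evaluated with leave-one-out cross-validation.
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

# Synthetic stand-in data; sizes and class count are illustrative only.
X, y = make_classification(
    n_samples=60, n_features=30, n_informative=10,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

clf = OneVsOneClassifier(SVC(kernel="linear"))
# Each sample is predicted by a model fitted on all the other samples.
y_pred = cross_val_predict(clf, X, y, cv=LeaveOneOut())
loocv_accuracy = accuracy_score(y, y_pred)
print(f"LOOCV accuracy: {loocv_accuracy:.3f}")
```

OVO decides how each fitted model handles several classes, while LOOCV decides which samples each model is fitted on, so the two combine without conflict.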

Validation :

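from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def validate(X, y):
  # assumption: LOOCV runs on the 70% training split and `model` is the classifier it returns
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
  y_true, y_pred, model = loocv(X_train, y_train)
  pred_y = model.predict(X_test)
  training_accuracy = accuracy_score(y_true, y_pred)
  accuracy = accuracy_score(y_test, pred_y)
  return accuracy, training_accuracy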

Result :

0.6767441860465118
0.6713567839195979

3.0 years ago by ali ▴ 20