GenSVM

class gensvm.core.GenSVM(p=1.0, lmd=1e-05, kappa=0.0, epsilon=1e-06, weights='unit', kernel='linear', gamma='auto', coef=1.0, degree=2.0, kernel_eigen_cutoff=1e-08, verbose=0, random_state=None, max_iter=100000000.0)

Generalized Multiclass Support Vector Machine Classification.

This class implements the basic GenSVM classifier. GenSVM is a generalized multiclass SVM which is flexible in the weighting of misclassification errors. It is this flexibility that makes it perform well on diverse datasets.

The fit() and predict() methods of this class use the GenSVM C library for the actual computations.

Parameters:

p (float, optional (default=1.0)) – Parameter for the L_p norm of the loss function (1.0 <= p <= 2.0)
lmd (float, optional (default=1e-5)) – Parameter for the regularization term of the loss function (lmd > 0)
kappa (float, optional (default=0.0)) – Parameter for the hinge function in the loss function (kappa > -1.0)
weights (string, optional (default='unit')) –
Type of sample weights to use. Options are ‘unit’ for unit weights and ‘group’ for group size correction weights (equation 4 in the paper).

It is also possible to provide an explicit vector of sample weights through the fit() method. If so, it will override the setting provided here.
kernel (string, optional (default='linear')) – Specify the kernel type to use in the classifier. It must be one of ‘linear’, ‘poly’, ‘rbf’, or ‘sigmoid’.
gamma (float or ‘auto’, optional (default='auto')) – Kernel parameter for the rbf, poly, and sigmoid kernels. If gamma is ‘auto’ then 1/n_features will be used. See Kernels in GenSVM for the exact implementation of the kernels.
coef (float, optional (default=1.0)) – Kernel parameter for the poly and sigmoid kernels. See Kernels in GenSVM for the exact implementation of the kernels.
degree (float, optional (default=2.0)) – Kernel parameter for the poly kernel. See Kernels in GenSVM for the exact implementation of the kernels.
kernel_eigen_cutoff (float, optional (default=1e-8)) – Cutoff point for the reduced eigendecomposition used with nonlinear GenSVM. Eigenvectors for which the ratio between their corresponding eigenvalue and the largest eigenvalue is smaller than the cutoff will be dropped.
verbose (int, (default=0)) – Enable verbose output
random_state (None, int, instance of RandomState) – The seed for the random number generation used for initialization where necessary. See the documentation of sklearn.utils.check_random_state for more info.
max_iter (int, (default=1e8)) – The maximum number of iterations to be run.
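
The kernel_eigen_cutoff rule described above can be illustrated with a minimal sketch in plain Python. The helper name `keep_eigen_indices` and the sample eigenvalues are hypothetical and not part of gensvm. The library performs this step internally, and whether its comparison is strict or inclusive at the boundary is an assumption here.

```python
def keep_eigen_indices(eigenvalues, cutoff=1e-8):
    """Return indices of eigenvalues whose ratio to the largest is >= cutoff.

    Hypothetical helper for illustration only; not part of the gensvm API.
    """
    largest = max(eigenvalues)
    if largest <= 0:
        raise ValueError("expected at least one positive eigenvalue")
    # An eigenvector is dropped when eigenvalue / largest < cutoff.
    return [i for i, value in enumerate(eigenvalues) if value / largest >= cutoff]


# Made-up eigenvalues: the last two are negligible relative to the largest.
eigenvalues = [4.0, 1.0, 2e-9, 0.0]
kept = keep_eigen_indices(eigenvalues, cutoff=1e-8)
print(kept)
```

A larger cutoff keeps fewer eigenvectors, which trades accuracy of the kernel approximation for a smaller problem.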

coef_: array, shape = [n_features, n_classes-1] – Weights assigned to the features (coefficients in the primal problem)

intercept_: array, shape = [n_classes-1] – Constants in the decision function

combined_coef_: array, shape = [n_features+1, n_classes-1] – Combined weights matrix for the seed_V parameter to the fit method

n_iter_: int – The number of iterations that were run during training.

n_support_: int – The number of support vectors that were found

SVs_: array, shape = [n_observations, ] – Index vector that marks the support vectors (1 = SV, 0 = no SV)
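
The shapes listed above suggest that combined_coef_ stacks intercept_ and coef_ into a single (n_features+1, n_classes-1) matrix. The sketch below shows that layout with plain lists. It assumes the intercept occupies the first row, which this page does not state, and `combine` is a hypothetical helper rather than a gensvm function.

```python
def combine(intercept, coef):
    """Stack an intercept row on top of a coefficient matrix.

    intercept: list of length n_classes - 1
    coef: list of n_features rows, each of length n_classes - 1
    Assumption: the intercept forms the first row of the combined matrix.
    """
    width = len(intercept)
    if any(len(row) != width for row in coef):
        raise ValueError("coef rows must match the intercept length")
    return [list(intercept)] + [list(row) for row in coef]


# Toy values for 2 features and 3 classes (so 2 columns).
intercept = [0.5, -0.5]
coef = [[1.0, 0.0], [0.0, 2.0]]
combined = combine(intercept, coef)

n_rows, n_cols = len(combined), len(combined[0])
print((n_rows, n_cols))  # (n_features + 1, n_classes - 1)
print(n_cols + 1)        # number of classes implied by the column count
```

The same column-count rule applies to a seed_V matrix passed to fit(): its number of columns plus one is taken as the number of classes.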

See also

GenSVMGridSearchCV: Helper class to run an efficient grid search for GenSVM.

fit(X, y, sample_weight=None, seed_V=None)

Fit the GenSVM model on the given data

The model can be fit with or without a seed matrix (seed_V). This can be used to provide warm starts for the algorithm.

Parameters:
X (array, shape = (n_observations, n_features)) – The input data. Only numeric data is expected.
y (array, shape = (n_observations, )) – The label vector. Labels can be numbers or strings.
sample_weight (array, shape = (n_observations, )) – Array of weights assigned to individual samples. If not provided, the weight specification in the constructor is used (‘unit’ or ‘group’).
seed_V (array, shape = (n_features+1, n_classes-1), optional) –
Seed coefficient array to use as a warm start for the optimization. It can for instance be the `combined_coef_` attribute of a different GenSVM model. This is only supported for the linear kernel.

NOTE: the size of the seed_V matrix is n_features+1 by n_classes-1. The number of columns of seed_V determines the number of classes in the model. For example, if y contains 3 different classes and seed_V has 3 columns, we assume that there are actually 4 classes in the problem but one class is not represented in this training data. This can be useful for problems where a certain class has only a few samples.
Returns:	self – Returns self.
Return type:	object
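
An explicit sample_weight vector can be built by hand before calling fit(). The sketch below computes group size correction weights under the assumption that equation 4 of the paper is w_i = n / (n_k * K), where n is the number of samples, K the number of classes, and n_k the size of the class that sample i belongs to. Check the paper before relying on this formula. `group_weights` is a hypothetical helper, not a gensvm function.

```python
from collections import Counter


def group_weights(labels):
    """Return one weight per sample, assuming w_i = n / (n_k * K).

    Hypothetical helper for illustration only; not part of the gensvm API.
    """
    n = len(labels)
    counts = Counter(labels)  # n_k for every class
    k = len(counts)           # K, the number of distinct classes
    return [n / (counts[label] * k) for label in labels]


# Imbalanced toy labels: three samples of class "a", one of class "b".
labels = ["a", "a", "a", "b"]
weights = group_weights(labels)
print(weights)
```

Under this formula the weights sum to n, and samples from smaller classes receive larger weights. The resulting list could be passed as fit(X, y, sample_weight=weights), which overrides the weights setting of the constructor.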

predict(X, trainX=None)

Predict the class labels on the given data

Parameters:
X (array, shape = [n_test_samples, n_features]) – Data for which to predict the labels.
trainX (array, shape = [n_train_samples, n_features]) – Only for nonlinear prediction with kernels: the training data used to train the model.
Returns:	y_pred – Predicted class labels of the data in X.
Return type:	array, shape = (n_test_samples, )
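
A minimal usage sketch of fit() and predict() with a warm start follows. It uses only the constructor arguments, methods, and attributes documented on this page. The toy data is made up, and passing plain Python lists instead of arrays is an assumption. The import is guarded so the script still runs where the gensvm package is not installed.

```python
try:
    from gensvm.core import GenSVM  # third-party package; may not be installed
except ImportError:
    GenSVM = None

# Made-up toy data: two numeric features, three string labels.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.0, 1.0], [0.1, 1.1]]
y = ["a", "a", "b", "b", "c", "c"]

if GenSVM is not None:
    # Fit a linear model and predict on the training data.
    clf = GenSVM(kernel="linear", random_state=0)
    clf.fit(X, y)
    y_pred = clf.predict(X)
    print(y_pred)

    # Warm start: seed a second linear model with the first model's weights.
    warm = GenSVM(kernel="linear", p=1.5)
    warm.fit(X, y, seed_V=clf.combined_coef_)
    print(warm.n_iter_)
else:
    print("gensvm is not installed; skipping the fit/predict demo")
```

For a nonlinear kernel, predict() additionally takes the training data through the trainX argument, as described above.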