
During my research on pattern recognition, I came across the task of choosing the right sigma and C values for the RBF SVMs I was going to use. That led me to this topic: to train an RBF-kernel SVM we need two parameters, sigma and C, along with the training data. Here are the problems and solutions that I found.

While an SVM is a binary classifier, it can also be used for more than two classes, for example by using a maximum-voting scheme over all possible binary combinations of the classes, or by training a ‘one versus all’ SVM for each class and determining the probabilistic superiority of one class over the others (a probability measurement is needed in this case). The question then arises of how to choose the sigma and C values for RBF SVMs. First, on what basis do we choose them? Second, when using SVMs for multiple classes, do we choose the same sigma and C values for all the binary classifiers involved in the multi-class classification? Third, what is the relation between the accuracy rate and the sigma and C values?
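The maximum-voting scheme mentioned above can be sketched roughly as follows. This is only an illustration: `pairwise_decide` is a hypothetical callable standing in for a trained binary SVM per class pair, and the toy `nearest_centre` rule is not an SVM at all, just a placeholder so the voting logic can be run.

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise_decide):
    """Predict a label for x by majority vote over all class pairs.

    pairwise_decide(a, b, x) is a hypothetical callable that returns
    either a or b; in practice it would be a trained binary SVM.
    """
    votes = Counter(pairwise_decide(a, b, x) for a, b in combinations(classes, 2))
    # most_common(1) gives the label that collected the most votes.
    return votes.most_common(1)[0][0]

# Toy stand-in for the binary classifiers: pick the class whose centre is closer.
centres = {"a": 0.0, "b": 5.0, "c": 10.0}

def nearest_centre(a, b, x):
    return a if abs(x - centres[a]) <= abs(x - centres[b]) else b

label = one_vs_one_predict(6.0, sorted(centres), nearest_centre)
```

With k classes this needs k(k-1)/2 binary classifiers, which is why the question of sharing one (sigma, C) pair across all of them matters.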

Choosing good sigma and C values is essential for good accuracy. A common practice is to do a grid search over different value pairs of sigma and C. For example, we can do a two-layer grid search. In the first layer we do a coarse grid search where the C values are in the set {10^-3, 10^-2, …, 10^3} and sigma^2 is in the set {10^-3, 10^-2, …, 10^3}. This gives a grid with 49 elements, and we pick an optimal value pair (SIGMAo, Co) by testing and comparing the accuracy rates of the corresponding experimental SVMs. Then we do a fine grid search over the C values {.2Co, .4Co, …, Co, 2Co, 4Co, …, 8Co} and the sigma values {.2*SIGMAo, .4*SIGMAo, …, 1*SIGMAo, 2*SIGMAo, 4*SIGMAo, …, 8*SIGMAo} to find the optimal value pair. Here, by the optimal pair of sigma and C we mean the pair which has a comparatively smaller error rate, a higher sigma value and a lower C value.
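A minimal sketch of the two grids and the selection rule, in plain Python. The function names are my own, and the fine-grid multipliers are an assumption: the sets above elide some values, so the 0.6, 0.8 and 6 steps below are a guess at the intended sequence. The error rates fed to `pick_best` would come from cross-validated SVM runs, which are not shown here.

```python
import math

def coarse_grid():
    """All (C, sigma^2) pairs with both values in {10^-3, ..., 10^3}."""
    values = [10.0 ** e for e in range(-3, 4)]
    return [(c, sigma_sq) for c in values for sigma_sq in values]

# Assumed multipliers: the intermediate steps are a guess, not from the text.
FINE_MULTIPLIERS = (0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0)

def fine_grid(c_best, sigma_best, multipliers=FINE_MULTIPLIERS):
    """All (C, sigma) pairs around the best coarse pair."""
    return [(mc * c_best, ms * sigma_best)
            for mc in multipliers for ms in multipliers]

def pick_best(results):
    """results: list of (error_rate, C, sigma) tuples.

    Prefer the lowest error rate; break ties with the larger sigma,
    then the smaller C.
    """
    return min(results, key=lambda r: (r[0], -r[2], r[1]))

coarse = coarse_grid()
# Hypothetical outcome of the coarse search: C = 10, sigma^2 = 100.
fine = fine_grid(c_best=10.0, sigma_best=math.sqrt(100.0))
```

Note that the coarse grid is stated in terms of sigma^2 while the fine grid is stated in terms of sigma, so the square root is taken when moving from one to the other.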

Also, we have to keep in mind that a small error rate on the training data is useless unless we use cross-validation (dividing the training data into n subdivisions and testing each subdivision with the classifier trained on the other n-1 subdivisions).
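The cross-validation split described above might look like this in plain Python. Both function names are my own, and `train_and_count_errors` is a hypothetical callback that would train an SVM on the training indices and return the number of misclassified test samples.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_val_error(n_samples, n_folds, train_and_count_errors):
    """Average error rate over all folds.

    train_and_count_errors(train_idx, test_idx) is a hypothetical callback
    returning how many test samples were misclassified.
    """
    errors = sum(train_and_count_errors(train_idx, test_idx)
                 for train_idx, test_idx in k_fold_indices(n_samples, n_folds))
    return errors / n_samples

folds = list(k_fold_indices(10, 3))
```

In practice the samples are usually shuffled before splitting, so that each subdivision contains a mix of classes.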

Some notable properties of C and sigma are:

  1. With a low value of C (<1), the error rate increases with higher values of sigma.
  2. Large values of C counterbalance the bias introduced by a large sigma.
  3. A very small value of sigma may give a good error rate on the training data, but it will not give a good recognition error rate on test data.
  4. With a large value of sigma, the Gaussian kernel of the RBF SVM becomes almost linear.
  5. A stable region of sigma and C values (where the error rate hardly fluctuates) should be searched for, from a graph if possible, for better recognition accuracy on test data.
  6. The number of SVs is not a reliable way of determining the ‘goodness’ of the classifier.
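Properties 3 and 4 can be seen directly from the kernel values. The sketch below assumes the common form K(x, y) = exp(-||x - y||^2 / (2 sigma^2)); some authors define sigma slightly differently, so treat the exact scaling as an assumption.

```python
import math

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, y = (0.0, 0.0), (1.0, 1.0)  # squared distance is 2
for sigma in (0.1, 1.0, 10.0, 100.0):
    exact = rbf_kernel(x, y, sigma)
    # First-order Taylor approximation, accurate only when sigma is large.
    linearised = 1.0 - 2.0 / (2.0 * sigma ** 2)
    print(sigma, exact, linearised)
```

For very small sigma the kernel value between two distinct points is practically zero, so each training point only resembles itself, which fits the training data well but generalizes poorly. For large sigma the kernel is close to 1 - ||x - y||^2 / (2 sigma^2), and the classifier behaves almost like a linear one.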

Now to the question of whether sigma and C should be the same for all the binary classifiers that take part in the multi-class classification. In my study I found two ways of dealing with this problem. According to Hsu and Lin, we should use the same sigma and C values for all the binary classifiers. The other way (suggested by Duan and Keerthi) is to choose, for each binary classifier individually, the sigma and C values that minimize the generalization error of that binary classification. Both ways are accepted and are supposed to give similar results.

Just a little note here on the SVM training part: the data should be scaled before training and testing. This can sometimes make a big difference, and I recently came across such a situation.
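One common way to scale is a linear min-max mapping of each feature. The sketch below assumes a target range of [-1, 1] and computes the ranges from the training data only, then reuses them for the test data; the function names and the sample numbers are made up for illustration.

```python
def fit_min_max(train_rows):
    """Return per-feature (min, max) computed from the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]

def apply_min_max(rows, ranges, lower=-1.0, upper=1.0):
    """Map each feature linearly so the training range becomes [lower, upper]."""
    scaled = []
    for row in rows:
        new_row = []
        for value, (lo, hi) in zip(row, ranges):
            if hi == lo:
                # Constant feature: place it at the middle of the target range.
                new_row.append((lower + upper) / 2.0)
            else:
                new_row.append(lower + (upper - lower) * (value - lo) / (hi - lo))
        scaled.append(new_row)
    return scaled

train = [[1.0, 200.0], [3.0, 400.0], [5.0, 1000.0]]
test = [[2.0, 600.0]]
ranges = fit_min_max(train)           # computed on training data only
train_scaled = apply_min_max(train, ranges)
test_scaled = apply_min_max(test, ranges)  # same ranges reused for test data
```

Without scaling, the second feature here would dominate the squared distance inside the RBF kernel simply because its numbers are larger. Test values can fall slightly outside [-1, 1] when they lie outside the training range, which is expected.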


Nayef Reza