uSVM-TK

uSVM-TK 2.0 is the second release of the uSVM-TK package [1], accompanying the paper submission "Large-Scale Kernel Methods in Question Answering Tasks". It extends the Cutting Plane Algorithm (CPA) with sampling, described by Yu and Joachims in [2], to train non-linear SVMs with structural kernels for binary classification. It also offers an effective and sound method for tuning Precision and Recall on imbalanced datasets via an alternative sampling strategy.

This software reuses parts of the code from svm_sampled_cuts (developed by Chun-Nam Yu) and the implementation of Tree Kernels from SVM-light-TK [4] by Alessandro Moschitti. The former comes with the SFMT Mersenne Twister library for random number generation. uSVM-TK is implemented in C and, unlike the previous release, uses the LIBQP library as its QP solver [3], which makes installation much easier. It has been tested under 64-bit Mac OS X and under 32-bit and 64-bit Linux.

Install

Modify the compiler (CC) and compiler flags (CFLAGS) in the Makefile to reflect your platform-specific configuration (set to gcc by default). Compile the code by typing 'make' in the command line, which completes the installation.
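
For reference, the two variables to adjust sit at the top of the Makefile; a minimal sketch with illustrative values (the shipped defaults may differ):

CC = gcc
CFLAGS = -O3 -Wall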

Options

The usage is very similar to that of SVM-light-TK. The important options are described below:

-q: controls the sampling size used to approximate the cutting plane (i.e., the approximate subgradient) computed at each iteration. If you want to train your model very fast or perform fast model and kernel parameter selection, you can set q as small as 100 (see [1] for more details). Set a larger q, up to 1000, when you want to obtain a more accurate model. (In my experiments on datasets with more than 1 million examples, q = 4000 delivers the same accuracy as SVM-light-TK while training is 10 times as fast.) See the example command after this list.

-c: the margin trade-off parameter (default: 1.0).

-n: the maximum number of iterations. I typically set n = 1000. However, if you use a small sampling size, you might need to increase this number, as it may take longer for the algorithm to converge to the optimal solution.

-j: controls the bias to reject negative examples during sampling when computing the most violated constraint (more on this in the upcoming paper!). This allows one to tune Precision/Recall on highly imbalanced datasets. Setting "-j" equal to the ratio of negative to positive examples in the training set makes the ratio between positive and negative examples inside each sample roughly 1:1. You may want to experiment with this value to get the desired balance between Precision and Recall; see the example command below. Note that the "-j" parameter has a slightly different meaning in SVM-light: here it biases the sampling of negative examples, while in SVM-light it defines the factor by which training errors on positive examples outweigh errors on negative ones. The default value is 1.0, i.e. examples from different classes have equal weights.
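
As an illustration, suppose a hypothetical training set has four negative examples per positive one. A run that combines a small sampling size with a larger iteration budget and a sampling bias that balances each sample might look like this (file names are made up):

./svm_uniform_sampling -t 5 -c 1.0 -q 100 -n 2000 -j 4.0 train.dat model

Here -j 4.0 matches the assumed 4:1 negative-to-positive ratio, so each sample contains positives and negatives at roughly 1:1, while the larger -n compensates for the small -q.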

Train

As an example, the command to train an SVM with the tree kernel on a QC dataset residing in the same directory would look like this:

./svm_uniform_sampling -t 5 -c 1.0 -q 250 -n 1000 QC/dat/train.dat model

Here -c 1.0 sets the margin trade-off, -q 250 the sampling size, -n 1000 the maximum number of iterations, and -t 5 selects the tree kernel.
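
The training file follows the SVM-light-TK input format, where each example carries a label and one or more trees enclosed in |BT| ... |ET| markers. A minimal illustrative line (the tree is invented; real QC examples are richer) could be:

+1 |BT| (S (NP (N John))(VP (V runs))) |ET|

See the SVM-light-TK documentation for the full format, including multiple trees and feature vectors per example.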

For more details on experimenting with different tree kernels and their combinations, please refer to the description page of SVM-light-TK.

NOTE: the CPA algorithm begins to demonstrate its superior performance on datasets of a hundred thousand examples or more, at which point the quadratic scaling behavior of conventional decomposition methods, such as the one used in SVM-light, becomes noticeable. If you use kernels that are very expensive to compute, you might notice the speedups already at tens of thousands of examples; in my experiments with tree kernels this normally happens once the dataset exceeds 30-50k.

Classify

The algorithm produces a model file in the same format as SVM-light-TK, so you can use it with the svm_classify tool of SVM-light-TK (also included in the package).

For example, to classify a test set with the learned model:

./svm_classify QC/dat/DESC_test.dat model
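
In standard SVM-light, svm_classify also accepts a third argument naming the file to which predictions are written. Assuming this build preserves that interface (not verified here), writing predictions to a hypothetical file would look like:

./svm_classify QC/dat/DESC_test.dat model predictions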

Download

uSVMTK 2.0

Author: Aliaksei Severyn
Email: severyn -AT- disi -DOT- unitn -DOT- it

If you have any feedback, feel free to contact me. I will answer as soon as I get time. Please let me know if you plan to publish a paper using this software.

References

[1] Aliaksei Severyn, Alessandro Moschitti: Large-Scale Support Vector Learning with Structural Kernels. ECML/PKDD (3) 2010: 229-244

[2] Chun-Nam John Yu, Thorsten Joachims: Training structural SVMs with kernels using sampled cuts. KDD 2008: 794-802

[3] Rong-En Fan, Pai-Hsuen Chen, Chih-Jen Lin: Working Set Selection Using Second Order Information for Training Support Vector Machines. JMLR 6, 2005.

[4] Alessandro Moschitti: Making Tree Kernels Practical for Natural Language Learning. EACL 2006.

License

The software is licensed under the GPL version 3 (cf. LICENSE).