Using SVM to solve Classification Problems
The SVM task trains a Support Vector Machine for classification. The SVM model uses a kernel function (a generalization of the scalar product) to find the optimal separating surfaces in the data.
The output of the task is a model containing a weight matrix wji, which the Apply Model task can use to perform the SVM forecast on a set of examples.
Prerequisites
you must have created a flow;
the required datasets must have been imported into the flow;
the data used for the analysis must have been well prepared;
a unified model must have been created by merging all the datasets into the flow.
Additional tabs
The Monitor tab, where you can view how some quantities related to the SVM optimization evolve during execution. In particular, the tolerance and its minimum are plotted as a function of the number of iterations. These plots can be viewed both during and after the computation.
The Results tab, where statistics on the SVM computation are displayed, such as the execution time and the number of attributes.
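To illustrate what a "tolerance and its minimum" plot conveys, here is a minimal sketch in Python. It assumes that "minimum" means the smallest tolerance reached up to each iteration; the source does not state this explicitly. The function name `running_minimum` and the sample values are hypothetical, not output of the SVM task.

```python
from itertools import accumulate


def running_minimum(values):
    """Return, for each iteration, the smallest value seen so far."""
    return list(accumulate(values, min))


# Hypothetical tolerance readings, one per iteration (not real task output).
tolerance = [0.9, 0.5, 0.7, 0.2, 0.3]
best_so_far = running_minimum(tolerance)

for iteration, (tol, best) in enumerate(zip(tolerance, best_so_far), start=1):
    print(iteration, tol, best)
```

Under this assumption, the raw tolerance curve can oscillate while the minimum curve never increases, which makes convergence easier to read.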
Procedure
Drag and drop the SVM task onto the stage.
Connect a task, which contains the attributes from which you want to create the model, to the new task.
Double-click the SVM task.
Configure the options described in the table below.
Save and compute the task.
SVM basic options | |
Parameter Name | Description |
---|---|
Input attributes | Drag and drop the input attributes you want to use to build the model. |
Output attributes | Drag and drop the attributes you want the model to predict (the classification target). |
SVM formulation | Select the formulation for the SVM problem. Possible choices are:
|
Degree in kernel function | Specify the value of the parameter d in the kernel function. Note that this parameter is only required for Polynomial kernel functions. |
Kernel function | Indicate the kernel function to be used. Possible choices are:
|
Gamma in kernel function | Specify the value of the parameter γ in the kernel function. Note that this parameter is only required for Polynomial, Radial basis function and Sigmoid kernel functions. |
Normalization for input variables | The type of normalization to use when treating ordered (discrete or continuous) variables. Every attribute can have its own value for this option, which can be set in the Data Manager task. These per-attribute choices are preserved if Attribute is selected in the Normalization for input variables option; otherwise, the selection made here overwrites them. |
Coef0 in kernel function | Specify the value of the parameter c0 in the kernel function. Note that this parameter is only required for Polynomial and Sigmoid kernel functions. |
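
The way the parameters d, γ and c0 enter the three kernels named in the table can be sketched as follows. This is an illustration only: it assumes the task follows the formulas commonly used by SVM libraries such as LIBSVM, which the source does not confirm, and all function names are hypothetical.

```python
import math


def dot(u, v):
    """Plain scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma, coef0, degree):
    # Assumed form: (gamma * <u, v> + c0) ** d
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma):
    # Assumed form: exp(-gamma * ||u - v||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma, coef0):
    # Assumed form: tanh(gamma * <u, v> + c0)
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(polynomial_kernel(u, v, gamma=0.5, coef0=1.0, degree=2))
print(rbf_kernel(u, v, gamma=0.5))
print(sigmoid_kernel(u, v, gamma=0.5, coef0=1.0))
```

Under these assumed forms, the degree appears only in the polynomial kernel, c0 in the polynomial and sigmoid kernels, and γ in all three, which matches the "only required for" notes in the table.
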
SVM advanced options | |
Parameter Name | Description |
---|---|
Parameter C of C-SVC | Specify the value of the parameter C in the C-SVC formulation. |
Tolerance threshold | Specify the tolerance of the termination criterion. |
Parameter nu of nu-SVC | Specify the value of the parameter ν in the nu-SVC formulation. |
Cache memory size | Specify the amount of cache that can be used during training. |
Use shrinking heuristics | If selected, heuristic methods will be used to speed up computation. |
Append results | If selected, the results of this computation are appended to the dataset; otherwise, they replace the results of previous computations. |
Aggregate data before processing | If selected, identical patterns are aggregated and considered as a single pattern during the training phase. |
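
The idea behind the Aggregate data before processing option can be sketched in a few lines of Python. This is only an illustration of collapsing identical patterns into one entry with a count; how the task actually stores or weights aggregated patterns is not described in the source, and the name `aggregate_patterns` is hypothetical.

```python
from collections import Counter


def aggregate_patterns(rows):
    """Collapse identical rows into (row, count) pairs, in first-seen order."""
    counts = Counter(tuple(row) for row in rows)
    return list(counts.items())


# Hypothetical patterns: (age band, education, income class).
rows = [
    ("30-40", "Bachelors", "<=50K"),
    ("40-50", "Masters", ">50K"),
    ("30-40", "Bachelors", "<=50K"),
]

for pattern, count in aggregate_patterns(rows):
    print(pattern, count)
```

With many repeated patterns, the training phase then handles fewer distinct examples.
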
Example
The following example uses the Adult dataset.
Description | Screenshot |
---|---|
Import the file with an Import from Text File task and split the dataset into training (70%) and test (30%) sets with the Split Data task, then drag an SVM task onto the stage. Open it and drag the Income attribute onto the Output area, then drag the following attributes onto the Input area:
Configure these options as follows:
Then, open the Advanced tab and configure the following option:
Leave the remaining default settings, then save and compute the task. |
|
The execution of the SVM task can be followed in the Monitor tab. The plots show the tolerance (and its minimum) as a function of the iteration number. | |
The forecast ability of the trained SVM model can be assessed by connecting an Apply Model task to the SVM task and computing it with default options. | |
The forecast produced by the Apply Model task can be analyzed by right-clicking the task and selecting Take a look. The following columns, containing the results of the SVM computation, have been added to the data table:
|
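
The 70%/30% split used at the start of this example is performed by the Split Data task. For readers who want to see the idea outside the tool, here is a minimal Python sketch of a random split. It does not reproduce the Split Data task; the function name `split_rows`, the seed and the sample rows are hypothetical.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical row identifiers standing in for dataset records.
train, test = split_rows(range(100))
print(len(train), len(test))
```

Every row ends up in exactly one of the two sets, so the test set contains only examples the model did not see during training.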