Using Regression SVM to solve Regression Problems

The Regression SVM task trains a Support Vector Machine (SVM) for regression. The SVM model uses a kernel function (a generalization of the scalar product) to find the surface that best fits the data.
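As an illustration of what "a generalization of the scalar product" means, the four kernels listed under the Kernel function option can be sketched in plain Python. This is a minimal sketch, not the code used by the task: the function names and the default values for gamma, coef0 and degree are assumptions chosen for the example.

```python
import math

def dot(x, y):
    """Scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear(x, y):
    # Linear kernel: the plain scalar product.
    return dot(x, y)

def polynomial(x, y, gamma=1.0, coef0=0.0, degree=3):
    # Polynomial kernel: (gamma * x.y + coef0) ** degree.
    # Default values are illustrative assumptions.
    return (gamma * dot(x, y) + coef0) ** degree

def rbf(x, y, gamma=1.0):
    # Radial basis function kernel: exp(-gamma * ||x - y||^2).
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid(x, y, gamma=1.0, coef0=0.0):
    # Sigmoid kernel: tanh(gamma * x.y + coef0).
    return math.tanh(gamma * dot(x, y) + coef0)

x_i, x_j = [1.0, 2.0], [3.0, 4.0]
print(linear(x_i, x_j))                                           # 11.0
print(polynomial(x_i, x_j, gamma=0.5, coef0=1.0, degree=2))       # 42.25
print(rbf(x_i, x_i, gamma=0.7))                                   # 1.0 (identical points)
```

Note that gamma is used by every kernel except the linear one, coef0 only by the polynomial and sigmoid kernels, and degree only by the polynomial kernel, which matches the notes in the options table.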

The output of the task is a model, containing a weight matrix w_ji, that can be used by the Apply Model task to perform the SVM forecast on a set of examples.

Prerequisites

you must have created a flow;
the required datasets must have been imported into the flow;
the data used for the analysis must have been well prepared;
a unified model must have been created by merging all the datasets in the flow.

Additional tabs

The Monitor tab, where you can view how some quantities related to the SVM optimization evolve during execution. In particular, the tolerance and its minimum are plotted as a function of the number of iterations. These plots can be viewed both during and after the computation.
The Results tab, where statistics on the SVM computation are displayed, such as the execution time and the number of attributes.

Procedure

Drag and drop the Regression SVM task onto the stage.
Connect the task containing the attributes from which you want to create the model to the new task.
Double-click the Regression SVM task.
Configure the options described in the table below.
Save and compute the task.

Parameter Name	Description
Regression SVM options
SVM formulation	Select the formulation for the SVM problem. Possible choices are: EPSILON_SVR: this formulation uses the epsilon parameter, which controls the error tolerated by the model; NU_SVC: this formulation uses the nu parameter, which controls the fraction of examples used as support vectors.
Gamma in kernel function	Specify the value of the parameter γ in the kernel function. Note that this parameter is only required for the Polynomial, Radial basis function and Sigmoid kernel functions.
Kernel function	Indicate the kernel function to be used. Possible choices are: Linear: K(x_i, x_j) = x_i · x_j; Polynomial: K(x_i, x_j) = (γ x_i · x_j + c₀)^d; Radial basis function: K(x_i, x_j) = exp(−γ ‖x_i − x_j‖²); Sigmoid: K(x_i, x_j) = tanh(γ x_i · x_j + c₀).
Coef0 in kernel function	Specify the value of the parameter c₀ in the kernel function. Note that this parameter is only required for the Polynomial and Sigmoid kernel functions.
Degree in kernel function	Specify the value of the parameter d in the kernel function. Note that this parameter is only required for the Polynomial kernel function.
Parameter C	Specify the value of the parameter C in the SVM formulation, which sets the penalty applied to training errors.
Normalization for input variables	The type of normalization to use when treating ordered (discrete or continuous) variables. Possible methods are: None: no normalization is performed (default); Normal: data are normalized according to the Gaussian distribution, x' = (x − μ)/σ, where μ is the average of x and σ is its standard deviation; Minmax [0,1]: data are rescaled to the range [0,1], x' = (x − min)/(max − min); Minmax [-1, 1]: data are rescaled to the range [-1, 1], x' = 2(x − min)/(max − min) − 1. Every attribute can have its own value for this option, which can be set in the Data Manager task. These choices are preserved if Attribute is selected in the Normalization for input variables option; otherwise any selection made here overwrites previous selections.
Parameter epsilon of epsilon-SVR	The ε parameter controls how much error is tolerated on the training set: deviations smaller than ε are not penalized, so the higher the value, the larger the errors allowed.
Use shrinking heuristics	If selected, heuristic methods will be used to speed up computation.
Parameter nu of nu-SVC	Specify the value of the parameter ν in the nu-SVM formulation. The ν parameter, which must be between 0 and 1, is related to the fraction of examples used as support vectors.
Aggregate data before processing	If selected, identical patterns are aggregated and considered as a single pattern during the training phase.
Tolerance threshold	Specify the tolerance of the terminating criterion.
Append results	If selected, the results of this computation are appended to the dataset; otherwise, they replace the results of previous computations.
Cache memory size	Specify the amount of cache that can be used during training.
Input attributes	Drag and drop the input attributes you want to use to build the regression model.
Output attributes	Drag and drop the attributes whose values you want the model to predict.
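
The methods available under Normalization for input variables correspond to standard rescalings. The sketch below shows them in plain Python, assuming the usual z-score and min-max formulas; the function names are hypothetical, and the use of the population standard deviation is an assumption, since the documentation does not say which estimator the task applies.

```python
import statistics

def normalize_normal(values):
    """Normal: subtract the mean and divide by the standard deviation.

    Assumption: population standard deviation (pstdev) is used here.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("constant attribute: standard deviation is zero")
    return [(v - mu) / sigma for v in values]

def normalize_minmax_01(values):
    """Minmax [0,1]: rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant attribute: max equals min")
    return [(v - lo) / (hi - lo) for v in values]

def normalize_minmax_11(values):
    """Minmax [-1,1]: rescale so the minimum maps to -1 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant attribute: max equals min")
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

column = [2.0, 4.0, 6.0]
print(normalize_minmax_01(column))  # [0.0, 0.5, 1.0]
print(normalize_minmax_11(column))  # [-1.0, 0.0, 1.0]
```

Rescaling matters for SVM models because kernels such as the radial basis function depend on distances between examples, so an attribute with a much larger range than the others would otherwise dominate the result.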