Splitting Data into Training and Test Sets

Before building a predictive model, you should split the dataset into subsets.

Splitting the dataset allows us to build predictive models on one subset and then immediately test the validity of these models on different data that was not used to build them.

The following datasets can be created:

  • the training set, used to identify patterns in the data and build the model,

  • the test set, used to assess the accuracy of the model, and

  • the optional validation set, which can be used for tuning the model parameters.
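As a rough illustration of the three subsets listed above, the following Python sketch partitions a list of rows at random. It is not how Rulex performs the split: the function name `split_dataset`, the parameters `train_frac`, `test_frac` and `seed`, and the 70/20/10 proportions are all assumptions chosen for the example.

```python
import random


def split_dataset(rows, train_frac=0.7, test_frac=0.2, seed=0):
    """Randomly partition rows into training, test and validation subsets.

    Hypothetical helper for illustration only. Whatever is left after the
    training and test fractions are taken becomes the validation set.
    """
    if train_frac <= 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    shuffled = list(rows)                  # copy, so the input is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split

    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


rows = list(range(100))
train, test, validation = split_dataset(rows)
print(len(train), len(test), len(validation))
```

Every row ends up in exactly one subset, so the model is never assessed on data it was built from. Setting `train_frac + test_frac` to 1 leaves the optional validation set empty.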


Splitting methods

There are two distinct tasks for splitting datasets in Rulex:

Task name      Description                                        Corresponding page
Split Data     Splits the dataset randomly or sequentially.       Splitting Data with the Split Data Task
Data Manager   Splits datasets according to specified criteria.   Splitting Data with the Data Manager
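
The difference between sequential, random and criteria-based splitting can be sketched in plain Python. This is only a conceptual illustration, not the Rulex implementation or its interface: the function names, the `train_frac` and `seed` parameters, and the sample `records` with their `year` field are all hypothetical.

```python
import random


def sequential_split(rows, train_frac):
    """Keep the original order: the first rows train, the remaining rows test."""
    cut = round(len(rows) * train_frac)
    return rows[:cut], rows[cut:]


def random_split(rows, train_frac, seed=0):
    """Shuffle a copy of the rows first, then cut at the same position."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return sequential_split(shuffled, train_frac)


def criteria_split(rows, is_training):
    """Assign each row by a user-supplied condition instead of by position."""
    train = [row for row in rows if is_training(row)]
    test = [row for row in rows if not is_training(row)]
    return train, test


# Hypothetical sample data: 12 records spread over the years 2020 to 2022.
records = [{"id": i, "year": 2020 + i // 4} for i in range(12)]

seq_train, seq_test = sequential_split(records, 0.75)
rnd_train, rnd_test = random_split(records, 0.75)
crit_train, crit_test = criteria_split(records, lambda row: row["year"] < 2022)

print([row["id"] for row in seq_train])
print(sorted(row["id"] for row in rnd_train))
print([row["id"] for row in crit_train])
```

A sequential split suits ordered data such as time series, where the test rows should come after the training rows. A random split avoids bias from the row order. A criteria-based split gives explicit control over which rows land in which subset.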