Optimizing Rulesets

The Optimize Rulesets task lets you modify, and thereby improve, a generated set of predictive rules by applying a series of constraints.


Prerequisites

  • you must have created a flow;

  • the required datasets must have been imported into the flow;

  • a task has generated a ruleset in the flow.


Procedure

  1. Drag the Optimize Ruleset task onto the stage.

  2. Connect the task that contains the ruleset you want to modify to the new task.

  3. Double-click the Optimize Ruleset task.

  4. In the Options tab, configure the options as described in the table below.

  5. Save and compute the task.

Optimize Ruleset Options

Parameter Name

Description

Maximum number of rules

Specify the overall maximum number of rules for the dataset. By default, no limitations are imposed.

Maximum number of conditions

Specify the maximum number of conditions that any rule in the final dataset can contain.

A lower threshold can improve the readability of rules. By default, no limitations are imposed.

Minimum precision value (%)

Specify the minimum percentage of precision that a rule must have.

Precision is defined as the ratio between the number of patterns of the correct class covered by the rule and the total number of covered patterns. 

A higher threshold can improve the quality of rules. By default, no limitations are imposed.
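As a rough illustration of the precision definition above, the following Python sketch computes the precision of a single rule from the class labels of the patterns it covers. The function name and the sample data are hypothetical and are not part of the product.

```python
def rule_precision(covered_labels, rule_class):
    """Return the percentage of covered patterns whose class matches the rule's class."""
    if not covered_labels:
        raise ValueError("the rule covers no patterns")
    correct = sum(1 for label in covered_labels if label == rule_class)
    return 100.0 * correct / len(covered_labels)


# Hypothetical rule predicting "yes" that covers five patterns,
# four of which actually belong to the class "yes".
covered = ["yes", "yes", "yes", "no", "yes"]
precision = rule_precision(covered, "yes")
print(precision)  # 80.0
```

In this sketch, a rule with 80% precision would not satisfy a minimum precision value of 90%.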

Maximum rule error (%)

Specify the maximum error (as a percentage) that a rule can score. The absolute or relative error is considered according to whether the Consider relative error instead of absolute option is checked or not.

The error is defined as the ratio between the number of covered patterns that belong to an incorrect class and the total number of covered patterns.

A lower threshold can improve the accuracy of the rules. By default, no limitations are imposed.

Minimum covering value (%)

Specify the minimum percentage of covering that each rule in the final dataset must have.

Covering is defined as the fraction of patterns belonging to the correct class that are covered by the rule.

Minimum condition error (%)

Sets the minimum error value for each condition in each rule in the final dataset.

Aggregate data before processing

If selected, identical patterns are aggregated and considered as a single pattern during the training phase.
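The idea of aggregating identical patterns can be sketched in Python as follows. This is only an illustration: the sample data is hypothetical, and keeping an occurrence count for each distinct pattern is an assumption, since the way the task weights aggregated patterns is not described here.

```python
from collections import Counter

# Hypothetical patterns: (color, size, output class).
patterns = [
    ("red", "small", "yes"),
    ("red", "small", "yes"),
    ("blue", "large", "no"),
    ("red", "small", "yes"),
]

# Identical patterns collapse into a single entry.
# The occurrence count is kept here as an assumption.
aggregated = Counter(patterns)

for pattern, count in aggregated.items():
    print(pattern, count)
```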

Consider relative error instead of absolute

Specify whether the relative or absolute error must be considered.

The maximum error allowed for each rule is set by considering the proportions of samples belonging to different classes. Consider a given rule pertaining to the specific output value yo, where:

  • TP is the number of true positives (samples with the output value yo that verify the conditions of the rule).

  • TN is the number of true negatives (samples with output values different from yo that do not verify the conditions of the rule).

  • FP is the number of false positives (samples with output values different from yo that do verify the conditions of the rule).

  • FN is the number of false negatives (samples with the output value yo that do not verify the conditions of the rule).

In this scenario, the absolute error of that rule is FP/(TN+FP), whereas the relative error is obtained by dividing the absolute error by the covering TP/(TP+FN), that is, the fraction of samples with the output value yo that verify the conditions of the rule.

If checked, the relative error is considered in evaluating the maximum rule error and the minimum condition error. For example, if 10% is the maximum error allowed for a rule, the error cannot be more than 10% of the covering of the rule.
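The difference between the two error measures can be checked with a small Python sketch based on the TP, TN, FP and FN counts defined above. The function name and the sample counts are hypothetical.

```python
def rule_errors(tp, tn, fp, fn):
    """Return (covering, absolute error, relative error) as fractions."""
    if tp == 0 or tn + fp == 0:
        raise ValueError("covering or the negative count is zero")
    covering = tp / (tp + fn)                   # TP/(TP+FN)
    absolute_error = fp / (tn + fp)             # FP/(TN+FP)
    relative_error = absolute_error / covering  # absolute error / covering
    return covering, absolute_error, relative_error


# Hypothetical counts for one rule.
covering, abs_err, rel_err = rule_errors(tp=20, tn=95, fp=5, fn=30)
print(f"{covering:.2%}")  # 40.00%
print(f"{abs_err:.2%}")   # 5.00%
print(f"{rel_err:.2%}")   # 12.50%
```

With a maximum rule error of 10%, this hypothetical rule stays within the limit when the absolute error is considered (5%), but exceeds it when the relative error is considered (12.5%).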

Initialize random generator with seed

If selected, a seed, which defines the starting point in the sequence, is used during random generation operations. Consequently, using the same seed each time makes each execution reproducible. Otherwise, each execution of the same task (with the same options) may produce different results, because different random numbers are generated in some phases of the process.
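The effect of a fixed seed can be illustrated with Python's standard random module. This is a generic sketch of seeded random generation, not the generator used by the task; the function name is hypothetical.

```python
import random


def shuffled_indices(n, seed=None):
    """Return the indices 0..n-1 in a random order drawn from a generator with the given seed."""
    rng = random.Random(seed)  # seed=None uses an unpredictable starting point
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


# The same seed always yields the same sequence.
first = shuffled_indices(10, seed=42)
second = shuffled_indices(10, seed=42)
print(first == second)  # True
```

Without a seed, two calls may return different orders, which mirrors the non-reproducible behavior described above.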

Append results

If selected, the results of this computation are appended to the dataset; otherwise, they replace the results of previous computations.

Results

The results of the optimization are displayed in two separate tabs:

  • the Monitoring tab displays statistics of the selected rules as histograms during the execution of the optimization operation. In particular, the distributions of the number of conditions, the covering and the error are shown. Rules relative to different classes are represented by bars of different colors.
    Monitoring plots remain available after the computation has ended. Note that, since the optimization is performed incrementally, you can stop the computation at any moment and keep the last set of rules without losing the coherence of the results. See the example below for further details.

  • the Results tab displays a summary of the performed computation, such as the execution time and the number of rules before and after optimization.