Analyzing Model Performance in the Confusion Matrix

The Confusion Matrix computes and visualizes the performance of any classification method.

Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class.
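This row/column orientation can be sketched in plain Python. The example below is purely illustrative and independent of the task's internals; the class labels and sample values are hypothetical:

```python
# Build a confusion matrix by hand:
# rows = actual class, columns = predicted class.
classes = ["<=50K", ">50K"]          # hypothetical labels
actual = ["<=50K", "<=50K", ">50K", ">50K", ">50K"]
predicted = ["<=50K", ">50K", ">50K", "<=50K", ">50K"]

index = {c: i for i, c in enumerate(classes)}
matrix = [[0] * len(classes) for _ in classes]
for a, p in zip(actual, predicted):
    matrix[index[a]][index[p]] += 1  # row = actual, column = predicted

for label, row in zip(classes, matrix):
    print(label, row)
```

Reading the result: the diagonal cells count correct predictions, while each off-diagonal cell counts samples of the row's actual class that were predicted as the column's class.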



  1. Drag the Confusion Matrix task onto the stage.

  2. Connect a task containing the output and prediction columns to the new task.

  3. Double-click the Confusion Matrix task. The model performance is displayed, as explained in the Results section below.

  4. Change the output you want to display from the Output drop-down list.

  5. Change the column containing the forecast output from the Prevision list.

  6. Change the dataset whose results you want to display from the Display matrix for drop-down list. Possible options are Training set, Test set or All.

  7. Select Show percentage to display percentages in parentheses alongside the counts in the table.
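The percentages shown in parentheses can be reproduced with a small helper. This is a sketch only: it assumes percentages are computed over all samples in the matrix, and the task's exact rounding may differ:

```python
def with_percentages(matrix):
    """Format each cell as 'count (pct%)', with percentages
    taken over the total number of samples in the matrix."""
    total = sum(sum(row) for row in matrix) or 1  # avoid division by zero
    return [[f"{n} ({100 * n / total:.1f}%)" for n in row] for row in matrix]

# Hypothetical counts: rows = actual class, columns = predicted class.
print(with_percentages([[50, 10], [5, 35]]))
```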


The Forecast pane displays the results in two different matrices:

  • the numerical matrix reports, in a grid, the number (and, optionally, the percentage) of samples that were correctly and incorrectly predicted.

  • the graphical matrix represents the same information in an easy-to-read grid, where each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class.


The following example uses the Adult dataset.

The scenario aims to solve a simple classification problem based on income ranges.

We will use the Confusion Matrix block to evaluate how the forecast output differs from the actual one.

Note that the Confusion Matrix is independent of how the forecast has been generated.



After having:

  • imported the dataset

  • split it into training (60%), test (20%) and validation (20%) sets with the Split Data task

  • computed an LLM Classification with the Income attribute as Output

  • applied the model with the Apply Model task leaving the default options

add a Confusion Matrix task and link it to the Apply Model task.

The confusion matrix for the test set shows that the majority of errors derive from misclassification between the >50K and <=50K classes.

In other words, there are few cases of class <=50K classified as >50K, but many examples of class >50K classified as <=50K.
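This kind of asymmetry is easy to quantify by computing, for each actual class, the fraction of its samples that were predicted correctly (the per-class recall). The counts below are hypothetical, chosen only to mirror the pattern described above:

```python
def per_class_recall(matrix, classes):
    """Fraction of each actual class (row) predicted correctly (diagonal)."""
    return {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}

# Hypothetical test-set counts: rows = actual, columns = predicted.
classes = ["<=50K", ">50K"]
matrix = [[940, 60],    # few <=50K samples misclassified as >50K
          [180, 120]]   # many >50K samples misclassified as <=50K
print(per_class_recall(matrix, classes))
```

Here the <=50K class is recovered far more reliably than the >50K class, which is exactly the asymmetry the confusion matrix makes visible.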

Note that in a two-class problem the confusion matrix may appear trivial, but when more classes are present, the information it contains can help you understand the studied phenomenon and improve classification accuracy.
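With more than two classes, a useful first question is which pair of classes the model confuses most often. The sketch below, with a hypothetical three-class matrix, simply scans the off-diagonal cells for the largest error count:

```python
def most_confused_pair(matrix, classes):
    """Return the (actual, predicted) pair with the most off-diagonal errors."""
    pairs = ((a, p) for a in range(len(classes))
             for p in range(len(classes)) if a != p)
    a, p = max(pairs, key=lambda ap: matrix[ap[0]][ap[1]])
    return classes[a], classes[p]

# Hypothetical three-class matrix: rows = actual, columns = predicted.
classes = ["low", "medium", "high"]
matrix = [[80, 15,  5],
          [12, 70, 18],
          [ 3, 25, 72]]
print(most_confused_pair(matrix, classes))
```

In this example, most errors are actual "high" samples predicted as "medium", which might suggest that the boundary between those two income ranges is where the model needs improvement.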