roc function in the Factory

The roc function calculates a performance indicator, the AUC (Area Under the Curve), that is, the area under the ROC curve defined by the column and the attclass.

If the class attribute has more than two values, its default value can be specified through the optional defclass parameter. The whole computation can also be performed separately for each of the groups defined in the group parameter.
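As an illustration of why a default class matters when the class attribute has more than two values, the sketch below reduces a multi-valued class column to a two-valued one before any ROC computation. It assumes that the default class plays the role of the "positive" value and that all other values are merged together; this is an assumption made for the example, not a description of how the Factory handles defclass internally. The binarize helper and the sample values are hypothetical.

```python
def binarize(labels, default_class):
    """Map a multi-valued class column to 1 (default class) or 0 (any other value).

    Assumption: the default class is treated as the positive class.
    """
    return [1 if label == default_class else 0 for label in labels]


# Hypothetical class column with three distinct values
countries = ["Italy", "France", "Spain", "Italy", "France"]
binary = binarize(countries, default_class="Italy")
print(binary)  # [1, 0, 0, 1, 0]
```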

The AUC value ranges from 0 to +1, where:

  • 0 indicates the worst performance, and

  • +1 indicates the best performance.
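The two extremes can be reproduced with a minimal sketch of the AUC for a two-valued class. It uses the standard pairwise interpretation: the AUC is the fraction of positive/negative pairs in which the positive row has the higher column value, with ties counted as half. The auc function and the toy values are hypothetical and are not the Factory's implementation.

```python
def auc(scores, labels):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of positive/negative pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
print(auc([1, 2, 3, 4], labels))  # 1.0: every positive outranks every negative
print(auc([4, 3, 2, 1], labels))  # 0.0: every positive is outranked
print(auc([5, 5, 5, 5], labels))  # 0.5: the scores carry no information
```

Under this definition, a value near 0.5 means the column barely separates the two classes.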

The ROC curve is also available in the Sheets tab of the Data Manager.


Function and parameters

roc(column, attclass, defclass, group)

Parameter

Description

column

It identifies the column to which you want to apply the formula. The column parameter is mandatory.

attclass

It identifies the class (target) attribute against which the curve is computed. It must be a nominal attribute. The attclass parameter is mandatory.

defclass

It is the default value for the class attribute, used when the attribute has more than two values. The defclass parameter is optional.

group

It identifies the column by which the results are grouped.

weights

It defines the importance of a certain attribute.


Example

The following example uses the Bike Sales dataset.

  • In this example, we want to retrieve the AUC value for the Order_Quantity attribute, using Country as the target.

  • The formula to write is: roc($"Order_Quantity",$"Country")

  • We can double-check the result in the Statistics tab by dragging the Order_Quantity attribute onto the Var_1 area and the Country attribute onto the Y area.

  • To group the results, we can add the group parameter. If defclass is not specified, group must be passed by name, as group=$"att".

  • We want to group our results by the Customer_Gender attribute, so the formula becomes: roc($"Order_Quantity",$"Country",group=$"Customer_Gender")

  • The results are as follows:

    • The AUC value for the Male customers is 0.518.

    • The AUC value for the Female customers is 0.512.
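The effect of the group parameter can be sketched outside the Factory as follows: the rows are split by the grouping column and one AUC is computed per group. This is only an illustration under stated assumptions. The auc and auc_by_group helpers are hypothetical, the class is assumed to be two-valued, and the rows are made up for the example (they are not taken from the Bike Sales dataset, so the values do not match the results above).

```python
from collections import defaultdict


def auc(scores, labels):
    """Pairwise AUC for binary labels (1 = positive); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(scores, labels, groups):
    """Split the rows by group value and compute one AUC per group."""
    buckets = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in buckets.items()}


# Made-up toy rows: a numeric column, a binary target, a grouping column
quantity = [1, 4, 2, 3, 5, 2, 6, 1]
target = [0, 1, 0, 1, 1, 0, 0, 1]
gender = ["M", "M", "M", "M", "F", "F", "F", "F"]

print(auc_by_group(quantity, target, gender))  # {'M': 1.0, 'F': 0.25}
```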