entropy function in the Factory

The entropy function indicates a variable’s unpredictability. The higher the value, the higher the unpredictability.

It returns the entropy of the specified column. If the group parameter is specified, the entropy is computed separately within each group.

The entropy value ranges from 0 to 1, where:

  • 0 indicates low unpredictability, and

  • 1 indicates high unpredictability.
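The exact formula is not stated here. The following is a minimal Python sketch that assumes the value is Shannon entropy divided by the logarithm of the number of distinct values, which keeps the result between 0 and 1. For a column with exactly two distinct values, this is the same as base-2 entropy. The function name normalized_entropy is hypothetical and is not part of the Factory.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy of `values`, divided by log(k), where k is the number
    of distinct values, so that the result lies in [0, 1].

    Assumption: this normalization is a guess at how the Factory scales
    the result. It is not taken from the Factory documentation.
    """
    counts = Counter(values)
    n = len(values)
    k = len(counts)
    if k < 2:
        # An empty column, or one holding a single repeated value,
        # has no unpredictability.
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


print(normalized_entropy(["a", "a", "a", "a"]))  # 0.0: fully predictable
print(normalized_entropy(["a", "b", "a", "b"]))  # 1.0: equally frequent values
```

Values between these two extremes, such as a column where one value is much more frequent than the other, produce a result strictly between 0 and 1.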

The entropy function is also available through:


Function and parameters

entropy(column, group, usemissing)

The function takes the following parameters:

  • column: the column to which you want to apply the formula. This parameter is mandatory.

  • group: optional. A column whose values define the groups within which the entropy is computed, so that a separate entropy value is obtained for each group.

  • usemissing: optional. A Boolean indicating whether missing values are taken into account when computing the statistic. If not specified, it defaults to True.
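The description of usemissing only says whether missing values are considered. The following Python sketch shows one possible interpretation. It assumes that missing values are represented as None, that usemissing=True counts them as one more category, and that usemissing=False drops them before counting. It also assumes that entropy is normalized by the logarithm of the number of distinct values. Both assumptions are illustrative, and entropy_with_missing is a hypothetical name.

```python
import math
from collections import Counter


def entropy_with_missing(values, usemissing=True):
    """Normalized entropy of a column, with a hypothetical treatment of
    missing values (represented here as None).

    Assumption: usemissing=True counts None as an extra category;
    usemissing=False removes None before counting. The Factory may differ.
    """
    if not usemissing:
        values = [v for v in values if v is not None]
    counts = Counter(values)
    n = len(values)
    k = len(counts)
    if k < 2:
        # Zero or one distinct value: no unpredictability.
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


column = ["standard", "standard", None, None]
print(entropy_with_missing(column, usemissing=True))   # 1.0: None is a second category
print(entropy_with_missing(column, usemissing=False))  # 0.0: only "standard" remains
```

The same column can therefore look fully unpredictable or fully predictable depending on how missing values are handled.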


Example

The following example uses the Students Performance dataset.


  • In this example, we want to compute the entropy of the values of the lunch attribute.

  • We write the following formula:

      entropy($"lunch")

  • The entropy of the lunch attribute is 0.938.

  • To refine the analysis, we can add the group parameter.

  • For example, to compute the entropy of the lunch attribute separately for each value of the gender attribute, we write the following formula:

      entropy($"lunch",$"gender")

  • The results are:

    • For the female value, the entropy of the lunch attribute is 0.947.

    • For the male value, the entropy of the lunch attribute is 0.929.
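The grouped computation can be sketched in Python as follows. It assumes entropy is Shannon entropy normalized by the logarithm of the number of distinct values, which is an assumption and not a documented formula. It uses a few hypothetical rows instead of the Students Performance data, so its numbers differ from those above. The helper names normalized_entropy and grouped_entropy are hypothetical. Returning a dictionary keyed by group value is also a simplification of how the Factory presents per-group results.

```python
import math
from collections import Counter, defaultdict


def normalized_entropy(values):
    """Shannon entropy divided by log(number of distinct values)."""
    counts = Counter(values)
    n, k = len(values), len(counts)
    if k < 2:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def grouped_entropy(rows, column, group):
    """Entropy of `column` computed separately for each value of `group`.

    Hypothetical helper: returns {group value: entropy}. The Factory
    formula may instead attach each group's value to every row.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row[column])
    return {g: normalized_entropy(vals) for g, vals in buckets.items()}


# Toy rows for illustration, not the Students Performance data.
rows = [
    {"gender": "female", "lunch": "standard"},
    {"gender": "female", "lunch": "free/reduced"},
    {"gender": "male", "lunch": "standard"},
    {"gender": "male", "lunch": "standard"},
]
# Ungrouped, similar in spirit to entropy($"lunch") on these rows.
print(normalized_entropy([r["lunch"] for r in rows]))
# Grouped, similar in spirit to entropy($"lunch",$"gender").
print(grouped_entropy(rows, "lunch", "gender"))  # {'female': 1.0, 'male': 0.0}
```

Grouping only partitions the rows before the entropy of each partition is computed. This is why the per-group values for female and male can differ from each other and from the overall value.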