gini function

The gini function returns the Gini index of a column. If the group parameter is specified, the index is computed separately within each group.

The Gini index is a measure of statistical dispersion.

The Gini index ranges from 0 to +1, where:

  • 0 indicates perfect equality between values, and

  • +1 indicates maximum inequality between values.
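As an illustration of the scale described above, here is a minimal Python sketch of one common definition of the Gini index (the Gini coefficient computed from sorted, non-negative values). The helper name gini_index is hypothetical, and the formula is an assumption: this page does not state which formula the gini function uses internally, so its results may differ.

```python
from typing import Sequence


def gini_index(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    Hypothetical helper for illustration only; it is an assumption,
    not the formula the gini function is documented to use.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # No values, or all values zero: treat as perfect equality.
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


equal = gini_index([5, 5, 5, 5])      # all values identical
unequal = gini_index([0, 0, 0, 10])   # one value holds everything
print(equal, unequal)  # 0.0 0.75
```

Under this definition, a sample of n values reaches at most (n - 1) / n, so the upper bound of +1 is only approached as the number of values grows.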


Function and parameters

gini(column, group, usemissing)

Parameter

Description

column

The column to which the function is applied. This parameter is mandatory.

group

The column by which the results are grouped. If specified, a separate Gini index is computed for each group.

usemissing

A Boolean that indicates whether missing values are considered when computing the statistic. The default value is True.


Example

The following example uses the Adult dataset.


  • In this example, we want to calculate the Gini index of the capital-gain attribute.

  • The formula to write is: gini($"capital-gain")

  • For a more detailed analysis, we can add the group parameter. Here, we group the results by the occupation attribute.

  • The formula to write is: gini($"capital-gain",$"occupation")

  • The results are as follows:

    • The Gini index for Adm-clerical is 0.112;

    • The Gini index for Exec-managerial is 0.191;

    • The Gini index for Handlers-cleaners is 0.094,
      and so on for all the other values of the occupation attribute.
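
The grouped form of the formula can be pictured with the following Python sketch, which computes one index per group value. The helpers gini_index and gini_by_group are hypothetical, the toy rows are invented for illustration (they are not taken from the Adult dataset), and the Gini coefficient formula is an assumption, so the numbers will not reproduce the results listed above.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def gini_index(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values (assumed definition)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def gini_by_group(rows: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, float]:
    """Compute one Gini index per group key (hypothetical helper)."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: gini_index(vals) for key, vals in groups.items()}


# Invented (group, value) pairs standing in for (occupation, capital-gain).
rows = [("A", 0), ("A", 0), ("A", 100), ("B", 50), ("B", 50)]
result = gini_by_group(rows)
for key, value in result.items():
    print(key, round(value, 3))
```

In this toy data, group B has identical values and therefore an index of 0, while group A concentrates everything in one row and scores higher.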