Interquartile function

The interquartile function isolates outliers: for each observation, it identifies whether the value lies within an interval built on the interquartile range.

It returns a column of binary True/False values, one for each observation, based on the interquartile range.

The coefficient (coeff) is 1.5 by default. To use a different coefficient, specify it as the second argument of the formula.

If the coefficient is:

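  • <1, it is considered very restrictive: the interval is narrower, so more values fall outside it.

  • >1, it is considered less restrictive: the interval is wider, so fewer values fall outside it.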

If the value of $"att" is in [Q1-coeff*(Q3-Q1), Q3+coeff*(Q3-Q1)] (where Q1 and Q3 are the first and the third quartiles, respectively, and coeff is a parameter set by the user), inIqr returns True; otherwise it returns False.
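The rule above can be sketched outside the tool in a few lines of Python. This is only an illustrative sketch, not the product's implementation: the function name in_iqr and the sample values are hypothetical, and the quartile convention (linear interpolation, the "inclusive" method of the standard library) is an assumption, since this page does not state how Q1 and Q3 are computed.

```python
from statistics import quantiles


def in_iqr(values, coeff=1.5):
    """Return one boolean per value: True if the value lies inside
    [Q1 - coeff*(Q3-Q1), Q3 + coeff*(Q3-Q1)], False otherwise."""
    # Assumption: quartiles are computed with linear interpolation
    # ("inclusive" method); the tool may use a different convention.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - coeff * iqr
    upper = q3 + coeff * iqr
    return [lower <= v <= upper for v in values]


# Made-up illustrative values, not taken from the Bike sales dataset.
profits = [10, 12, 13, 14, 15, 16, 18, 100]

print(in_iqr(profits))       # default coeff 1.5: only 100 is outside
print(in_iqr(profits, 0.5))  # coeff 0.5: 10 and 100 are outside
```

With these sample values, Q1 is 12.75 and Q3 is 16.5, so the default coefficient gives the interval [7.125, 22.125], while a coefficient of 0.5 shrinks it to [10.875, 18.375] and one more value falls outside.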


Function and parameters

inIqr(column, coeff)

Parameter | Description

column | It identifies the column to which you want to apply the formula. The column parameter is mandatory.

coeff | It is a factor set by the user to obtain a more restrictive or a less restrictive result. If omitted, it defaults to 1.5.


Example

The following example uses the Bike sales dataset.

  • In this example, we want to check whether each value of the Profit attribute lies within the interval built on the interquartile range. We write the following formula:

  • inIqr($"Profit")

  • The formula returns True when the value lies within the interval, and False when it falls outside it, that is, when it is an outlier.

By default, the coefficient is 1.5.

  • To apply a different coefficient, we only need to specify it in the formula.

  • In this example, we want it to be 0.5, so we write:

  • inIqr($"Profit", 0.5)

The results change according to the new coefficient: with 0.5 the interval is narrower, so more values are treated as outliers.