median function

The median is the middle value of an attribute once its values have been sorted in ascending order.

For example, the series 5 - 10 - 77 - 320 - 1 becomes 1 - 5 - 10 - 77 - 320 when sorted, so the median is 10.

If the series contains an even number of values, the median is the mean of the two middle values. For example, in the series 1 - 2 - 3 - 4, the median is 2.5, which is the mean of the middle values 2 and 3.
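As a quick check of the two cases, here is a minimal sketch in Python. It uses the standard library's statistics.median rather than the median function documented on this page, and the variable names are only illustrative.

```python
from statistics import median

# Odd number of values: the middle value of the sorted series
odd_series = [5, 10, 77, 320, 1]      # sorted: 1, 5, 10, 77, 320
median_odd = median(odd_series)

# Even number of values: the mean of the two middle values (2 and 3)
even_series = [1, 2, 3, 4]
median_even = median(even_series)

print(median_odd)   # 10
print(median_even)  # 2.5
```

Note that the input does not need to be sorted beforehand: statistics.median sorts a copy of the data internally.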

The median is different from the mean. The mean is the arithmetic average of all the values of an attribute, so it is pulled towards extreme values, whereas the median depends only on the middle of the sorted values.
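The difference is easiest to see on a series with one extreme value. The sketch below uses Python's standard statistics module and a made-up series chosen only for illustration.

```python
from statistics import mean, median

# Made-up series with a single extreme value (100)
values = [1, 2, 3, 4, 100]

mean_value = mean(values)      # (1 + 2 + 3 + 4 + 100) / 5
median_value = median(values)  # middle value of 1, 2, 3, 4, 100

print(mean_value)    # 22
print(median_value)  # 3
```

The single large value raises the mean to 22, while the median stays at 3.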


Function and parameters

median(column, group)

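Parameter: Description

column: It identifies the column to which you want to apply the formula. The column parameter is mandatory.

group: It allows you to group the results by the values of a given column, so that a separate median is returned for each group. The group parameter is optional.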
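To illustrate how the two parameters interact, the following Python sketch mimics the behaviour described above. It is not the implementation of the median function documented here: the helper name grouped_median, the row layout (a list of dictionaries) and the sample values are all assumptions made for the example, and the values are not taken from any real dataset.

```python
from collections import defaultdict
from statistics import median


def grouped_median(rows, column, group=None):
    """Hypothetical helper: median of `column`, optionally per value of `group`."""
    if group is None:
        # No grouping: a single median over the whole column
        return median(row[column] for row in rows)

    # Collect the column values separately for each group value
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[group]].append(row[column])

    # One median per group
    return {key: median(vals) for key, vals in buckets.items()}


# Made-up rows, for illustration only
rows = [
    {"math score": 60, "gender": "female"},
    {"math score": 70, "gender": "female"},
    {"math score": 50, "gender": "male"},
    {"math score": 80, "gender": "male"},
    {"math score": 90, "gender": "male"},
]

overall = grouped_median(rows, "math score")
by_gender = grouped_median(rows, "math score", "gender")

print(overall)    # 70
print(by_gender)  # {'female': 65.0, 'male': 80}
```

With only the column argument the helper returns one value; with a group argument it returns one median per distinct value of the grouping column.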


Example

The following example uses the Students Performance dataset.

  • In this example, we want to retrieve the median value of the math score attribute, so we type the following formula:

  • median($"math score")

The median of the math score attribute is 66.

  • Then, we want to group our results by the gender attribute, so the formula will be:

  • median($"math score",$"gender")

  • The results are as follows:

    • The median value of the math score for female students is 65.

    • The median value of the math score for male students is 69.