movMean function

The moving mean (also known as moving average) analyzes data points by creating a series of averages of different subsets of the chosen column.

Returns the moving average of the column, evaluated on the lag continuous rows, computed according to groups defined by the group parameter if required.

Each row represents the first value of the lags in which the column will be divided.


Function and parameters

movMean(column, lag, group, front)

Parameter

Description

column

It identifies the column to which you want to apply the formula. The column parameter is mandatory.

lag

It is the number of continuous rows used to evaluate the moving average. The lag parameter is mandatory.

group

It allows you to group the results by a certain column. This is an optional parameter.

front

it is an optional parameter. If not specified,

  • True is the default value, so the evaluation of the movMean goes from the current row onwards.

  • False is specified, it means that the lag is negative, so the evaluation of the movMean goes from the current row backwards.


Example

The following example uses the HR-employee-attrition dataset.

Description

Screenshot

  • In the example here, we decided to calculate the moving mean of the Years at Company attribute, with a lag of 20 rows.

  • We write the following formula:

  • movMean($"YearsAtCompany",20)

  • If we want to go on with our analysis we can add a group parameter, by which our results will be grouped.

  • We decided to group the values by the Job Role attribute, so the formula becomes:

  • movMean($"YearsAtCompany",20,$"JobRole")

  • The results are as follows:

    • For the Sales Executive job role, the moving mean of the Years at Company attribute is 9.450.

    • For the Research Scientist job role, the moving mean of the Years at Company attribute is 4.750.