fillLinear function in the Factory

The fillLinear function fills the missing values of a specified attribute by linear interpolation, calculating each new value from the other values present in the same attribute.

Linear interpolation joins two adjacent known values with a straight line, so the gap between them is filled with equally spaced values.
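To make the idea concrete, here is a minimal Python sketch of linear interpolation over a list of values. It is not the Factory's implementation: the name fill_linear is hypothetical, and the choice to leave missing values at the very start or end of the list untouched is an assumption, since the behaviour of fillLinear in that case is not described here.

```python
from typing import List, Optional


def fill_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill runs of None that lie between two known values.

    Assumption: leading and trailing None entries are left as None,
    because there is no second known value to draw a line to.
    """
    result = list(values)
    # Positions of the values that are actually present.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known positions and fill the gap between them.
    for left, right in zip(known, known[1:]):
        gap = right - left
        step = (values[right] - values[left]) / gap  # equal distance between points
        for offset in range(1, gap):
            result[left + offset] = values[left] + step * offset
    return result


# Illustrative data only: two missing values between 10 and 16.
print(fill_linear([10.0, None, None, 16.0]))  # [10.0, 12.0, 14.0, 16.0]
```

The two filled values, 12 and 14, sit at equal distances between the known values 10 and 16, which is the straight-line behaviour described above.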


Parameters

fillLinear(column, group)

Parameter

Description

column

The attribute whose missing values will be filled with linearly interpolated values. The column parameter is mandatory.

group

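The attribute by which the rows are grouped before filling, so that each missing value is calculated from the values in the same group. The group parameter is optional.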


Example - fillLinear(column)

The following examples use the Bike sales dataset, from which we have removed some values from the Unit_Cost attribute.

In the first example, the Unit_Cost attribute in our dataset contains a series of missing values that we want to fill.

We insert a formula that fills each missing value in the Unit_Cost attribute with an appropriate value, obtained by linear interpolation of the other values present in the Unit_Cost attribute.

The formula to enter in this case is: fillLinear($"Unit_Cost").

The results are displayed as continuous values.

The first missing value has been filled with 40, and the second with 38.


Example - fillLinear(column, group)

In the second example, we again fill the missing values of the Unit_Cost attribute, but this time we specify the Product attribute as the group parameter, so the new values are also calculated on the basis of the corresponding product. The formula to enter in this case is: fillLinear($"Unit_Cost", $"Product").

The two missing values are now filled with 42 and 38. This result is more precise than the one obtained without the group parameter, because the original costs were exactly 42 and 38.
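The effect of the group parameter can be sketched in Python as follows. This is an illustration under stated assumptions, not the Factory's implementation: the function names are hypothetical, the data is invented and is not taken from the Bike sales dataset, and we assume that grouping means each missing value is interpolated only from rows with the same group value.

```python
from collections import defaultdict
from typing import Hashable, List, Optional


def fill_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Linear interpolation of None runs between two known values (hypothetical helper)."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        step = (values[right] - values[left]) / gap
        for offset in range(1, gap):
            result[left + offset] = values[left] + step * offset
    return result


def fill_linear_grouped(
    values: List[Optional[float]], groups: List[Hashable]
) -> List[Optional[float]]:
    """Interpolate each group separately, keeping the original row order."""
    # Collect the row positions that belong to each group.
    positions = defaultdict(list)
    for i, g in enumerate(groups):
        positions[g].append(i)
    result = list(values)
    for idxs in positions.values():
        # Interpolate using only the rows of this group, then write them back.
        filled = fill_linear([values[i] for i in idxs])
        for i, v in zip(idxs, filled):
            result[i] = v
    return result


# Invented data: two products with very different unit costs, rows interleaved.
product = ["A", "B", "A", "B", "A", "B"]
unit_cost = [40.0, 14.0, None, None, 44.0, 18.0]

print(fill_linear(unit_cost))                   # [40.0, 14.0, 24.0, 34.0, 44.0, 18.0]
print(fill_linear_grouped(unit_cost, product))  # [40.0, 14.0, 42.0, 16.0, 44.0, 18.0]
```

Without grouping, the gap is bridged between 14 and 44, which are costs of two different products, so the filled values fit neither product. With grouping, each missing value is interpolated from its own product's costs, which is why the grouped result is closer to what the original values are likely to have been.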