Performing Moving Windows Statistics in Data

Moving window statistics define temporal windows of data of a specific size and shape. The data within the specific window can be statistically summarized, using operations such as minimum / maximum values, standard/absolute deviation, mean, median and mode.

For each selected variable the rows of the resulting dataset will include, beside the present value, some of its past values and some of its future values (possibly resulting from aggregation), according to user selection. The suffixes -1-2, etc. are added to the attribute name to denote past intervals, whereas the suffixes +1+2, etc. are added for future values. Time relationships are established by looking at a specified temporal attribute or by using the row number (possibly in reverse order).

The graphic displayed in the Time Legend tab helps understand the temporal positioning of the various configurable options:


Prerequisites


Procedure

  1. Drag the Moving Window task onto the stage and connect it to the task that contains the attributes you want to modify.

  2. Double click the Moving Window task. On the left hand side of the pane there is a list of all the available attributes in the dataset, which can be ordered and searched as required. 

  3. Select the attributes that you want to keep in the dataset in the left hand list (tapnames); unselected tasks will be removed from the dataset after computation. Attributes can be checked/unchecked one by one or in groups by pressing the Ctrl or SHIFT key. Right-clicking this panel gives access to some additional options, such as checking/unchecking all attributes, inverting your selection, hiding ignored attributes, etc.

  4. Configure the task options as described in the table below.

  5. Save and compute the task.

Moving Windows Basic options

Name

Description

Time attribute

Select a Time attribute from the drop-down list to specify the temporal attribute to be used to set the moving window. If not specified, row numbers will be used to establish the time relationship.

Key attributes for multiple time series

Drag and drop the attributes that will control the moving window process from the attributes list on the left. 

A separate application of the task is performed for any group of different values assumed by the key attributes.

Time unit

Specify the number of time units that determine a step back in the past or forward in the future (deltatime) and the unit of measure to be considered (timeunit). Possible values are second(s), minutes(s), hour(s), day(s), week(s), month(s), quarter(s), year(s).

(Past samples) Number of lags

Specify the number of past intervals for each variable that will be added to the resulting dataset.

(Past samples) Lag length

Specify the length of each past interval (expressed in time units).

(Future samples) Number of lags

Specify the number of future intervals for each variable that will be added to the resulting dataset.

Alternatively you can click the Copy past sample values button to use the same values specified for the past samples.

(Future samples) Lag length

Specify the length of each future interval (expressed in time units).

Alternatively you can click the Copy past sample values button to use the same values specified for the past samples.

Remove incomplete starting values

Select if you don't want any initial rows containing missing data to be included in the resulting dataset.
This may happen if the number of past intervals is not null, since not all the previous values will be available.

Fill missing values

Select if you want all the missing data in the resulting dataset to be filled with the nearest non-missing value in the right direction of time; in particular, past missing data are filled by checking subsequent values, whereas future data are filled by checking previous values.

Moving Windows Advanced options

Type of aggregation for ordered attributes

Select which type of aggregation you want to use for ordered attributes from the drop-down list. Possible values are Minimum (1), Maximum (2), Sum (3), Mean (4), Median (5), Mode (6), Standard deviation (7) or Absolute deviation (8).

Type of aggregation for nominal attributes

Select which type of aggregation you want to use for ordered attributes from the drop-down list. Possible values are Minimum (1) Maximum (2) or Mode (6)

Use ratio values

If selected, past and future intervals will not be absolute, but will correspond to their ratios, according to the current value.

Use trend values

If selected, past and future intervals will not be absolute, but will correspond to their differences, according to the current value.

Reverse the direction of time

Select if you are using row numbers to establish the time relationship (as you have not specified a specific time attribute), and you want time to flow in the opposite direction with respect to row numbers.

(Present samples) Number of time units for aggregation (current point)

Specify the number of time units you want to use to modify the current point.

If you leave the value 1 no aggregation operations will be performed.

(Past samples) First gap offset

Length of the gap between the current time period and the first period in the past (expressed in time units).

(Past samples) Number of time units for aggregation

Specify the number of time units you want to use for past aggregation. 

If you leave the value 1 no aggregation operations will be performed; any values greater than 1 mean that a rolling mean (or a rolling mode in case of nominal variables) is performed for each past value to be added to the resulting dataset; the inserted integer value gives the number of past time periods to be considered for the rolling mean. 

If a Time attribute is defined, the distance is measured according to the specified time unit and the value provided for the Number of time units for aggregation, otherwise it is simply given by the number of rows.

(Future samples) First gap offset

Length of the gap between the current time period and the first period in the future (expressed in time units).

Alternatively you can click the Copy past sample values button to use the same values specified for the past samples.

(Future samples) Number of time units for aggregation

Number of time units for future aggregation.

If you leave the value 1 no aggregation operations will be performed, any values greater than 1 mean that a rolling mean (or a rolling mode in case of nominal variables) is performed for each future value to be added to the resulting dataset; the inserted integer value gives the number of future time periods to be considered for the rolling mean.

Alternatively you can click the Copy past sample values button to use the same values specified for the past samples.

If a Time attribute is defined, the distance is measured according to the specified time unit and the value provided for the Number of time units for aggregation, otherwise it is simply given by the number of rows.


Example

The following example uses the Northwind dataset.

Description

Screenshot

  • After having imported the dataset, right click the import task and select Take a look to check the data have been correctly imported.

  • The original dataset is made up of 830 records and 14 attributes concerning orders, with various data types. Three of these attributes are dates that record the time of subsequent steps in processing orders. In particular, the ShippedDate variable provides the shipping date for each order.

  • Drag the Moving window task onto the stage and link it to the source task.

  • Set the parameters as follows:

    • Attributes subject to moving windows operationsfreight

    • Past samples - Number of Lags: 2

    • Future samples - Number of Lags: 1

    • Time attribute: ShippedDate

    • Time interval for aggregation: 1

    • Time measure unit: day(s)

Save and compute the task. Only the Freight attribute is affected by the moving window process, with 2 past intervals and one future interval, by using the ShippedDate attribute to determine time periods (with a lag of one day).

  • Right-click the task and select Take a look to visualize the resulting data.

  • Three additional attributes are added to the original dataset: 

    • Freight-2 contains the values for the past intervals

    • Freight-1 contains the more recent values for the past intervals 

    • Freight+1 includes the values for the future interval.

  • By sorting on the ShippedDate attribute the effect of the moving window appears clear, since values in columns Freight-2, Freight-1, Freight, and Freight+1 are shifted to the right when moving down, row after row.

  • Several values in the resulting dataset are missing since not all the dates are included in the ShippedDate column.

  • If you would prefer not to have missing values, and fill all the cells in the Freight-2Freight-1Freight, and Freight+1 attributes, select the Fill missing values option in the Moving Windows task and recompute.

The resulting table is as displayed here: