Applying functions in the Data Manager

Through the Apply option, you can apply basic statistical functions, such as mean, entropy and variance, to the values of the attributes in the selected columns. The functions available depend on the attribute type.

The Apply area displays several icons, which are useful to understand before performing this operation:

  • The function type icon: it displays the acronym of the function currently applied to the corresponding attribute.

  • Make persistent: located on the right side of the query panel. Click it to apply all the query operations and permanently change the dataset.

  • Clear: located on the right side of the query panel. Click it to clear the query table: this removes all the query operations created since the last time Make persistent was selected, together with their effects on the dataset.


Prerequisites

  • You must have created a flow;

  • You must have linked the Data Manager to a task which contains the data to work on.


Procedure

  1. Drag the attributes onto the Apply area.

  2. Select the required function (see details on available functions in the table below the procedure).

  3. Click Apply.

  4. Click either:

    1. Make persistent, to save your results and make all the query operations permanent.
      The query operations performed are automatically cleared from the pre-filter cell, but they are still applied to the table.

    2. Clear, to remove the query operations from the query panel: this cancels all the query operations that haven’t yet been made persistent.

  5. Save and compute the task.
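The difference between Make persistent and Clear in step 4 can be pictured with a small conceptual model. This is only a sketch, not the product's API: the `QueryPanel` class and its method names are hypothetical, and the dataset is reduced to a plain Python list.

```python
import copy


class QueryPanel:
    """Toy model of pending query operations (hypothetical, not the product's API)."""

    def __init__(self, dataset):
        self._persisted = copy.deepcopy(dataset)  # state after the last Make persistent
        self._pending = []                        # operations not yet made persistent

    def apply(self, operation):
        """Queue an operation; it changes the view but not the persisted data."""
        self._pending.append(operation)

    def view(self):
        """Return the persisted data with all pending operations applied."""
        data = copy.deepcopy(self._persisted)
        for operation in self._pending:
            data = operation(data)
        return data

    def make_persistent(self):
        """Make the pending operations permanent and empty the panel."""
        self._persisted = self.view()
        self._pending.clear()

    def clear(self):
        """Discard every operation applied since the last make_persistent."""
        self._pending.clear()
```

In this model, calling `clear()` after `make_persistent()` has no effect on the data, because the operations have already become part of the dataset.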

The functions available in the Apply box are listed below. They depend on the attribute type.

 

| Function | Purpose |
|---|---|
| Mode | It displays the most frequent value. |
| Mean | It displays the arithmetic average of the values. |
| Number of distinct | It displays the number of distinct values. |
| Minimum | It displays the minimum value for the attribute (for nominal attributes, values are ordered alphabetically, A-Z). |
| Maximum | It displays the maximum value for the attribute (for nominal attributes, values are ordered alphabetically, Z-A). |
| Entropy | It measures the randomness of the data: 0 means that there is no variation in the data; 1 means that every value is different. |
| Sum | It sums the values. Not available for nominal attributes. |
| Standard deviation | It measures the variability of the values, showing how far they deviate from the mean. Not available for nominal attributes. |
| Variance | It is the square of the standard deviation. |
| Median | It displays the median (middle) value of the attribute. For example, in 1, 2, 3, 4, 5 the median value is 3. Not available for nominal attributes. |
| Count | It counts the values in the attribute. |
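As a rough illustration of what these functions compute, the following Python sketch reproduces them with the standard library. Several points are assumptions rather than statements about the product: the names `summarize` and `normalized_entropy` are hypothetical, the entropy is normalized by the logarithm of the number of values only because that matches the 0 and 1 endpoints described above, the population form of standard deviation and variance is used, and Mean and Variance are treated as numeric-only.

```python
import math
import statistics
from collections import Counter


def normalized_entropy(values):
    """Shannon entropy divided by log(n), so the result lies in [0, 1].

    Assumption: this normalization is chosen to match the documented
    endpoints (0 = no variation, 1 = every value different); the exact
    formula used by the product is not stated here.
    """
    n = len(values)
    counts = Counter(values)
    if n <= 1 or len(counts) == 1:
        return 0.0
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


def summarize(values, nominal=False):
    """Compute the listed functions for one column of values."""
    result = {
        "mode": statistics.mode(values),
        "number_of_distinct": len(set(values)),
        "minimum": min(values),  # strings compare in alphabetical (code point) order
        "maximum": max(values),
        "entropy": normalized_entropy(values),
        "count": len(values),
    }
    if not nominal:
        # Assumption: population (not sample) standard deviation and variance.
        result.update({
            "mean": statistics.fmean(values),
            "sum": sum(values),
            "standard_deviation": statistics.pstdev(values),
            "variance": statistics.pvariance(values),
            "median": statistics.median(values),
        })
    return result
```

For instance, `summarize([1, 2, 3, 4, 5])["median"]` returns 3, in line with the median example given for that function.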


Example

The following example uses the Supermarket_sales dataset.

  1. Drag the attribute onto the Apply area.

  2. Select the required function, then click Apply. In this example, the mean function is applied to the gross income attribute of the table.

  3. Click Make persistent, then save and compute the task.
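The effect of the example above, applying the mean function to the gross income attribute, can be sketched outside the tool as follows. The rows and the `invoice` column are hypothetical placeholders and not actual values from the Supermarket_sales dataset. The helper `apply_function` is likewise an illustrative name.

```python
import statistics

# Hypothetical rows standing in for the Supermarket_sales dataset.
rows = [
    {"invoice": "A-001", "gross income": 10.0},
    {"invoice": "A-002", "gross income": 20.0},
    {"invoice": "A-003", "gross income": 30.0},
]


def apply_function(rows, attribute, func):
    """Apply func to all the values of one attribute (column)."""
    return func([row[attribute] for row in rows])


mean_gross_income = apply_function(rows, "gross income", statistics.fmean)
print(mean_gross_income)  # prints 20.0 for the placeholder rows above
```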