Computing Formulas in the Data Manager

The main Data tab in the Data Manager task includes a formula bar, where you can define attributes by using columns, constants or functions in formulas.

The formula bar is divided into two distinct parts:

  • left-hand side member: corresponds to the attribute which will be defined by the formula. 

  • right-hand side member: contains the formula to define the new attribute.

Values can be entered manually, selected from the corresponding drop-down list or inserted by clicking on a cell in the spreadsheet. When you click on a cell in the spreadsheet, the corresponding attribute is added in the formula, with the format $"attribute_name". Functions and operators can also be used to define the formula.

Process variables can also be used in formulas. They are recognized by the "@" prefix. For example, @Today could be a process variable that contains the current date. For more information on process variables see Using Process Variables in the Factory.

 

When typing formulas, the complete formula with the parameters is displayed under the formula bar with a MONACO editor.


Procedure

  1. In the formula bar, in the left-hand side member, enter the attribute you will be defining in one of the following ways:

    • select an existing attribute from a drop-down menu;

    • click on a cell in the spreadsheet.

  2. Click the function button and select the function required to create the formula. For a list of functions provided by Rulex, see Formulas and Functions in the Factory.

  3. In the right-hand side member, create the formula which will define the attribute by:

    • entering attributes

    • entering constants.

  4. Press enter or click on the apply button.

Selecting multiple attributes

  • If you need to select more than one attribute at a time, you can use either press Ctrl and select all the required attributes: ($"att1", $"att5", $"att11")

  • If you need to select a range of attributes, you can either press Shift and select the first and last attribute in the range, or select them all with the cursor. The formula bar will include a colon to indicate the range: $"att1":$"att5"

Examples of multiple selection syntax:

  • $"att_sum" = sum($"att1",$"att5", $"att11")

  • $"att_sum" = sum($"att1":$"att5")

 


Bulk editing attributes' headers with the $names parameter

If you need to edit the attributes' headers, willing to apply them a certain naming rule, you can specify the $names parameter in the left-hand side member of the formula bar.

This way, you can write a function which applies to all attributes, it doesn’t matter how many they are.

For example, if you need to change a separator in the attribute’s headers, you can combine different Rulex functions together.

In this example, the ifelse, find and replace functions have been combined.

In the Turkish calendar dataset, the "_" header separator must be switched to "-" . Follow the steps listed below:

  • In the formula bar, write the $names in the left-hand side member.

  • In the right-hand side member, type the following function: ifelse(find($names,"_")==1, replace($names,"_","-"), $names).

  • Press Enter or click on the green tick at the right of the formula bar. The results are shown below:


The $null parameter when there isn’t any target column

If you need to use GOLD (Rulex proprietary language) functions in the Data Manager task, or you simply need to perform actions on the dataset, you can specify the $null parameter in the left-hand side member of the formula bar.

It is highly recommended to use this parameter within the following use cases:

  • when using GOLD functions, like warningGold(): they allow users to create and customize popup messages, displayed at the bottom of the screen when certain conditions are satisfied, without the need to browse through the Data Manager’s History.

  • when working on an unbalanced dataset: if an user is performing a data analysis on an unbalanced dataset and an oversampling of the least frequent occurrences is needed, the $null can be used along with a function (such as the multiplyRows("col_int") function). As new rows containing values have been added, a dataset-wide edit operation has been performed in the Data Manager.