Formulas and Functions in the Factory
Formulas and functions allow you to enrich your dataset by creating new attributes, and retrieve information on your attributes.
Function categories
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
enum(group) | Enumerates the rows in a dataset, in ascending order, optionally grouped by a selected attribute. | |
fillDown(column, group, fillall) | Returns a copy of the specified column, filling all the missing values with the last valid value, optionally grouped by selected attributes. If there are rows which remain empty, as they do not have a previous valid value, a fillUp operation can then be performed, using the next valid value. | |
fillLinear(column, group) | Fills any missing values for a specified attribute with a value, based on the other values present in the attribute, using a linear interpolation method. The linear interpolation links two adjacent values with a straight line, graphically: this means that the gap is filled with values with equal distance one to the other. | |
fillLinear(column, group) | Returns a copy of the selected column, filling all the missing values with the next valid value, optionally grouped by selected attributes. If there are rows which remain empty, as they do not have a next valid value, a fillDown operation can then be performed, using the previous valid value. | |
integer(column) | Returns the values of the specified attribute, cast to the integer data type. | |
len(column) | Returns the number of values present in the attribute, including missing values. | |
matchHeaders(column) | Returns the value in the corresponding row for the selected column. | |
nominal(column) | Returns the values of the specified attribute cast to the nominal data type. | |
perm(column) | Returns the values of a selected attribute in a random order. | |
rank(column, usemissing) | Returns the values of a column in ascending alphabetic order. By default missing values are considered in the ranking. | |
shift(column, shift, group, cyclic) | Returns the attribute column shifted by the shift value. The shift can be performed according to the groups defined in the group parameter. |
Parameters in bold are mandatory.
Logical functions in the Data Manager
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
ifelse(condition, iftrue, iffalse) | Checks whether a specific condition is true, and returns the value of iftrue if the condition is true, otherwise it returns the value of iffalse if the condition is false. | |
isDate(string, binary) | Checks whether the data type of the attribute specified in the string parameter is date. | |
isDatetime(string, binary) | Checks whether the data type of the attribute specified in the string parameter is datetime. | |
isFloat(string, binary) | Checks whether the data type of the attribute specified in the string parameter is continuous. | |
isInteger(string, binary) | Checks whether the data type of the attribute specified in the string parameter is integer. | |
isMonth(string, binary) | Checks whether the data type of the attribute specified in the string parameter is month. | |
isQuarter(string, binary) | Checks whether the data type of the attribute specified in the string parameter is quarter. | |
isTime(string, binary) | Checks whether the data type of the attribute specified in the string parameter is time. | |
isType(string, type, binary) | Checks whether the data type of the attribute specified in the string parameter corresponds to the data type specified in the type parameter. | |
isWeek(string, binary) | Checks whether the data type of the attribute specified in the string parameter is week. |
Parameters in bold are mandatory.
Statistical functions in the Data Manager
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
anovap(column, attclass, group, usemissing) | Returns the ANOVA p value, which is the probability to obtain a worst case compared to the null hypothesis we are verifying. Values above 0.05 (i.e. the conventional value for alpha) state that we can’t reject the null hypothesis, while values below 0.05 state that we need to reject the null hypothesis and consider the alternative one. | |
anovat(column, attclass, group, usemissing) | Returns the ANOVA test value. | |
argMax(column, group) | Returns the ID of the row that contains the maximum value of a selected attribute. | |
argMin(column, group) | Returns the ID of the row that contains the minimum value of a selected attribute. | |
chisquare(column1, column2, group, usemissing) | Returns the level of correlation between two nominal variables. The higher the value, the stronger the correlation between the selected nominal attributes. | |
chisquarep(column1, column2, group, usemissing) | Compares the null hypothesis, which assumes the variables are totally independent, to the results obtained by analysing the data (the alternative hypothesis), to evaluate the reliability of the correlation. The result is the p-value coefficient, which indicates the lowest level of significance at which the null hypothesis of the coefficient would be rejected. Values can range between 0 and 1, where low values below 0.05 (alpha) indicate that there may effectively be a correlation between the variables, whereas higher values indicate that the results are probably due simply to chance, and cannot consequently be considered reliable. This value is particularly important to consider when the dataset has a limited number of samples. | |
cohenk(column1, column2, group, usemissing) | Applies the Cohen’s kappa coefficient to compare values. It is commonly used to compare real and predicted values to evaluate model performance, considering the probability of agreement by pure chance. | |
count(group) | Returns either the number of overall values present in an attribute, or the number of times each distinct value is present. | |
countIf(condition, group) | Returns the number of times a distinct value, which meets a specified condition, is present in an attribute. | |
covariance(column1, column2, group) | Measures how changes in one variable are associated with changes in a second variable. | |
cumMax(column, group) | Returns the cumulative maximum of the column, which is the greatest value between the current value of the column and the previous values of the same column, evaluated within groups defined by the group parameter if required. | |
cumMin(column, group) | Returns the cumulative minimum of the column, which is the lowest value between the current value of the column and the previous values of the same column, evaluated within groups defined by the group parameter if required. | |
distinct(column, group) | Returns the number of distinct values of the column, evaluated within groups defined in the group parameter, if required. | |
entropy(column, group, usemissing) | Returns the entropy of the column. | |
fact(column) | Returns the factorial of the values of the column. | |
gini(column, group, usemissing) | Returns the Gini index of the column, evaluated within groups defined by the group parameter, if required. | |
inIqr(column, coeff) | Isolates outliers: for each data observation, it identifies whether it is in the interquartile deviation or not. It returns the column with a binary True/False value according to the interquartile range. | |
max(column, group) | Returns the maximum of the column. | |
max2(column1, column2) | Returns the maximum value between two columns. | |
maxyoudencut(column, attclass, defclass, group) | Returns the value which maximizes the youden index of the ROC curve defined by column1 and by the class attclass. The computation can be performed according to the groups defined in the group parameter, if required. | |
mean(column, group) | Returns the mean of the column. | |
median(column, group) | Returns the median of the column. | |
min(column, group) | Returns the minimum of the column. | |
min2(column1, column2) | Returns the minimum value between two columns. | |
mode(column, group, usemissing) | Returns the mode of the column. | |
movMean(column, lag, group, front) | Returns the moving average of the column, evaluated on the lag continuous rows, computed according to groups defined by the group parameter if required. | |
pearson(column1, column2, group) | Returns the Pearson coefficient between two columns, evaluated within groups defined by the group parameter if required. | |
quantile(column, quant, group, weights) | Returns the specified quantile of the column, evaluated within groups defined by the group parameter if required. A column of weights can also be defined. Quantiles are cut points dividing a range of probability distribution into intervals with equal probabilities. | |
roc(column, attclass, defclass, group) | Returns the correlation between a continuous variable and a binary target variable. It calculates a performance indicator, the AUC, that is the area under the curve defined by the column and the attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. All the computation can be performed according to the groups defined in the group parameter. | |
std(column, group) | Returns the standard deviation of the column, evaluated within groups defined by the group parameter, if required. The standard deviation is the square root of the variance. | |
variance(column, group) | Returns the variance of the column, evaluated within groups defined by the group parameter, if required. The variance is a measure of dispersion, which displays how a set of values is far from their average value. |
Parameters in bold are mandatory.
Math and Trigonometry functions in the Data Manager
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
abs(column) | Returns the absolute value of each value of the column. | |
acos(column) | Returns the arccosine values of each value of the column. | |
acosh(column) | Returns the hyperbolic arccosine of each value of the column. | |
asin(column) | Returns the arcsine of each value of the column. | |
asinh(column) | Returns the hyperbolic arcsine of each value of the column. | |
atan(column) | Returns the arctangent of each value of the column. | |
atanh(column) | Returns the hyperbolic arctangent of each value of the column. | |
baseConv(column, basein, baseout, compflagin, compflagout) | Converts a base 10 integer, or a string that corresponds to an integer, to a different base. Optional parameters allow the user to have a 2-complement code (if set to True) in the input and/or in the output value. | |
ceil(column) | Returns each value of the column rounded up to its next bigger integer unit. | |
cos(column) | Returns the cosine of each row of the column. | |
cosh(column) | Returns the hyperbolic cosine of each row of the column. | |
cumProd(column, group) | Returns the cumulative product of the column, evaluated within groups defined by the group parameter if required. | |
cumSum(column, group) | Returns the cumulative sum of the column, evaluated within groups defined by the group parameter if required. | |
exp(column) | Returns the exponential of each row of the column. | |
floor(column) | Returns each value of the column rounded down to its next smaller integer unit. | |
isInteger(string, binary) | Checks whether a string corresponds to an integer value. | |
log(column) | Returns the natural logarithm of each row of the column. | |
log10(column) | Returns the logarithm (with respect to 10) of each row of the column. | |
prod(column, group) | Returns the product of the column, evaluated within groups defined by the group parameter if required. | |
rand(n, seed) | Returns a random column with the specified number of elements. If the number of elements is specified, a random column is created with n (n=number of examples) elements. | |
randGauss(n, seed, mean, stddev) | Returns a normally distributed random column with the specified number of elements. If the number of elements is specified, a random column with n (=number of examples) elements is created. | |
round(column) | Returns the nearest integer value of each row of the column. | |
sign(column) | Returns the sign of each row of the column. | |
sin(column) | Returns the sine of each row of the column. | |
sinh(column) | Returns the hyperbolic sine of each row of the column. | |
sqrt(column) | Returns the square root of each row of the column. | |
sum(column, group) | Returns the sum of the column, evaluated within groups defined by the group parameter if required. | |
tan(column) | Returns the tangent of each row of the column. | |
tanh(column) | Returns the hyperbolic tangent of each row of the column. |
Parameters in bold are mandatory.
Text functions in the Data Manager
If you are using any of these functions with continuous attributes, please double check the Flow Execution Parameters, as you can set the number of digits to be displayed after the function’s computation.
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
charReplace(column, oldchar, newchar, unchanged, charforothers, considersequence) | Replaces the current fonts with new ones. | |
distance(column1, column2, method) | Computes the distance between the values of two columns, column1, column2, according to one of the following methods: "levenshtein" ("I"), "damerau-levenshtein" ("dl"), "lcs", "hamming". | |
find(column, value, binary, ischarlist, charpos) | Looks for a value in a column and returns True or False. | |
head(column, nchar) | Returns in each row of the result with the first n letters of the corresponding value contained in the column. | |
isPrefix(column, value, binary) | Checks whether a string is a prefix or not. | |
isSuffix(column, value, binary) | Checks whether a string is a suffix or not. | |
isWord(column, substring, delimiter, binary) | Checks whether a string (which can have a delimiter) is contained in an attribute or not. | |
numExt(column, onlyint, separator) | Returns a string containing only the numerical fonts of the input string. If more than one number is present, numbers are delimited by a separator decided by the user (by default "-"). | |
pad(column, len, value, where) | Returns in each row of the result, the values of the column, filled (padded) with the padstring value to reach the specified length. The string can be added at the beginning (where = "begin" or by default) or at the end (where = "end") of the string, according to the value of the parameter where. | |
phonetic(column, component) | Returns the phonetic encoding of the strings contained in the column using the Metaphone algorithm. Phonetic may return the primary Metaphone component (component = "Primary" or component = "P") or the secondary component (component = "Secondary" or component = "S"). By default the primary component is returned. | |
prefix(column, value, last) | Considers the chosen value as prefix and returns the subsequent fonts. | |
replace(column, oldvalue, newvalue, ntimes) | Replaces the current strings of the values in the column with the new ones. | |
strip(column, value, where, ischarlist) | Returns the value without the specified characters or list of characters located at the beginning, at the end or on both sides of the value. | |
suffix(column, value, last) | Considers the chosen value as suffix and returns the preceding characters. | |
tail(column, nchar) | Returns the last n letters of the corresponding value in the column. | |
textConcat(column, separator, group) | Returns the concatenation of all the strings in a column. | |
textExtract(column, startpos, endpos) | Returns the string ranging from a defined starting position to defined ending position. | |
textFormat(column) | Returns the type of the strings in each row of the column. | |
textLen(column) | Returns the length of the string contained in each row of the column. | |
textLower(column, mode, leaveother) | Changes uppercase fonts of a nominal attribute to lowercase fonts. | |
textSort(column, ascending) | Sorts in ascending order the strings contained in each cell of a nominal attribute. | |
textUpper(column, mode, leaveother) | Changes lowercase fonts of a nominal attribute to uppercase fonts. |
Parameters in bold are mandatory.
Date and Time functions in the Data Manager
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
addMonth(date, nmonth) | Adds a given number of months to a date attribute. | |
addQuarter(date, nquarter) | Adds a given number of quarters to a date attribute. | |
addWorkingDays(date, nday) | Adds a given number of working days (excluding weekends) to a date attribute. | |
addYear(date, nyear) | Adds a given number of years to a date attribute. | |
currDate(utc) | Returns the current date according to local or UTC settings. | |
currDatetime(utc) | Returns the current datetime according to local or UTC settings. | |
date(year, month, day) | Returns a column with all values equal to the date consisting of given year, month and day. This description is valid only when the formula is filled with the year mandatory parameter and at least one optional parameter. | |
datetime(date, time) | Returns in each row of the result the datetime value obtained by the composition of the date value contained in the date entry and the time value contained in the time entry. | |
day(date) | Returns the day value of the date. | |
hour(time) | Returns the hour value of time. | |
isDate(string, binary) | Checks whether the data type of the attribute specified in the string parameter is date. | |
isDatetime(string, binary) | Checks whether the data type of the attribute specified in the string parameter is date. | |
isMonth(string, binary) | Checks whether the data type of the attribute specified in the string parameter is month. | |
isQuarter(string, binary) | Checks whether the data type of the attribute specified in the string parameter is quarter. | |
isTime(string, binary) | Checks whether the data type of the attribute specified in the string parameter is time. | |
isWeek(string, binary) | Checks whether the data type of the attribute specified in the string parameter is week. | |
minute(time) | Returns the minute value of time. | |
month(date, extract) | Returns the month value of date. | |
quarter(date, extract) | Returns the quarter value of date. If extract is True, it tries to cast variable date to quarter. | |
second(time) | Returns the seconds (values) of the time values. | |
time(hour, minute, second) | Composes a time starting from hours, minutes and seconds. | |
timeZone() | Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time. | |
week(date, extract) | Returns the week integer value of an attribute containing a value which can reconduct to a date. | |
weekDay(date, mondaystart) | Returns the day of the week as an integer for each value of date. | |
year(date) | Returns the year value of an attribute containing a date. |
Parameters in bold are mandatory.
Graphs functions in the Data Manager
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
connComp(parent, son, group) | This function operates on directed graphs. For each note, it returns the corresponding subgroup. The node is the parent or son value of the graph. In other words, the connComp function defines which subgroups of nodes are interconnected. | |
leaf(parent, son, group, whichpath, separator, weights, operator) | This function operates on a directed graph, the leaf of a node is meant as the very last node of a branch. | |
leafDistance(parent, son, group, whichpath, separator, weights, operator) | The leafDistance function calculates the distance, in terms of number of edges, of each node of the son attribute from its leaf. A leaf is the very last node of the branch. An edge is the link from one node to another. | |
root(parent, son, group, whichpath, separator, weights, operator) | The root function returns the corresponding root of a node. The root is the very first node of the branch. | |
rootDistance(parent, son, group, whichpath, separator, weights, operator) | The rootDistance function operates on a directed graph, two major components in a directed graph are nodes and edges. |
Parameters in bold are mandatory.
System functions in the Data Manager
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
currDate(utc) | Returns the current date according to local or UTC settings. | |
currDatetime(utc) | Returns the current datetime according to local or UTC settings. | |
hostName() | Returns the hostname of the machine where Rulex Platform is running. | |
ipAddress() | Returns the IP address of the machine where Rulex Platform is running. | |
timeZone() | Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time. |
Parameters in bold are mandatory.
Data functions in the Data Manager
Rulex Platform formula | ||
---|---|---|
Function | Formula | Description |
cast(column, newtype, forced) | Casts a column to the specified data type. If the flag forced is set to false (true by default) only those operations which do not result in a loss of precision will be performed. | |
catNames(indatt, values, separator, negate) | Searches for values in specific attributes, and returns the headers of the attributes where the values were found. All the corresponding headers are concatenated. | |
decideType(column) | Assigns the correct data type to an attribute, depending on the values it contains. | |
disc(column, cutoffs, rank) | Discretizes values of a selected attribute according to cutoff values. | |
discEqualFrequency(column, nvalue, rank, quantile) | Discretizes values of a selected attribute into bins with the same number of values in each. | |
discEqualWidth(column, nvalue, rank, min, max) | Discretizes values of a selected attribute into bins of equal width. | |
discretize(column, nvalue, cutoffs, mode, rank, quantile, min, max) | Discretizes values of a selected attribute into bins of equal width, or with the same number of values or according to cutoff values. | |
isAttribute(name, binary) | Checks whether a specified attribute is present in the dataset. The result can either be returned as a Boolean (true/false) or binary (0/1) result. | |
isFloat(string, binary) | Checks whether the data type of the attribute specified in the string parameter is continuous. | |
isType(string, type, binary) | Checks whether the data type of the attribute specified in the string parameter corresponds to the data type specified in the type parameter. | |
type(column) | Returns the data type of the selected column as a nominal value. |
Parameters in bold are mandatory.
When using functions in formulas, pay close attention to the following syntax:
Function names are case sensitive.
When specifying the attributes, the syntax is
$"attributename"
When specifying a string, the syntax is
'string'
When specifying continuous values and integers, just type them.