Formulas and Functions in the Factory

Formulas and functions allow you to enrich your dataset by creating new attributes, and retrieve information on your attributes.

Function categories

Arrays

Arrays in the Data Manager

Rulex Platform formula
Function	Formula	Description
enum	enum(group)	Enumerates the rows in a dataset, in ascending order, optionally grouped by a selected attribute.
fillDown	fillDown(column, group, fillall)	Returns a copy of the specified column, filling all the missing values with the last valid value, optionally grouped by selected attributes. If there are rows which remain empty, as they do not have a previous valid value, a fillUp operation can then be performed, using the next valid value.
fillLinear	fillLinear(column, group)	Fills any missing values for a specified attribute with a value, based on the other values present in the attribute, using a linear interpolation method. The linear interpolation links two adjacent values with a straight line, graphically: this means that the gap is filled with values with equal distance one to the other.
fillUp	fillLinear(column, group)	Returns a copy of the selected column, filling all the missing values with the next valid value, optionally grouped by selected attributes. If there are rows which remain empty, as they do not have a next valid value, a fillDown operation can then be performed, using the previous valid value.
integer	integer(column)	Returns the values of the specified attribute, cast to the integer data type.
len	len(column)	Returns the number of values present in the attribute, including missing values.
matchHeaders	matchHeaders(column)	Returns the value in the corresponding row for the selected column.
nominal	nominal(column)	Returns the values of the specified attribute cast to the nominal data type.
perm	perm(column)	Returns the values of a selected attribute in a random order.
rank	rank(column, usemissing)	Returns the values of a column in ascending alphabetic order. By default missing values are considered in the ranking.
shift	shift(column, shift, group, cyclic)	Returns the attribute column shifted by the shift value. The shift can be performed according to the groups defined in the group parameter.

Parameters in bold are mandatory.

Logical

Logical functions in the Data Manager

Rulex Platform formula
Function	Formula	Description
ifelse	ifelse(condition, iftrue, iffalse)	Checks whether a specific condition is true, and returns the value of iftrue if the condition is true, otherwise it returns the value of iffalse if the condition is false.
ifNone	ifNone(column, condition, iftrue)	The ifNone function fills the attribute’s None values, which satisfy the specified condition, with the desired value. If the condition is not satisfied, the cell is left empty.
isDate	isDate(string, binary)	Checks whether the data type of the attribute specified in the string parameter is date.
isDatetime	isDatetime(string, binary)	Checks whether the data type of the attribute specified in the string parameter is datetime.
isFloat	isFloat(string, binary)	Checks whether the data type of the attribute specified in the string parameter is continuous.
isInteger	isInteger(string, binary)	Checks whether the data type of the attribute specified in the string parameter is integer.
isMonth	isMonth(string, binary)	Checks whether the data type of the attribute specified in the string parameter is month.
isQuarter	isQuarter(string, binary)	Checks whether the data type of the attribute specified in the string parameter is quarter.
isTime	isTime(string, binary)	Checks whether the data type of the attribute specified in the string parameter is time.
isType	isType(string, type, binary)	Checks whether the data type of the attribute specified in the string parameter corresponds to the data type specified in the type parameter.
isWeek	isWeek(string, binary)	Checks whether the data type of the attribute specified in the string parameter is week.

Parameters in bold are mandatory.

Statistical

Statistical functions in the Data Manager

Rulex Platform formula
Function	Formula	Description
anovap	anovap(column, attclass, group, usemissing)	Returns the ANOVA p value, which is the probability to obtain a worst case compared to the null hypothesis we are verifying. Values above 0.05 (i.e. the conventional value for alpha) state that we can’t reject the null hypothesis, while values below 0.05 state that we need to reject the null hypothesis and consider the alternative one.
anovat	anovat(column, attclass, group, usemissing)	Returns the ANOVA test value.
argMax	argMax(column, group)	Returns the ID of the row that contains the maximum value of a selected attribute.
argMin	argMin(column, group)	Returns the ID of the row that contains the minimum value of a selected attribute.
chisquare	chisquare(column1, column2, group, usemissing)	Returns the level of correlation between two nominal variables. The higher the value, the stronger the correlation between the selected nominal attributes.
chisquarep	chisquarep(column1, column2, group, usemissing)	Compares the null hypothesis, which assumes the variables are totally independent, to the results obtained by analysing the data (the alternative hypothesis), to evaluate the reliability of the correlation. The result is the p-value coefficient, which indicates the lowest level of significance at which the null hypothesis of the coefficient would be rejected. Values can range between 0 and 1, where low values below 0.05 (alpha) indicate that there may effectively be a correlation between the variables, whereas higher values indicate that the results are probably due simply to chance, and cannot consequently be considered reliable. This value is particularly important to consider when the dataset has a limited number of samples.
cohenk	cohenk(column1, column2, group, usemissing)	Applies the Cohen’s kappa coefficient to compare values. It is commonly used to compare real and predicted values to evaluate model performance, considering the probability of agreement by pure chance.
count	count(group)	Returns either the number of overall values present in an attribute, or the number of times each distinct value is present.
countIf	countIf(condition, group)	Returns the number of times a distinct value, which meets a specified condition, is present in an attribute.
covariance	covariance(column1, column2, group)	Measures how changes in one variable are associated with changes in a second variable.
cumMax	cumMax(column, group)	Returns the cumulative maximum of the column, which is the greatest value between the current value of the column and the previous values of the same column, evaluated within groups defined by the group parameter if required.
cumMin	cumMin(column, group)	Returns the cumulative minimum of the column, which is the lowest value between the current value of the column and the previous values of the same column, evaluated within groups defined by the group parameter if required.
distinct	distinct(column, group)	Returns the number of distinct values of the column, evaluated within groups defined in the group parameter, if required.
entropy	entropy(column, group, usemissing)	Returns the entropy of the column.
fact	fact(column)	Returns the factorial of the values of the column.
gini	gini(column, group, usemissing)	Returns the Gini index of the column, evaluated within groups defined by the group parameter, if required.
inIqr	inIqr(column, coeff)	Isolates outliers: for each data observation, it identifies whether it is in the interquartile deviation or not. It returns the column with a binary True/False value according to the interquartile range.
max	max(column, group)	Returns the maximum of the column.
max2	max2(column1, column2)	Returns the maximum value between two columns.
maxyoudencut	maxyoudencut(column, attclass, defclass, group)	Returns the value which maximizes the youden index of the ROC curve defined by column1 and by the class attclass. The computation can be performed according to the groups defined in the group parameter, if required.
mean	mean(column, group)	Returns the mean of the column.
median	median(column, group)	Returns the median of the column.
min	min(column, group)	Returns the minimum of the column.
min2	min2(column1, column2)	Returns the minimum value between two columns.
mode	mode(column, group, usemissing)	Returns the mode of the column.
movMean	movMean(column, lag, group, front)	Returns the moving average of the column, evaluated on the lag continuous rows, computed according to groups defined by the group parameter if required.
pearson	pearson(column1, column2, group)	Returns the Pearson coefficient between two columns, evaluated within groups defined by the group parameter if required.
quantile	quantile(column, quant, group, weights)	Returns the specified quantile of the column, evaluated within groups defined by the group parameter if required. A column of weights can also be defined. Quantiles are cut points dividing a range of probability distribution into intervals with equal probabilities.
roc	roc(column, attclass, defclass, group)	Returns the correlation between a continuous variable and a binary target variable. It calculates a performance indicator, the AUC, that is the area under the curve defined by the column and the attclass. The default value for the class attribute (if more than two values are present) can be specified as the optional parameter defclass. All the computation can be performed according to the groups defined in the group parameter.
std	std(column, group)	Returns the standard deviation of the column, evaluated within groups defined by the group parameter, if required. The standard deviation is the square root of the variance.
variance	variance(column, group)	Returns the variance of the column, evaluated within groups defined by the group parameter, if required. The variance is a measure of dispersion, which displays how a set of values is far from their average value.

Parameters in bold are mandatory.

Math and Trigonometry

Math and Trigonometry functions in the Data Manager

Rulex Platform formula
Function	Formula	Description
abs	abs(column)	Returns the absolute value of each value of the column.
acos	acos(column)	Returns the arccosine values of each value of the column.
acosh	acosh(column)	Returns the hyperbolic arccosine of each value of the column.
asin	asin(column)	Returns the arcsine of each value of the column.
asinh	asinh(column)	Returns the hyperbolic arcsine of each value of the column.
atan	atan(column)	Returns the arctangent of each value of the column.
atanh	atanh(column)	Returns the hyperbolic arctangent of each value of the column.
baseConv	baseConv(column, basein, baseout, compflagin, compflagout)	Converts a base 10 integer, or a string that corresponds to an integer, to a different base. Optional parameters allow the user to have a 2-complement code (if set to True) in the input and/or in the output value.
ceil	ceil(column)	Returns each value of the column rounded up to its next bigger integer unit.
cos	cos(column)	Returns the cosine of each row of the column.
cosh	cosh(column)	Returns the hyperbolic cosine of each row of the column.
cumProd	cumProd(column, group)	Returns the cumulative product of the column, evaluated within groups defined by the group parameter if required.
cumSum	cumSum(column, group)	Returns the cumulative sum of the column, evaluated within groups defined by the group parameter if required.
exp	exp(column)	Returns the exponential of each row of the column.
floor	floor(column)	Returns each value of the column rounded down to its next smaller integer unit.
isInteger	isInteger(string, binary)	Checks whether a string corresponds to an integer value.
log	log(column)	Returns the natural logarithm of each row of the column.
log10	log10(column)	Returns the logarithm (with respect to 10) of each row of the column.
prod	prod(column, group)	Returns the product of the column, evaluated within groups defined by the group parameter if required.
rand	rand(n, seed)	Returns a random column with the specified number of elements. If the number of elements is specified, a random column is created with n (n=number of examples) elements.
randGauss	randGauss(n, seed, mean, stddev)	Returns a normally distributed random column with the specified number of elements. If the number of elements is specified, a random column with n (=number of examples) elements is created.
round	round(column)	Returns the nearest integer value of each row of the column.
sign	sign(column)	Returns the sign of each row of the column.
sin	sin(column)	Returns the sine of each row of the column.
sinh	sinh(column)	Returns the hyperbolic sine of each row of the column.
sqrt	sqrt(column)	Returns the square root of each row of the column.
sum	sum(column, group)	Returns the sum of the column, evaluated within groups defined by the group parameter if required.
tan	tan(column)	Returns the tangent of each row of the column.
tanh	tanh(column)	Returns the hyperbolic tangent of each row of the column.

Parameters in bold are mandatory.

Text

Text functions in the Data Manager

If you are using any of these functions with continuous attributes, please double check the Flow Execution Parameters, as you can set the number of digits to be displayed after the function’s computation.

Rulex Platform formula
Function	Formula	Description
charReplace	charReplace(column, oldchar, newchar, unchanged, charforothers, considersequence)	Replaces the current fonts with new ones.
distance	distance(column1, column2, method)	Computes the distance between the values of two columns, column1, column2, according to one of the following methods: "levenshtein" ("I"), "damerau-levenshtein" ("dl"), "lcs", "hamming".
find	find(column, value, binary, ischarlist, charpos)	Looks for a value in a column and returns True or False.
head	head(column, nchar)	Returns in each row of the result with the first n letters of the corresponding value contained in the column.
isPrefix	isPrefix(column, value, binary)	Checks whether a string is a prefix or not.
isSuffix	isSuffix(column, value, binary)	Checks whether a string is a suffix or not.
isWord	isWord(column, substring, delimiter, binary)	Checks whether a string (which can have a delimiter) is contained in an attribute or not.
numExt	numExt(column, onlyint, separator)	Returns a string containing only the numerical fonts of the input string. If more than one number is present, numbers are delimited by a separator decided by the user (by default "-").
pad	pad(column, len, value, where)	Returns in each row of the result, the values of the column, filled (padded) with the padstring value to reach the specified length. The string can be added at the beginning (where = "begin" or by default) or at the end (where = "end") of the string, according to the value of the parameter where.
phonetic	phonetic(column, component)	Returns the phonetic encoding of the strings contained in the column using the Metaphone algorithm. Phonetic may return the primary Metaphone component (component = "Primary" or component = "P") or the secondary component (component = "Secondary" or component = "S"). By default the primary component is returned.
prefix	prefix(column, value, last)	Considers the chosen value as prefix and returns the subsequent fonts.
replace	replace(column, oldvalue, newvalue, ntimes)	Replaces the current strings of the values in the column with the new ones.
strip	strip(column, value, where, ischarlist)	Returns the value without the specified characters or list of characters located at the beginning, at the end or on both sides of the value.
suffix	suffix(column, value, last)	Considers the chosen value as suffix and returns the preceding characters.
tail	tail(column, nchar)	Returns the last n letters of the corresponding value in the column.
textConcat	textConcat(column, separator, group)	Returns the concatenation of all the strings in a column.
textExtract	textExtract(column, startpos, endpos)	Returns the string ranging from a defined starting position to defined ending position.
textFormat	textFormat(column)	Returns the type of the strings in each row of the column.
textLen	textLen(column)	Returns the length of the string contained in each row of the column.
textLower	textLower(column, mode, leaveother)	Changes uppercase fonts of a nominal attribute to lowercase fonts.
textSort	textSort(column, ascending)	Sorts in ascending order the strings contained in each cell of a nominal attribute.
textUpper	textUpper(column, mode, leaveother)	Changes lowercase fonts of a nominal attribute to uppercase fonts.

Parameters in bold are mandatory.

Date/Time

Date and Time functions in the Data Manager

Rulex Platform formula
Function	Formula	Description
addMonth	addMonth(date, nmonth)	Adds a given number of months to a date attribute.
addQuarter	addQuarter(date, nquarter)	Adds a given number of quarters to a date attribute.
addWorkingDays	addWorkingDays(date, nday)	Adds a given number of working days (excluding weekends) to a date attribute.
addYear	addYear(date, nyear)	Adds a given number of years to a date attribute.
currDate	currDate(utc)	Returns the current date according to local or UTC settings.
currDatetime	currDatetime(utc)	Returns the current datetime according to local or UTC settings.
date	date(year, month, day)	Returns a column with all values equal to the date consisting of given year, month and day. This description is valid only when the formula is filled with the year mandatory parameter and at least one optional parameter.
datetime	datetime(date, time)	Returns in each row of the result the datetime value obtained by the composition of the date value contained in the date entry and the time value contained in the time entry.
day	day(date)	Returns the day value of the date.
hour	hour(time)	Returns the hour value of time.
isDate	isDate(string, binary)	Checks whether the data type of the attribute specified in the string parameter is date.
isDatetime	isDatetime(string, binary)	Checks whether the data type of the attribute specified in the string parameter is date.
isMonth	isMonth(string, binary)	Checks whether the data type of the attribute specified in the string parameter is month.
isQuarter	isQuarter(string, binary)	Checks whether the data type of the attribute specified in the string parameter is quarter.
isTime	isTime(string, binary)	Checks whether the data type of the attribute specified in the string parameter is time.
isWeek	isWeek(string, binary)	Checks whether the data type of the attribute specified in the string parameter is week.
minute	minute(time)	Returns the minute value of time.
month	month(date, extract)	Returns the month value of date.
quarter	quarter(date, extract)	Returns the quarter value of date. If extract is True, it tries to cast variable date to quarter.
second	second(time)	Returns the seconds (values) of the time values.
time	time(hour, minute, second)	Composes a time starting from hours, minutes and seconds.
timeZone	timeZone()	Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time.
week	week(date, extract)	Returns the week integer value of an attribute containing a value which can reconduct to a date.
weekDay	weekDay(date, mondaystart)	Returns the day of the week as an integer for each value of date.
year	year(date)	Returns the year value of an attribute containing a date.

Parameters in bold are mandatory.

Graphs

Graphs functions in the Data Manager

Rulex Platform formula
Function	Formula	Description
connComp	connComp(parent, son, group)	This function operates on directed graphs. For each note, it returns the corresponding subgroup. The node is the parent or son value of the graph. In other words, the connComp function defines which subgroups of nodes are interconnected.
leaf	leaf(parent, son, group, whichpath, separator, weights, operator)	This function operates on a directed graph, the leaf of a node is meant as the very last node of a branch. The leaf function retrieves the corresponding leaf for each node of the son attribute.
leafDistance	leafDistance(parent, son, group, whichpath, separator, weights, operator)	The leafDistance function calculates the distance, in terms of number of edges, of each node of the son attribute from its leaf. A leaf is the very last node of the branch. An edge is the link from one node to another.
root	root(parent, son, group, whichpath, separator, weights, operator)	The root function returns the corresponding root of a node. The root is the very first node of the branch.
rootDistance	rootDistance(parent, son, group, whichpath, separator, weights, operator)	The rootDistance function operates on a directed graph, two major components in a directed graph are nodes and edges. A node is a data element while an edge is a link that helps to connect nodes. The rootDistance function calculates the distance of each node from the root.

Parameters in bold are mandatory.

System

System functions in the Data Manager

Rulex Platform formula
Function	Formula	Description
currDate	currDate(utc)	Returns the current date according to local or UTC settings.
currDatetime	currDatetime(utc)	Returns the current datetime according to local or UTC settings.
hostName	hostName()	Returns the hostname of the machine where Rulex Platform is running.
ipAddress	ipAddress()	Returns the IP address of the machine where Rulex Platform is running.
timeZone	timeZone()	Returns the current timezone, i.e. the difference between local time and UTC time. The resulting type is time.

Parameters in bold are mandatory.

Data

Data functions in the Data Manager

Rulex Platform formula
Function	Formula	Description
cast	cast(column, newtype, forced)	Casts a column to the specified data type. If the flag forced is set to false (true by default) only those operations which do not result in a loss of precision will be performed.
catNames	catNames(indatt, values, separator, negate)	Searches for values in specific attributes, and returns the headers of the attributes where the values were found. All the corresponding headers are concatenated.
decideType	decideType(column)	Assigns the correct data type to an attribute, depending on the values it contains.
disc	disc(column, cutoffs, rank)	Discretizes values of a selected attribute according to cutoff values.
discEqualFrequency	discEqualFrequency(column, nvalue, rank, quantile)	Discretizes values of a selected attribute into bins with the same number of values in each.
discEqualWidth	discEqualWidth(column, nvalue, rank, min, max)	Discretizes values of a selected attribute into bins of equal width.
discretize	discretize(column, nvalue, cutoffs, mode, rank, quantile, min, max)	Discretizes values of a selected attribute into bins of equal width, or with the same number of values or according to cutoff values.
isAttribute	isAttribute(name, binary)	Checks whether a specified attribute is present in the dataset. The result can either be returned as a Boolean (true/false) or binary (0/1) result.
isFloat	isFloat(string, binary)	Checks whether the data type of the attribute specified in the string parameter is continuous.
isType	isType(string, type, binary)	Checks whether the data type of the attribute specified in the string parameter corresponds to the data type specified in the type parameter.
type	type(column)	Returns the data type of the selected column as a nominal value.

Parameters in bold are mandatory.

When using functions in formulas, pay close attention to the following syntax:

Function names are case sensitive.
When specifying the attributes, the syntax is $"attributename"
When specifying a string, the syntax is 'string'
When specifying continuous values and integers, just type them.

Search