The distance function computes the string distance between the values of two columns, column1 and column2, according to one of the following methods: "levenshtein" ("l"), "damerau-levenshtein" ("dl"), "lcs", "hamming".
distance(column1, column2, method)
If you are using continuous attributes, check the Flow Execution Parameters.
The first attribute used to evaluate the distance. If it is not nominal, it is cast to nominal when the function is computed. The column1 parameter is mandatory.
The second attribute used to evaluate the distance. If it is not nominal, it is cast to nominal when the function is computed. The column2 parameter is mandatory.
The algorithm ("levenshtein", "damerau-levenshtein", "lcs", "hamming") used to evaluate the string distance. Each method can also be selected through its associated string: "l" for the Levenshtein algorithm, "dl" for the Damerau-Levenshtein algorithm, "lcs" for the LCS algorithm, "hamming" for the Hamming algorithm. The method parameter is mandatory.
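To make the difference between the four methods concrete, the following Python sketch implements textbook versions of each metric. It is only an illustration and not the platform's own implementation. The function names are hypothetical, the Damerau-Levenshtein function uses the restricted "optimal string alignment" variant, and the "lcs" distance is assumed to be based on the longest common subsequence, counting the characters left unpaired.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def damerau_levenshtein_osa(a: str, b: str) -> int:
    """Levenshtein plus transposition of two adjacent characters
    (restricted variant: optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Adjacent characters swapped: count one transposition.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def lcs_distance(a: str, b: str) -> int:
    """Characters left unpaired by a longest common subsequence
    (assumed definition: insertions and deletions only)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return len(a) + len(b) - 2 * prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; strings must have equal length."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


print(levenshtein("ab", "ba"))              # 2
print(damerau_levenshtein_osa("ab", "ba"))  # 1
print(lcs_distance("ab", "ba"))             # 2
print(hamming("ab", "ba"))                  # 2
```

In this sketch the swapped pair "ab" / "ba" costs a single operation only under the Damerau-Levenshtein variant, because that is the only method that treats a transposition as one edit. The Hamming function rejects strings of different lengths; how the distance function itself handles that case is not stated here.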
The following example uses the Turkish calendar dataset.
In this example, we want to calculate the distance between the RAMADAN_FLAG attribute and the PUBLIC_HOLIDAY_FLAG attribute using the Levenshtein algorithm.
Add a new attribute, called distance, and type the following formula:
The function returns 0 when the value in both attributes is the same, so no changes have to be made to make them equal. It returns 1 when the values are not equal, indicating that one operation is needed to make the values equal.
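The same behavior can be reproduced outside the platform with a short Python sketch. The rows below are hypothetical flag values, not taken from the dataset, and the levenshtein helper is an illustrative textbook implementation rather than the function's actual code. Each value is converted to a string first, mirroring the cast to nominal described for the parameters.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def levenshtein(a: str, b: str) -> int:
    """Recursive Levenshtein distance, memoized for efficiency."""
    if not a:
        return len(b)
    if not b:
        return len(a)
    cost = 0 if a[0] == b[0] else 1
    return min(levenshtein(a[1:], b) + 1,         # deletion
               levenshtein(a, b[1:]) + 1,         # insertion
               levenshtein(a[1:], b[1:]) + cost)  # substitution


# Hypothetical (RAMADAN_FLAG, PUBLIC_HOLIDAY_FLAG) pairs.
rows = [(0, 0), (1, 0), (0, 1), (1, 1)]

# Cast each value to a string, then compare the two columns row by row.
distances = [levenshtein(str(r), str(p)) for r, p in rows]
print(distances)  # [0, 1, 1, 0]
```

Because each flag is a single character once converted to a string, the distance can only be 0 (same value) or 1 (one substitution), which matches the result described above.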