charReplace function in the Factory

The charReplace function replaces characters in the values of a nominal attribute with new ones.


Parameters

charReplace(column, oldchar, newchar, unchanged, charforothers, considersequence)

Parameter

Description

column

The nominal attribute whose values contain the characters to be replaced. The column parameter is mandatory.

oldchar

The string or list of strings containing all the characters to be replaced with the newchar value. It can even be a word, but each of its letters is treated as a separate character. The oldchar parameter is mandatory.

The oldchar parameter is case sensitive.

newchar

The string or list of strings that replaces all the characters in oldchar. The newchar parameter is mandatory.

The newchar parameter is case sensitive.

unchanged

The string containing all the characters that must not be replaced. By default, every character not included in oldchar is included in this list.

charforothers

The string that replaces all the other characters, that is, those included in neither oldchar nor unchanged. The default value is "X".

considersequence

The default value is False. If it is True, each run of contiguous characters present in oldchar is replaced by a single occurrence of newchar.
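The parameter semantics described above can be summarized in a small Python sketch. It is not the Factory implementation: the function name char_replace is hypothetical, only the single-string form of oldchar and newchar is modelled (the list form is not), and we assume that considersequence collapses only runs of oldchar characters, not runs of characters replaced by charforothers.

```python
from typing import Optional


def char_replace(
    value: str,
    oldchar: str,
    newchar: str,
    unchanged: Optional[str] = None,
    charforothers: str = "X",
    considersequence: bool = False,
) -> str:
    """Hypothetical Python emulation of the Factory charReplace formula.

    Assumption: oldchar and newchar are plain strings; the list-of-strings
    form accepted by the Factory is not modelled here.
    """
    old = set(oldchar)  # each letter of oldchar is a separate, case-sensitive character
    keep = None if unchanged is None else set(unchanged)
    out: list = []
    in_run = False  # True while scanning consecutive oldchar characters

    for ch in value:
        if ch in old:
            # With considersequence, a whole run of oldchar characters
            # produces a single newchar (assumed behaviour).
            if not (considersequence and in_run):
                out.append(newchar)
            in_run = True
            continue
        in_run = False
        if keep is None or ch in keep:
            # Default: every character outside oldchar is left unchanged.
            out.append(ch)
        else:
            # unchanged was given and ch is not in it: use charforothers.
            out.append(charforothers)
    return "".join(out)
```

Under these assumptions, `char_replace("Never-married", "married", "_")` keeps every character outside oldchar, while passing `"n", "*", True` as the optional arguments turns the uppercase `N` into `*` (the comparison is case sensitive) and collapses each run of oldchar characters into one `_`.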


Example

The following example uses the Adult dataset.


In this example, we want to replace the characters of the 'married' string (that is, 'm', 'a', 'r', 'r', 'i', 'e', 'd') with '_'. Add a new attribute, called replace, and type the following formula:

charReplace($"marital-status",'married','_')

Each occurrence of the characters 'm', 'a', 'r', 'i', 'e' and 'd' is then replaced by '_'.

As a result, none of these letters appear in the new values: each one has been replaced by '_'.
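The effect of this first formula, where only the mandatory parameters are given, can be reproduced with a short Python sketch. It is an illustration rather than Factory code: the helper replace_married_chars is hypothetical, and the sample strings are marital-status categories from the Adult dataset. Because oldchar is case sensitive, the uppercase 'M' of "Married" is not replaced.

```python
# Characters from the oldchar argument 'married'; each one is matched
# individually and case-sensitively.
OLDCHAR = set("married")
NEWCHAR = "_"


def replace_married_chars(value: str) -> str:
    # Default behaviour: characters outside oldchar are kept as they are.
    return "".join(NEWCHAR if ch in OLDCHAR else ch for ch in value)


for status in ["Never-married", "Married-civ-spouse", "Divorced"]:
    print(status, "->", replace_married_chars(status))
```

This prints `N_v__-_______`, `M______-c_v-spous_` and `D_vo_c__` for the three values.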

To edit the whole string and also define which characters must not be changed, we need to specify the optional parameters.

In this example, we want to keep 'N' unchanged and replace every other character not specified in oldchar with '*', which is passed as the charforothers parameter.

Set the considersequence parameter to True, so that each run of contiguous characters from oldchar is replaced by a single '_'.

The formula becomes:

charReplace($"marital-status",'married','_','n','*',True)