The phonetic function returns the phonetic encoding of the strings contained in the column using the Metaphone algorithm. It may return the primary Metaphone component (component = "Primary" or component = "P") or the secondary component (component = "Secondary" or component = "S"). By default the primary component is returned.
The nominal attribute used to evaluate the phonetic encoding. If it is not nominal, it will be casted to nominal upon function’s computation. The column parameter is mandatory.
If Primary, P or not specified, the function returns the primary component. If Secondary or S, the secondary component is returned.
The Metaphone algorithm is a phonetic algorithm for indexing words by their English pronunciation. Its features are the as follows:
Drop duplicate adjacent letters, except for C.
If the word begins with 'KN', 'GN', 'PN', 'AE', 'WR', drop the first letter.
Drop 'B' if after 'M' at the end of the word.
'C' transforms to 'X' if followed by 'IA' or 'H' (unless in latter case, it is part of '-SCH-', in which case it transforms to 'K'). 'C' transforms to 'S' if followed by 'I', 'E', or 'Y'. Otherwise, 'C' transforms to 'K'.
'D' transforms to 'J' if followed by 'GE', 'GY', or 'GI'. Otherwise, 'D' transforms to 'T'.
Drop 'G' if followed by 'H' and 'H' is not at the end or before a vowel. Drop 'G' if followed by 'N' or 'NED' and is at the end.
'G' transforms to 'J' if before 'I', 'E', or 'Y', and it is not in 'GG'. Otherwise, 'G' transforms to 'K'.
Drop 'H' if after vowel and not before a vowel.
'CK' transforms to 'K'.
'PH' transforms to 'F'.
'Q' transforms to 'K'.
'S' transforms to 'X' if followed by 'H', 'IO', or 'IA'.
'T' transforms to 'X' if followed by 'IA' or 'IO'. 'TH' transforms to '0'. Drop 'T' if followed by 'CH'.
'V' transforms to 'F'.
'WH' transforms to 'W' if at the beginning. Drop 'W' if not followed by a vowel.
'X' transforms to 'S' if at the beginning. Otherwise, 'X' transforms to 'KS'.
Drop 'Y' if not followed by a vowel.
'Z' transforms to 'S'.
Drop all vowels unless it is the beginning.
The following example uses the Students_performance dataset.
In this example, we want to retrieve the pronunciation of the test preparation course attribute.
Add a new attribute, called phonetic, and type the following formula:
phonetic($"test preparation course")
and the attribute is filled with the primary component, as the component parameter hasn’t been specified.