disc function
The disc function discretizes data values into ranges defined by cutoff values.
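The following minimal Python sketch shows how a list of cutoff values defines ranges: n cutoffs produce n + 1 ranges, and the first and last ranges are open-ended. The function name `ranges_from_cutoffs` is hypothetical and is not part of the product. The sketch also does not settle whether a value equal to a cutoff belongs to the lower or the upper range, because this page does not say.

```python
import math


def ranges_from_cutoffs(cutoffs):
    """Return the (lower, upper) bounds of the ranges defined by the cutoffs.

    n cutoffs produce n + 1 ranges; the outermost ranges are open-ended.
    """
    points = sorted(cutoffs)
    bounds = [-math.inf] + points + [math.inf]
    # Pair each bound with the next one: (-inf, c1), (c1, c2), ..., (cn, +inf)
    return list(zip(bounds[:-1], bounds[1:]))


print(ranges_from_cutoffs([18, 25, 31]))
# [(-inf, 18), (18, 25), (25, 31), (31, inf)]
```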
Parameters
disc(column, cutoffs, rank)
Parameter | Description |
---|---|
column | The attribute whose values we want to discretize. The column parameter is mandatory. |
cutoffs | The cutoff values used to split the column values into ranges. The cutoff values must be enclosed in square brackets, for example [18, 25, 31]. The cutoffs parameter is mandatory. |
rank | If set to True, a ranking number is displayed for each range instead of its central value. The rank parameter is optional and is False by default, in which case the central value of each range is displayed. |
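To illustrate the difference between the default output and `rank=True`, here is a hypothetical Python re-implementation, `disc_sketch`, that is not the product's code. It rests on three assumptions that this page does not confirm: a value equal to a cutoff falls into the range above it, the open-ended outer ranges are bounded by the column's minimum and maximum when their central value is computed, and ranking numbers start at 1.

```python
from bisect import bisect_right


def disc_sketch(values, cutoffs, rank=False):
    """Discretize values into the ranges defined by cutoffs (illustration only).

    Assumptions, not confirmed by the disc documentation:
    - a value equal to a cutoff falls into the range above that cutoff;
    - the open-ended outer ranges are bounded by the smallest and largest
      value in the column when their central value is computed;
    - ranking numbers start at 1.
    """
    points = sorted(cutoffs)
    # Bounds of every range: [min, c1, ..., cn, max]
    bounds = [min(values)] + points + [max(values)]
    result = []
    for v in values:
        i = bisect_right(points, v)  # index of v's range, from 0 to len(points)
        if rank:
            result.append(i + 1)  # ranking number of the range
        else:
            result.append((bounds[i] + bounds[i + 1]) / 2)  # central value
    return result


bmi = [16.0, 22.0, 27.0, 35.0]  # illustrative values, not real data
print(disc_sketch(bmi, [18, 25, 31]))             # [17.0, 21.5, 28.0, 33.0]
print(disc_sketch(bmi, [18, 25, 31], rank=True))  # [1, 2, 3, 4]
```

How disc itself computes the central value of the open-ended outer ranges may differ from this sketch.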
Example
The following example uses the Age_BMI dataset, which was extracted from the public Hepatitis C Virus (HCV) for Egyptian patients dataset available on Kaggle.
Description | Screenshot |
---|---|
In the Age_BMI dataset, we have added a new attribute, called Disc_BMI, to hold the discretized values of the BMI attribute. This attribute divides the BMI values into 4 ranges, using 18, 25 and 31 as the cutoff values in the disc formula. Each of the resulting 4 groups displays the central value of its range. | |
If we want to display a ranking number for each range instead, we simply set the rank parameter to True rather than leaving it at its default value (False). The formula then uses the same column and cutoffs, with True as the third argument. | |
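The example above can be mimicked in plain Python to show what adding a Disc_BMI attribute with ranking numbers amounts to. The rows below are made-up illustrative values, not records from the Age_BMI dataset. The sketch assumes that a BMI equal to a cutoff goes into the range above it and that ranks run from 1 to 4. Both assumptions are mine and are not documented behavior of disc.

```python
from bisect import bisect_right
from collections import Counter

# Illustrative rows only; these values are NOT from the Age_BMI dataset.
rows = [
    {"Age": 45, "BMI": 17.5},
    {"Age": 38, "BMI": 24.9},
    {"Age": 52, "BMI": 25.0},
    {"Age": 60, "BMI": 33.2},
    {"Age": 41, "BMI": 29.1},
]

CUTOFFS = [18, 25, 31]

# Add a Disc_BMI attribute holding the rank (1-4) of each row's BMI range.
# Assumption: a BMI equal to a cutoff is placed in the range above it.
for row in rows:
    row["Disc_BMI"] = bisect_right(CUTOFFS, row["BMI"]) + 1

print([row["Disc_BMI"] for row in rows])                      # [1, 2, 3, 4, 3]
print(sorted(Counter(row["Disc_BMI"] for row in rows).items()))  # [(1, 1), (2, 1), (3, 2), (4, 1)]
```

Counting the rows per rank is a quick way to check how the chosen cutoffs spread the data across the 4 groups.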