Computing Statistics in the Sheets tab
Statistical analysis operations, performed on the Sheets tab of the Data Manager, help you find useful insights in your data.
Both univariate and bivariate statistics can be calculated in Rulex by selecting one or two attributes, respectively.
More sheets can be added to the Sheets tab.
These sheets can be used in two modes: blocked mode and edit mode.
In blocked mode, you can only drag attributes onto the sheet panel, and then visualize the different statistics. The sheet, which is called s0 by default, is managed by the Factory, and cells can’t be manually modified.
The first default sheet is in blocked mode, and its status cannot be modified.
In edit mode, you can work on your data as in a normal spreadsheet: you can type into each cell or use a formula to fill in its value.
If you need to add a new column and compute a statistic in it, see the page Statistical Functions in the Data Manager.
Performing query operations after the s0 sheet has been built
If you need to apply functions or to perform query operations on the dataset and you want to update the s0 sheet, you have two options:
you can build a new statistic with the updated attributes on the s0 sheet by dragging the attribute onto the corresponding row, which is highlighted in light green; this lets you compare the result with the previous state of the data.
you can update the existing statistic by right-clicking on the attribute in the Var_1 area and selecting Refresh Stat Row.
Procedure - working in blocked mode
In the Data Manager, click on the Sheets tab.
Drag the required attribute onto the Var_1 area.
(Optional) Drag the required attribute onto the Var_2/Target area.
Select a statistic type from those described in the table below, then open its configuration by either:
Double-clicking the statistic type: the Configuration window opens.
Clicking the pencil button.
Configure the properties of the selected statistic.
Category | Statistic Type | Description | Corresponding page |
---|---|---|---|
Univariate | Single statistics | Single statistics are used to perform preliminary descriptive analyses. | |
Univariate | Values, frequencies and quantiles | Values, frequencies and quantiles are used to obtain specific distribution-related position measures (quantiles) or to tabulate statistics associated with ordinal values. | |
Bivariate | Correlation/Covariance | Correlation/Covariance are used to assess the association between two attributes measured on a continuous scale. | |
Bivariate | Cross tabulation statistics | Cross tabulation statistics analyse the relationship between two categorical attributes by producing a corresponding contingency table. | |
Bivariate | ROC curve | ROC curve compares the distribution of a continuous attribute between two separate groups defined by a binary attribute (or the distribution of two continuous attributes), using standard ROC analysis tools. | |
Bivariate | Test for independent samples | The Test for independent samples section includes the most common statistical tests for the comparison between values of a continuous attribute in two groups, defined by a binary attribute, or the comparison between values of two continuous attributes. | |
Bivariate | Test for paired samples | The Test for paired samples section includes the most common statistical tests for matched samples of a continuous attribute. | |
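To make the statistic types in the table more concrete, the following Python sketch computes a few of them outside Rulex, on a small made-up dataset. It is only an illustration of what the quantities mean: it does not use any Rulex API, it is not how Rulex computes them, and the attribute names (`age`, `income`, `gender`, `churn`) and values are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical sample data; names and values are illustrative only.
age = [23, 35, 41, 29, 52, 47, 38, 31]
income = [21.0, 34.5, 40.0, 27.5, 55.0, 46.0, 37.0, 30.0]
gender = ["F", "M", "F", "M", "F", "M", "F", "M"]
churn = ["no", "no", "yes", "no", "yes", "yes", "no", "no"]

# Univariate, single statistics: basic descriptive measures of one attribute.
mean_age = statistics.mean(age)
median_age = statistics.median(age)
stdev_age = statistics.stdev(age)  # sample standard deviation

# Univariate, values/frequencies/quantiles: the three quartile cut points
# and a frequency table of the distinct values of a nominal attribute.
quartiles = statistics.quantiles(age, n=4)
frequencies = Counter(churn)

# Bivariate, covariance and Pearson correlation between two continuous
# attributes, written out explicitly (sample versions, n - 1 denominator).
mean_income = statistics.mean(income)
covariance = sum(
    (a - mean_age) * (i - mean_income) for a, i in zip(age, income)
) / (len(age) - 1)
correlation = covariance / (stdev_age * statistics.stdev(income))

# Bivariate, cross tabulation: counts for each pair of categories,
# i.e. the cells of a contingency table.
crosstab = Counter(zip(gender, churn))

print("mean:", mean_age, "median:", median_age)
print("quartiles:", quartiles)
print("frequencies:", dict(frequencies))
print("covariance:", covariance, "correlation:", correlation)
print("crosstab:", dict(crosstab))
```

In Rulex the same kinds of quantities are obtained by dragging attributes onto the Var_1 and Var_2/Target areas and choosing the statistic type, so no code is required; the sketch only shows what each category of the table measures.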
Procedure - working in edit mode
In the Data Manager, click on the Sheets tab.
Click on the plus button to add a new sheet. Its name will be s1 by default.
Edit the sheet’s name, if needed, by clicking on the pencil button next to it.
Work on the sheet as you would on a spreadsheet, by editing each cell or typing formulas or cell references.
Save and compute the task.
When referencing specific attributes coming from the Sheets tab, you must follow this syntax:
for specific sheet cells, the syntax must be: #"sheet_name!cell_number", e.g.
#"s0!A4"
for specific attributes, the syntax must be: $"att_name", e.g.
$"age"
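As a rough illustration of the two reference forms, the Python sketch below checks whether a string looks like a sheet-cell reference or an attribute reference. It is not part of Rulex: the function name `classify_reference` is hypothetical, and the assumption that a cell is written as column letters followed by a row number is inferred only from the `A4` example above.

```python
import re

# Assumed shape of a cell reference: #"<sheet>!<letters><digits>",
# based only on the example #"s0!A4".
CELL_REF = re.compile(r'#"(?P<sheet>[^"!]+)!(?P<cell>[A-Za-z]+[0-9]+)"')

# Assumed shape of an attribute reference: $"<attribute name>".
ATTR_REF = re.compile(r'\$"(?P<name>[^"]+)"')


def classify_reference(text):
    """Return a tuple describing the reference, or None if it matches neither form."""
    match = CELL_REF.fullmatch(text)
    if match:
        return ("cell", match.group("sheet"), match.group("cell"))
    match = ATTR_REF.fullmatch(text)
    if match:
        return ("attribute", match.group("name"))
    return None


print(classify_reference('#"s0!A4"'))
print(classify_reference('$"age"'))
print(classify_reference("s0!A4"))
```

Note that both patterns require straight double quotes, as in the examples; a reference typed with typographic quotes would not match in this sketch.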