Computing Test for Independent Samples in the Sheets tab
Tests for independent samples include the most common statistical tests for comparing values:
of a continuous attribute X in two groups, defined by a binary attribute Y, or
of two continuous attributes X, Y.
Category  Properties  Description 

Sample size  Number of valid positive, negative and total samples  The number of valid samples n for both attributes is displayed for the positive, negative and total data samples, respectively. This is particularly useful when missing data are distributed very unevenly between the two attributes, which might cause the analysis to be based on an unacceptably small sample size. 
Wilcoxon and Mann-Whitney test 
 The nonparametric Mann-Whitney test and the equivalent Wilcoxon variant are both available. Items include the U-value, which represents the number of pairs of subjects, one drawn from each group, in which the continuous attribute has the higher value in the target group. In the U calculation, pairs with equal values contribute 0.5 each. The Normalized U-value is the corresponding proportion of pairs, and thus estimates the probability of correctly ranking two subjects randomly selected from the two groups. The Normalized U-value is equivalent to the AUC (area under the ROC curve) obtained from the two selected attributes. The R_{1}-value is the lower of the two rank sums calculated within the two groups. It frequently, but not necessarily, corresponds to the sum of ranks within the smaller group. R_{1} is also known as the Wilcoxon test statistic and is easily obtained by the following procedure:
Under the null hypothesis of an equal distribution of the continuous attribute within the two groups, the two corresponding rank sums will be similar, so a very low value of R_{1} corresponds to a high probability of an actual difference between the two distributions. In the absence of coincident values (ties), the Mann-Whitney test and the Wilcoxon test are equivalent for assessing the difference between the median values of the two classes. The test is very powerful in the presence of a “shift” model, i.e. when the distributions in the two groups differ only by a constant value. Conversely, the test is in general unable to identify small local differences between the two distributions (the Kolmogorov-Smirnov test could be applied instead). The equivalence between the Mann-Whitney test and the method proposed by Wilcoxon relies on the fact that U is a simple linear transformation of R_{1}:
where n_{1} and n_{2} represent the sample sizes of the two groups. The corresponding p-value is obtained by assuming an asymptotic Normal distribution for the U statistic:
where z follows a standard Normal distribution. 
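The quantities described in this row can be illustrated with a minimal Python sketch. It is not the product's implementation: the function name `mann_whitney`, the use of midranks for ties, the Normal approximation without a tie correction and the two-sided p-value are all assumptions based on the standard textbook forms.

```python
import math

def mann_whitney(target, other):
    """Illustrative U, normalized U, rank sums, z and two-sided p-value."""
    n1, n2 = len(target), len(other)
    # U: pairs in which the target group has the higher value; ties add 0.5.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0
            for a in target for b in other)
    # Midranks over the pooled sample (tied values share their average rank).
    pooled = target + other

    def midrank(v):
        below = sum(1 for x in pooled if x < v)
        equal = sum(1 for x in pooled if x == v)
        return below + (equal + 1) / 2

    r_target = sum(midrank(v) for v in target)
    r_other = sum(midrank(v) for v in other)
    # Asymptotic Normal approximation, no tie correction (assumption).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return {"U": u, "normalized_U": u / (n1 * n2),
            "rank_sums": (r_target, r_other),
            "R1": min(r_target, r_other), "z": z, "p": p_two_sided}

target = [3.1, 4.7, 5.0, 6.2]
other = [1.2, 2.8, 3.1, 4.0, 4.4]
result = mann_whitney(target, other)
print(result)
```

In this example U equals the target group's rank sum minus n(n+1)/2 for that group, which is one common way of writing the linear relationship between U and a rank sum.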
Kolmogorov-Smirnov test  KS Value and P-value for KS test  A nonparametric tool for comparing a continuous distribution between two groups. Like the Wilcoxon-Mann-Whitney approach, it tests the null hypothesis that the two distributions are equal, looking for any difference between the two empirical distributions. In particular, in the presence of local differences it tends to be more powerful than the Mann-Whitney test, whereas when the continuous attribute has a similar distribution in the two groups, but with different median values, the Mann-Whitney test is uniformly more powerful. The Kolmogorov-Smirnov test is based on the maximum absolute difference between the two empirical distributions S_{n1} and S_{n2} of the values of the continuous attribute within the two groups:
The corresponding p-value is obtained from the exact cumulative null distribution of n_{1}n_{2}D_{n1,n2}. The Kolmogorov-Smirnov statistic D_{n1,n2} has an interesting relationship with the ROC curve obtained from the same two attributes: it corresponds to the greatest vertical distance between the curve and the rising diagonal, which is also the highest value of the Youden index. 
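As a rough illustration of the statistic described in this row, the following Python sketch evaluates both empirical distributions at every observed value and keeps the largest absolute gap. The function name `ks_statistic` is hypothetical, and the exact p-value calculation is not attempted here.

```python
def ks_statistic(group1, group2):
    """Maximum absolute difference between two empirical distributions."""
    n1, n2 = len(group1), len(group2)
    d = 0.0
    # The empirical distributions only change at observed values,
    # so it is enough to compare them at those points.
    for x in sorted(set(group1) | set(group2)):
        s1 = sum(1 for v in group1 if v <= x) / n1
        s2 = sum(1 for v in group2 if v <= x) / n2
        d = max(d, abs(s1 - s2))
    return d

d = ks_statistic([3.1, 4.7, 5.0, 6.2], [1.2, 2.8, 3.1, 4.0, 4.4])
print(d)
```

Each candidate gap in the loop is the difference between the proportions of the two groups at or below a threshold, which is why the maximum coincides with the greatest vertical distance between the ROC curve and the diagonal.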
Student t-test  Student t-value and P-value for Student t-test  The most popular statistical test for comparing two classes. It is used to assess the null hypothesis that the mean values of the continuous attribute in the two groups do not differ. It relies on the assumption that the continuous variable is normally distributed within each group. The test is available in two versions. The first, default version assumes that the continuous variable has the same variance in the two groups:
where m_{1} and m_{2} represent the means of the two groups, S_{1} and S_{2} the corresponding variance estimates and n_{1} and n_{2} the number of valid values in each group. The expression inside the square root in the denominator of the equation is usually called “the estimate of the pooled variance”. The Student t-test for unequal variances is obtained by the following equation:
In both cases the inference (i.e. the p-value calculation) is made by assuming that t follows a Student t distribution with n_{1}+n_{2}−2 degrees of freedom. This assumption is met when the two samples are independent and the distribution of the continuous attribute within each group is Normal (Gaussian). 
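The two versions of the t statistic can be sketched in Python as follows. This assumes the standard textbook forms (pooled variance for equal variances, separate variance terms for unequal variances) and unbiased variance estimates; the function name `student_t` and its `equal_variances` flag are hypothetical and do not refer to any product option.

```python
import math
from statistics import mean, variance

def student_t(group1, group2, equal_variances=True):
    """Illustrative t statistic for two independent samples."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    s1, s2 = variance(group1), variance(group2)  # unbiased (n - 1) estimates
    if equal_variances:
        # Pooled variance estimate shared by the two groups.
        pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
        std_err = math.sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Each group contributes its own variance term.
        std_err = math.sqrt(s1 / n1 + s2 / n2)
    return (m1 - m2) / std_err

a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
t_pooled = student_t(a, b)
t_unequal = student_t(a, b, equal_variances=False)
print(t_pooled, t_unequal)
```

Swapping the two groups only flips the sign of t, so a two-sided p-value is unaffected by which group is treated as the first one.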
Levene test  F-value for Levene test and P-value for Levene test  This test is used to compare the variance of the continuous attribute between two groups, using the following equation:
F follows a Fisher F distribution when the two variances are actually equal (i.e. under the null hypothesis) and the attribute is Normally distributed within the two groups. The degrees of freedom for the numerator and the denominator correspond to n_{2}−1 and n_{1}−1 respectively, if s^{2}_{2} > s^{2}_{1}. If instead s^{2}_{2} <= s^{2}_{1}, the degrees of freedom for the numerator and the denominator correspond to n_{1}−1 and n_{2}−1, respectively. 
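The choice of degrees of freedom described in this row can be illustrated with a short Python sketch. It assumes that F is the ratio of the larger sample variance to the smaller one, which is consistent with the degrees-of-freedom rule above but is not confirmed by the text; the function name `variance_ratio_f` is hypothetical.

```python
from statistics import variance

def variance_ratio_f(group1, group2):
    """Return (F, numerator df, denominator df) for a variance-ratio test."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = variance(group1), variance(group2)  # unbiased (n - 1) estimates
    if s2 > s1:
        # Larger variance on top (assumption): group 2 is the numerator.
        return s2 / s1, n2 - 1, n1 - 1
    # Otherwise group 1 supplies the numerator.
    return s1 / s2, n1 - 1, n2 - 1

f_value, df_num, df_den = variance_ratio_f([1, 2, 3, 4, 5],
                                           [2, 4, 6, 8, 10, 12])
print(f_value, df_num, df_den)
```

With this convention F is never below 1, and the numerator degrees of freedom always belong to the group with the larger variance.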
Parameters  Use target attribute  Selecting target value for … it is possible to choose the target value for each statistical analysis. If the target attribute is dichotomous, the selection affects the Student t-test only, changing the sign of the t statistic; because the corresponding p-value is evaluated for two-sided tests, it remains unchanged. 
Student t-test parameters  Options  The type of Student t-test. Possible types are:
