07 - 11 - 2014
## Assessment of the distribution in SPSS

The next step in statistical analysis, after calculating descriptive statistics, is to assess the distribution of the data.

First, split the file into subsets (see Splitting the file), then do the following steps:

1) In the Data View, click the Analyze menu, point to Descriptive Statistics, and select Explore… The Explore dialog box opens.

2) Select the variables for the analysis: “Tea tree”, “Tea tree + Gati”, “Thyme” and “Thyme + Gati”.

3) Click the upper transfer arrow button. The selected variables are moved to the Dependent List: list box.

4) Click the Plots… button to choose charts. The Explore: Plots dialog box opens.

5) The distribution is best assessed with histograms and with normality plots accompanied by tests. In the Descriptive section, select the Histogram check box.

6) Select Normality plots with tests. Leave the other options at their default settings.

7) Click the Continue button. This returns you to the Explore dialog box.

8) Click the OK button. An Output Viewer window opens and displays statistics and plots.

We can assess the distribution simply by looking at the histograms, which show the frequency of each value. The presented histograms more or less resemble bell-shaped curves.
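Outside SPSS, the idea behind a histogram can be sketched in a few lines of Python. This is only an illustration: the `text_histogram` helper and the diameter values are hypothetical and are not taken from the data set used in this tutorial.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one text line per bin: the lower bin edge and a bar of '#'."""
    # Each value falls into the bin whose lower edge is k * bin_width.
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for k in range(min(counts), max(counts) + 1):
        lower = k * bin_width
        lines.append(f"{lower:>5g} | {'#' * counts[k]}")
    return lines

# Hypothetical inhibition-zone diameters in mm (illustrative values only).
diameters = [18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24]

for line in text_histogram(diameters, bin_width=1):
    print(line)
```

A roughly symmetric pile of bars that peaks in the middle is what "bell-shaped" means in practice.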

Besides histograms, normality of the distribution can be assessed with a Q-Q plot, which is a variant of a probability plot. If the inspected data are normally distributed, a plot of the observed values against the values expected under normality should follow a straight line (shown as a diagonal line on each chart). If the observed points bow away from the line on one side, forming an arc, this indicates that the data are skewed, whereas an S-shaped pattern around the line indicates that the tails are heavier or lighter than those of a normal distribution, that is, the kurtosis departs from normal.
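The construction of a normal Q-Q plot can be sketched as follows. The example assumes Blom's plotting positions, (i - 3/8) / (n + 1/4), which is one common convention and may differ from what a given SPSS procedure uses. The function name and the data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(values):
    """Pair each sorted observation with its expected normal value."""
    n = len(values)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(values), stdev(values))
    points = []
    for i, observed in enumerate(sorted(values), start=1):
        # Blom's plotting position (an assumption, see the text above).
        p = (i - 0.375) / (n + 0.25)
        points.append((observed, fitted.inv_cdf(p)))
    return points

# Hypothetical inhibition-zone diameters in mm (illustrative values only).
diameters = [18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24]

for observed, expected in normal_qq_points(diameters):
    print(f"{observed:6.1f} {expected:6.2f}")
```

If the data are close to normal, the two printed columns are nearly equal, which corresponds to points lying near the diagonal reference line.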

In all of the presented Q-Q plots, the experimental observations lie close to the diagonal reference line. Both types of charts therefore suggest that the data in each subset are most probably normally distributed.

Histograms and Q-Q plots are a visual approach to assessing the distribution and are less precise than numerical results. Visual evaluation is therefore usually supported by statistical tests of normality, such as the Shapiro-Wilk and Kolmogorov-Smirnov tests. The Shapiro-Wilk test is recommended when the sample size is between 3 and 2000, and the Kolmogorov-Smirnov test when the sample size is greater than 2000. In some circumstances both tests can produce misleading results, so it is better to use them together with the graphical plots.
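To show what the Kolmogorov-Smirnov test measures, the sketch below computes its statistic D, the largest gap between the empirical cumulative distribution and a normal curve fitted to the sample. It stops at the statistic: a p-value requires the distribution of D when the mean and standard deviation are estimated from the data (the Lilliefors variant), which is not attempted here. The function name and both samples are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    d = 0.0
    for i, x in enumerate(sorted(values), start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Hypothetical samples (illustrative values only).
diameters = [18, 19, 19, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 23, 23, 24]
mics = [0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 0.5, 1, 1, 2, 4, 8]

print("symmetric sample D =", round(ks_statistic_normal(diameters), 3))
print("skewed sample D    =", round(ks_statistic_normal(mics), 3))
```

The strongly right-skewed sample gives a clearly larger D than the symmetric one, which is the kind of deviation the test flags as significant.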

Below are the results of the normality tests for the diameters of inhibition zones around the disks with tea tree oil against E. coli and E. faecalis, respectively. The “Significance” column (P-value) is used for the interpretation. From these tables we can see that none of the deviations from the normal distribution is statistically significant. The values in all groups can therefore be treated as normally distributed, and parametric tests should be applied for any comparison between groups.

The distribution was also assessed for the second example, the MICs of the oils, in order to decide which statistical tests should be used to compare the groups in that example.

The histogram for MICs of tea tree oil alone against E. coli is far from a bell-shaped normal curve, and the points for the experimental data on the Q-Q plot lie far from the straight line; both show that the distribution is non-normal. The histogram for MICs of tea tree oil in the presence of gatifloxacin does not resemble a normal curve either, and the Q-Q plot agrees: its S-shaped form indicates a departure from normality. The histogram for MICs of tea tree oil against E. faecalis is closer to the normal curve, and the Q-Q plot supports this: the observed values almost correspond to the expected normal values. The histogram for MICs of tea tree oil against E. faecalis in the presence of gatifloxacin is again far from the histogram characteristic of a normal distribution, and on the Q-Q plot the observed values do not lie on the reference line.

From the tables with normality tests we can see that the distribution of all data for E. coli is significantly different from normal (p<0.05 for the variables “Tea tree” and “Tea tree + Gati” by both tests). However, the distribution of MICs of tea tree oil without gatifloxacin against E. faecalis is not significantly different from normal (p = 0.118).

The same analysis was also performed for thyme oil. For both species, the distribution of MICs is non-normal for thyme oil with and without gatifloxacin.

When all compared groups are normally distributed, parametric tests are used for their comparison. However, if at least one of the compared groups has a non-normal distribution, as with the MICs of both oils in our second example, non-parametric tests are appropriate.
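The decision rule above can be written down as a small sketch. The function name is hypothetical, and of the p-values below only 0.118 comes from the text; 0.01 is a placeholder standing in for a reported "p<0.05".

```python
def choose_test_family(normality_p_values, alpha=0.05):
    """Return 'parametric' only if no group departs significantly from normality."""
    all_normal = all(p >= alpha for p in normality_p_values.values())
    return "parametric" if all_normal else "non-parametric"

# 0.118 is quoted in the text; 0.01 is a hypothetical stand-in for "p<0.05".
p_values = {"Tea tree": 0.118, "Tea tree + Gati": 0.01}

print(choose_test_family(p_values))
```

One non-normal group is enough to select non-parametric tests for the whole comparison, as stated above.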