Shapiro-Wilk Test
Some statistical tests ask you to assume that your data follows a normal distribution. The Shapiro-Wilk Test compares your data to data from a normal distribution with the same mean and standard deviation as your sample.
Use the Shapiro-Wilk test when:
- You want to test whether your data can be assumed to follow the normal distribution.
- Your sample contains fewer than 2,000 observations.
- Your data is continuous (see Types of Data for more information).
NOTE: The test is usually recommended for samples of fewer than 2,000 observations. For larger samples, use the Kolmogorov-Smirnov Goodness of Fit Test.
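The sample-size rule in the note can be sketched in Python. This is only an illustration under stated assumptions: it relies on the SciPy library (not mentioned on this page) and its functions scipy.stats.shapiro and scipy.stats.kstest, and the helper name normality_test and the two synthetic samples are hypothetical. The Kolmogorov-Smirnov branch compares the data with a normal distribution that uses the sample's own mean and standard deviation, which makes its p-value approximate.

```python
from statistics import NormalDist, mean, stdev

from scipy import stats

SHAPIRO_MAX_N = 2000  # threshold taken from the note above


def normality_test(sample, max_n=SHAPIRO_MAX_N):
    """Return (test_name, statistic, p_value) for a one-dimensional sample."""
    data = [float(x) for x in sample]
    if len(data) < max_n:
        w, p = stats.shapiro(data)
        return "shapiro-wilk", float(w), float(p)
    # Compare with a normal distribution that has the sample's mean and SD.
    # Estimating these from the same data makes the p-value approximate.
    d, p = stats.kstest(data, "norm", args=(mean(data), stdev(data)))
    return "kolmogorov-smirnov", float(d), float(p)


# Synthetic, evenly spaced normal quantiles (illustrative only, not real data)
small_sample = [NormalDist(10, 2).inv_cdf((i + 0.5) / 50) for i in range(50)]
large_sample = [NormalDist(10, 2).inv_cdf((i + 0.5) / 3000) for i in range(3000)]

small_result = normality_test(small_sample)
large_result = normality_test(large_sample)
print(small_result[0], large_result[0])
```

The threshold is a parameter so that a different cut-off can be tried without changing the function.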
If the Shapiro-Wilk test is NOT significant, there is no evidence of a departure from normality, so the data can be assumed to be normally distributed.
The Shapiro-Wilk test has certain assumptions that need to be met for accurate results:
- The data set should be independent.
- The data set should be continuous.
- The data set should not have outliers.
- The data set should not have significant skewness or long "tails" (kurtosis). See Measures of Central Tendency for more information.
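Skewness and kurtosis can be checked numerically before running the test. The sketch below uses only the Python standard library and the simple moment-based definitions (third and fourth central moments divided by powers of the variance). The function names and the two small samples are hypothetical and chosen purely for illustration. A skewness near 0 indicates symmetry, and an excess kurtosis near 0 indicates tails similar to those of a normal distribution.

```python
from statistics import fmean


def central_moment(data, k):
    """k-th central moment: the mean of (x - mean)^k."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])


def skewness(data):
    """Moment-based skewness; positive values mean a longer right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3.0


# Made-up samples for illustration only
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 20]

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_tailed), excess_kurtosis(right_tailed))
```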
The hypotheses for the test are:
H0: the data follows the normal distribution.
H1: the data does not follow the normal distribution.
The easiest way to do this test is to use an online calculator. A good example can be found HERE.
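If you would rather script the test than use an online calculator, one option is Python with the SciPy library. This is an assumption on our part, since this page only uses the online calculator. The sketch below wraps scipy.stats.shapiro in a hypothetical helper, shapiro_wilk, that reports W, the p-value, and whether H0 is rejected at a chosen significance level. The two input samples are synthetic and are not study data.

```python
from statistics import NormalDist

from scipy import stats


def shapiro_wilk(sample, alpha=0.05):
    """Run the Shapiro-Wilk test and report whether H0 (normality) is rejected."""
    w, p = stats.shapiro(list(sample))
    return {"W": float(w), "p": float(p), "reject_h0": bool(p < alpha)}


# Illustrative inputs only (not from any study)
bell_shaped = [NormalDist(50, 10).inv_cdf((i + 0.5) / 30) for i in range(30)]
right_skewed = [2.0 ** i for i in range(20)]

bell_result = shapiro_wilk(bell_shaped)
skewed_result = shapiro_wilk(right_skewed)
print(bell_result)
print(skewed_result)
```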
Data was collected using the method outlined in Microplastics Data Analysis.
Two data sets were collected on the density of microplastics in the Pacific and Atlantic Oceans (pieces per m³), as shown below.
Density of Microplastics
Pacific Ocean (pieces/m³) | Atlantic Ocean (pieces/m³) |
1.84839 | 0.01268 |
0.83627 | 0.01925 |
3.22369 | 0.03866 |
1.31768 | 0.09834 |
1.36212 | 0.12738 |
26.85479 | 0.04648 |
22.41620 | 0.52381 |
8.04915 | 0.23810 |
4.05439 | 0.41302 |
11.41689 | 0.04575 |
51.07820 | 1.09462 |
4.48603 | 0.14607 |
5.48117 | 0.25618 |
0.45179 | 0.05735 |
0.33731 | 0.03978 |
0.26649 | 0.11640 |
0.45179 | 0.15359 |
0.69285 | 0.05181 |
1.00920 | 0.62465 |
0.42390 | 0.48175 |
0.69286 | 0.03390 |
NOTE: This data is also available in this spreadsheet.
Using the online calculator HERE, enter the first column of data (Pacific Ocean).
The results tell us that there is a significant departure from normality, W(21) = .59, p < .001.
Remember our rule: "If p is low, H0 must go". So, since the calculated p-value is less than 5% (0.05), the null hypothesis (that our data is normally distributed) must be rejected.
Running the test again for the Atlantic Ocean data also shows a significant departure from normality.
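The same check can be reproduced in code. The sketch below assumes Python with SciPy (scipy.stats.shapiro), which is not the tool used on this page, so small numerical differences from the online calculator are possible. It runs the test on both columns of the table and applies the "if p is low, H0 must go" rule at the 5% level.

```python
from scipy import stats

# Microplastic densities (pieces/m³) copied from the table above
pacific = [1.84839, 0.83627, 3.22369, 1.31768, 1.36212, 26.85479, 22.41620,
           8.04915, 4.05439, 11.41689, 51.07820, 4.48603, 5.48117, 0.45179,
           0.33731, 0.26649, 0.45179, 0.69285, 1.00920, 0.42390, 0.69286]
atlantic = [0.01268, 0.01925, 0.03866, 0.09834, 0.12738, 0.04648, 0.52381,
            0.23810, 0.41302, 0.04575, 1.09462, 0.14607, 0.25618, 0.05735,
            0.03978, 0.11640, 0.15359, 0.05181, 0.62465, 0.48175, 0.03390]

ALPHA = 0.05  # significance level used on this page

results = {}
for name, sample in (("Pacific", pacific), ("Atlantic", atlantic)):
    w, p = stats.shapiro(sample)
    results[name] = (float(w), float(p))
    verdict = "reject H0" if p < ALPHA else "do not reject H0"
    print(f"{name}: W({len(sample)}) = {w:.2f}, p = {p:.3g} -> {verdict}")
```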
This means that we cannot continue our analysis using a t-Test, as our data cannot be assumed to be normally distributed. We could, however, use a Mann-Whitney U-Test.
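As a rough illustration of that alternative, the sketch below computes the Mann-Whitney U statistic for the two oceans using only the Python standard library. It is a simplified version under stated assumptions: the p-value uses the large-sample normal approximation with no tie correction and no continuity correction, and the function name mann_whitney_u is hypothetical. A statistics package would normally handle these details for you.

```python
from math import sqrt
from statistics import NormalDist

# Microplastic densities (pieces/m³) copied from the table above
pacific = [1.84839, 0.83627, 3.22369, 1.31768, 1.36212, 26.85479, 22.41620,
           8.04915, 4.05439, 11.41689, 51.07820, 4.48603, 5.48117, 0.45179,
           0.33731, 0.26649, 0.45179, 0.69285, 1.00920, 0.42390, 0.69286]
atlantic = [0.01268, 0.01925, 0.03866, 0.09834, 0.12738, 0.04648, 0.52381,
            0.23810, 0.41302, 0.04575, 1.09462, 0.14607, 0.25618, 0.05735,
            0.03978, 0.11640, 0.15359, 0.05181, 0.62465, 0.48175, 0.03390]


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p-value) using the normal approximation."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value (ties count 0.5)
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


u_stat, p_value = mann_whitney_u(pacific, atlantic)
print(f"U = {u_stat}, p = {p_value:.3g}")
```

Counting pairs directly is slower than ranking, but with 21 values per group it keeps the definition of U visible.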
Your Turn
An ecologist was investigating woodland microhabitats, contrasting the communities in a shaded position with those in full light. One of the plants was ivy (Hedera helix). Leaf widths were measured, but because the size of the leaves varied with the position on the plant, only the 4th leaf from each stem tip was measured. The results from the plants available were as follows.
Width of sunny leaves (mm): 32, 24, 30, 33, 61, 26, 32, 37, 43, 31, 38, 26
Width of shady leaves (mm): 34, 16, 45, 41, 36, 33, 37, 42, 35, 35, 36, 36
You now wish to check whether it is reasonable to assume that your data is normally distributed.
First test your sunny leaf data using this online calculator HERE.
What do you conclude? Tick all that apply.
You now know that you cannot assume your "sunny" data is normally distributed as it has a positive skew.
Now test your shady leaf data using the online calculator.
What do you conclude?
It is not reasonable to assume that your data is normally distributed.
The calculated p-value is 0.006994, which is less than the critical probability or significance level of 0.05 (5%). Therefore we cannot reasonably assume the data is normally distributed.
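To double-check both answers in code, the sketch below runs the test on the two leaf samples. It assumes Python with SciPy (scipy.stats.shapiro) rather than the online calculator, so the p-values may differ slightly in the last decimal places. The dictionary name leaf_results is just for illustration.

```python
from scipy import stats

# Leaf widths (mm) from the exercise above
sunny = [32, 24, 30, 33, 61, 26, 32, 37, 43, 31, 38, 26]
shady = [34, 16, 45, 41, 36, 33, 37, 42, 35, 35, 36, 36]

leaf_results = {}
for label, widths in (("sunny", sunny), ("shady", shady)):
    w, p = stats.shapiro(widths)
    leaf_results[label] = {
        "W": float(w),
        "p": float(p),
        "reject_h0": bool(p < 0.05),  # True means normality cannot be assumed
    }
    print(label, leaf_results[label])
```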
EXTENSION QUESTION
Because you cannot assume your data is normally distributed, which statistical tests could you do to compare the sunny and shady data points? Tick all that apply.
The Unpaired t-Test assumes the data follows a normal distribution, so you cannot use this test.
You cannot do a Wilcoxon Signed-Rank test, as this test is for paired (repeated-measures) sample groups.
However, you can do a Mann-Whitney U-Test.
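A Mann-Whitney U-Test on the leaf data could be run as sketched below. This assumes Python with SciPy and its scipy.stats.mannwhitneyu function, neither of which is used elsewhere on this page. A library routine is a sensible choice here because several widths appear in both groups, and tied values need a correction that is easy to get wrong by hand. Interpret the printed p-value against the 5% level in the usual way.

```python
from scipy import stats

# Leaf widths (mm) from the exercise above
sunny = [32, 24, 30, 33, 61, 26, 32, 37, 43, 31, 38, 26]
shady = [34, 16, 45, 41, 36, 33, 37, 42, 35, 35, 36, 36]

# Two-sided test: H0 is that neither group tends to have wider leaves
u_leaf, p_leaf = stats.mannwhitneyu(sunny, shady, alternative="two-sided")
print(f"U = {u_leaf}, p = {p_leaf:.3g}")
```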