Practising Data & Stats Skills - Salicylic Acid (Answers)

This page provides answers to the activity that lets you practise your data and statistics skills.
1. Draw an appropriate graph of the raw data.
The goal is to depict your data visually so you can see what it is revealing at each concentration level and between concentration levels. It is VERY hard to see this when your data is just numbers in a table.
One option would be a box-and-whisker plot. The graph shown below has been produced using this online plot generator (Google Sheets cannot easily produce this type of graph). You could also use Excel.
It's not perfect (ideally the different acid concentrations would be labelled along the axis), but it's not bad, and it does let you see what is happening at each concentration level and between levels. It also reveals some potential outliers.
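If you would rather script the numbers behind a box plot than read them off a chart, the sketch below computes the five-number summary for one concentration group and flags potential outliers using the common 1.5 × IQR rule (Tukey's fences) that many box-plot tools use to draw points beyond the whiskers. It assumes Python 3.8+ and the `inclusive` quartile method; quartile conventions differ between tools, so the fences may not match the online generator or Excel exactly. The function names are hypothetical. A flagged value is only a *potential* outlier; deciding whether to treat it as one is a separate judgement.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for one concentration group."""
    data = sorted(values)
    # 'inclusive' treats the minimum and maximum as the 0th and 100th percentiles
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]
```

Call `tukey_outliers(lengths)` once per concentration group, where `lengths` is that group's list of shoot lengths in cm.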

2. Explain what your graph reveals about the raw data.
We can see from the graph that some seeds did not germinate; that's why we planted 15 seeds for each concentration! We could compare our germination rates with the rates provided by the seed supplier.
These zero values should be excluded from our further calculations as the student is investigating whether the growth of the seeds that germinate differs by acid concentration levels.
Including the zero values would mean that the average growth calculations would depend in part upon the proportion of seeds that germinate. You could phrase your research question to compare the proportion of germinating seeds among treatments OR to compare the growth of the seeds. By including non-germinating seeds in your data, you are trying to answer both of these questions at once.
This strategy, and the justification for it, would need to be explained fully in an IA.
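A minimal sketch of why including the zeros mixes the two questions: for any group, the mean that includes non-germinating seeds equals the germination rate multiplied by the mean of the germinated seeds only. The function name and the four-seed example are hypothetical, chosen only to show the arithmetic.

```python
from statistics import mean


def summarise_group(lengths_cm):
    """Split one group's shoot lengths into germination rate and growth of germinated seeds."""
    germinated = [x for x in lengths_cm if x > 0]  # a length of 0 means the seed did not germinate
    germination_rate = len(germinated) / len(lengths_cm)
    mean_with_zeros = mean(lengths_cm)
    mean_germinated_only = mean(germinated)
    return germination_rate, mean_with_zeros, mean_germinated_only


# Hypothetical group of four seeds, one of which did not germinate
rate, with_zeros, germinated_only = summarise_group([0, 4.0, 4.0, 4.0])
print(rate, with_zeros, germinated_only)  # 0.75 3.0 4.0
```

Here the 3.0 cm mean with zeros is simply 0.75 × 4.0 cm: a lower germination rate alone would pull it down even if every seed that did germinate grew exactly as much.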
We can also see from the graph that some seeds grew only a little. Are these outliers? Not unless there is an exceptional reason why they occurred (eg you forgot to water that particular plant for 10 days). We would still need to explain why those values were observed; they may well be within the bounds of reasonableness, because nature has variance!
Again, your chosen strategy for the potential outliers, and the justification for it, would need to be explained fully in an IA.
We can also see that, in general, as acid concentration increases, so does sprout length. This is what we will be investigating further.
3. Calculate the mean and standard deviation for each concentration group.
NOTE: Values of 0 should be deleted from the data set before calculating the mean and standard deviation.
| Concentration of salicylic acid | Mean shoot growth after 10 days (cm) | Standard deviation (cm) |
|---|---|---|
| 0.0% | 3.1 | 0.5 |
| 0.5% | 4.2 | 0.3 |
| 1.0% | 5.0 | 0.8 |
| 1.5% | 5.4 | 0.8 |
| 2.0% | 6.0 | 1.5 |
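The table can be reproduced with a short script. This sketch assumes the raw data is held in a dictionary mapping each concentration label to its list of shoot lengths in cm (a hypothetical layout), drops the zeros as the note above instructs, and uses the sample standard deviation (`statistics.stdev`, the n − 1 version that Excel's STDEV.S and Google Sheets' STDEV also return). If your spreadsheet used the population standard deviation, your values will differ slightly.

```python
from statistics import mean, stdev


def group_stats(data_by_concentration):
    """Return {concentration: (mean, sample SD)} rounded to 1 d.p., excluding zeros."""
    results = {}
    for concentration, lengths in data_by_concentration.items():
        grown = [x for x in lengths if x != 0]  # remove non-germinating seeds
        results[concentration] = (round(mean(grown), 1), round(stdev(grown), 1))
    return results
```

`stdev` needs at least two non-zero values per group, which is comfortably met with 15 seeds per concentration.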
4. This processed data can be graphed as shown below. Explain what this graph tells you about the data.
- The mean sprout length increases as acid concentration increases.
- The 2.0% concentration had the highest standard deviation (1.5 cm), so its shoot lengths were the most variable. The 0.5% concentration had the lowest standard deviation (0.3 cm).
- We should continue to investigate the relationship between the two variables.
5. Describe how you would proceed from here to analyse the data further. Consider:
- What is the goal: to compare variables, or to find a relationship between variables?
- What is the most appropriate statistical test?
- What would be the testing hypotheses?
The next step would be to investigate the relationship between acid concentration and shoot growth, first with regression trendlines and then with a correlation coefficient.
The first step is to graph these two variables and decide whether the relationship is linear, or monotonic but non-linear. This determines whether we then use Pearson's correlation coefficient or Spearman's rank correlation coefficient. You can test this by adding a trendline to the graph above.
You need to see whether a linear trendline or an exponential/polynomial trendline fits the data better. You can decide this by noting the coefficient of determination (R-squared) for each type of line.
For a reminder of what the coefficient of determination (R-squared) means, see the Regression page.
Both graphs are shown below.


The linear trendline (top graph) has a slightly lower R-squared value (0.9661) than the polynomial trendline (0.9937) (bottom graph).
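These two R-squared values can be checked by fitting least-squares polynomials to the five group means from the table in step 3 (x = concentration in %, y = mean shoot growth in cm). This sketch assumes the spreadsheet's polynomial trendline is a quadratic (order 2) and that R-squared is computed as 1 − SS_res/SS_tot; it solves the normal equations directly so that only the standard library is needed. The helper names `polyfit` and `r_squared` are hypothetical, not a spreadsheet API.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] via the normal equations."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back-substitution on the upper-triangular system
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(a[i][j] * coeffs[j] for j in range(i + 1, n))) / a[i][i]
    return coeffs


def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    predicted = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


concentration = [0.0, 0.5, 1.0, 1.5, 2.0]  # % salicylic acid
mean_growth = [3.1, 4.2, 5.0, 5.4, 6.0]    # cm, from the table in step 3
for degree in (1, 2):
    fit = polyfit(concentration, mean_growth, degree)
    print(degree, round(r_squared(concentration, mean_growth, fit), 4))
# prints "1 0.9661", then "2 0.9937"
```

A quadratic can never fit worse than a straight line on the same points, because setting its x² coefficient to zero gives back the line; that is why the R-squared comparison alone cannot settle the choice. The fitted quadratic is approximately y = 3.14 + 2.2x − 0.4x², which peaks at x = 2.75% and would fall beyond that, so it should not be extrapolated far past the 2.0% tested.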
However, we also need to THINK: which sort of trendline makes sense?
A linear trendline would imply that we could just keep increasing the acid concentration for unlimited growth! That doesn't make intuitive sense. Rather, it makes more sense that growth would reach some upper limit as implied by the polynomial trendline.
So let's use a polynomial trendline.
The polynomial trendline rises from the left to the right of the graph (that is, it is monotonic), but in a non-linear way. This means it is a non-linear, monotonic trendline.
So a non-linear, monotonic trendline means that we proceed with Spearman's rank correlation coefficient, using ALL the raw data points (not just the group means) but excluding the zero values.
Our suggested online calculator for this is HERE.
Running the data through the calculator gives Rs = 0.77062.
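To see what a calculator like this does, here is a minimal Spearman sketch: rank each variable separately, give tied values the average of the ranks they would occupy, then take Pearson's correlation of those ranks. Ties matter here because every seed in a group shares the same concentration. Tools that use the shortcut formula 1 − 6Σd²/(n(n² − 1)) skip the tie correction, so values from different tools can differ slightly when there are many ties. The function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i..j, as 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson's correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rs(xs, ys):
    """Spearman's rank correlation: Pearson's r applied to the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Pass the concentration of every germinated seed as `xs` and its shoot length as `ys`, one pair per seed.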
We can continue on and calculate whether this result is statistically significant. Our testing hypotheses for this are:
H0: Acid concentration and sprout length are independent (ie not correlated).
H1: Acid concentration and sprout length are not independent (ie are correlated).
The calculator reports a p-value of less than 0.001. As this is below the 0.05 significance level, we reject the null hypothesis.
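The calculator's p-value can be sanity-checked with a large-sample approximation: under H0, z = Rs × √(n − 1) is approximately standard normal. This is a swapped-in approximation, not necessarily what the calculator does (many use a t-distribution instead), but both should lead to the same decision here. The sample size n = 70 is inferred from the 68 degrees of freedom in the write-up below (df = n − 2), and α = 0.05 is the usual significance level. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def spearman_p_value(rs, n):
    """Approximate two-tailed p-value for H0: no correlation, via z = rs * sqrt(n - 1)."""
    z = abs(rs) * sqrt(n - 1)
    return 2 * NormalDist().cdf(-z)  # area in both tails beyond |z|


def decide(p_value, alpha=0.05):
    """Compare the p-value with the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


p = spearman_p_value(0.77062, 70)
print(p < 0.001, decide(p))  # True reject H0
```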
We would report this as...
A Spearman rank correlation coefficient was computed to assess the relationship between acid concentration and sprout length. There was a strong, positive correlation between the two variables, and the relationship was statistically significant (rs(68) = .771, p < .001).
This result is statistically significant because the p-value is less than the 0.05 (5%) significance level.
In other words, if acid concentration and sprout length were truly uncorrelated, a correlation this strong would be very unlikely to arise by chance alone.
We can therefore conclude that, over the concentrations tested, sprout length increases as acid concentration increases. [Now relate this back to your research question…]