15.1 Introduction
In Sect. 14.6, the NHANES data (Centers for Disease Control and Prevention (CDC) 1988–1994) were numerically summarised. The sample mean direct HDL cholesterol concentration was different for smokers (\(\bar{x} = 1.31\) mmol/L) and for non-smokers (\(\bar{x} = 1.39\) mmol/L).
What does this difference between the sample means imply about the population means?
Two reasons could explain why the sample means are different:
- The population means are the same. The sample means are different because every sample is likely to be different (each possible sample includes different people), so the sample means sometimes differ by chance. This is called sampling variation.
- Alternatively, the population means are different, and the sample means simply reflect this.
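The first explanation can be illustrated with a small simulation. The following is a minimal sketch, not an analysis of the NHANES data: the population mean (1.35), the standard deviation (0.4), the sample size (100) and the normal model are all hypothetical values chosen only for illustration, and `sample_mean_difference` is an illustrative helper name. Both samples are drawn from the same population, so any difference between the two sample means is due to sampling variation alone.

```python
import random
import statistics


def sample_mean_difference(rng, n=100, pop_mean=1.35, pop_sd=0.4):
    """Draw two samples from the SAME population (hypothetical values);
    return the difference between the two sample means."""
    group_a = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
    group_b = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
    return statistics.mean(group_a) - statistics.mean(group_b)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Repeat the 'study' many times: each repeat gives a different difference.
differences = [sample_mean_difference(rng) for _ in range(1000)]

print("smallest difference:", round(min(differences), 3))
print("largest difference: ", round(max(differences), 3))
print("average difference: ", round(statistics.mean(differences), 3))
```

Under these assumptions, individual pairs of samples give non-zero differences even though the population means are identical, while the differences average out close to zero over many repeats.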
Similarly, in Sect. 14.6 the odds of being diabetic were different for smokers (0.181) and non-smokers (0.084). What does this difference between the sample odds imply about the population odds?
Again, two possible reasons could explain why the sample odds are different:
- The population odds are the same. The sample odds are different because every sample is likely to be different (each possible sample includes different people), so the sample odds sometimes differ by chance. This is sampling variation.
- Alternatively, the population odds are different, and the sample odds simply reflect this.
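The same idea can be sketched for odds. This is again only an illustration under stated assumptions: the population proportion (0.10) and the sample size (500) are hypothetical, not taken from the NHANES data, and `sample_odds` is an illustrative helper name. The sample odds are computed as the number of people with the outcome divided by the number without it.

```python
import random


def sample_odds(rng, n, p):
    """Odds of the outcome in one random sample of size n:
    (number with the outcome) / (number without the outcome)."""
    with_outcome = sum(rng.random() < p for _ in range(n))
    return with_outcome / (n - with_outcome)


rng = random.Random(2)  # fixed seed so the run is repeatable
p = 0.10                # hypothetical population proportion, same for both groups

odds_a = sample_odds(rng, 500, p)
odds_b = sample_odds(rng, 500, p)

print(f"population odds (both groups): {p / (1 - p):.3f}")
print(f"sample odds, group A: {odds_a:.3f}")
print(f"sample odds, group B: {odds_b:.3f}")
```

Both groups share the same population odds here, yet the two sample odds will usually not be equal to each other or to the population value.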
In both situations (means; odds), the two possible explanations (‘hypotheses’) have special names:
- There is no difference between the population parameters: this is the null hypothesis, or \(H_0\).
- There is a difference between the population parameters: this is the alternative hypothesis, or \(H_1\).
(The word hypothesis just means ‘a possible explanation.’) A decision needs to be made about which of these two explanations is more likely. However, because only a sample is studied, conclusions about the population are never certain.