20.9 Example: Female coffee drinkers
A study of 360 female college students in the United States (Kelpin et al. 2018) found that 61 drank coffee daily.
The unknown parameter is \(p\), the population proportion of female college students in the United States who drink coffee daily.
The sample size is \(n = 360\), and the sample proportion of daily coffee drinkers is \(\hat{p} = 61/360 = 0.16944\). Another sample of 360 students from the same population is likely to produce a different sample proportion \(\hat{p}\) of daily coffee drinkers: the sample proportion has sampling variation. The size of this sampling variation is quantified using a standard error; from (20.3):
\[ \text{s.e.}(\hat{p}) = \sqrt{ \frac{ 0.16944 \times (1 - 0.16944)}{360}} = 0.01977. \]
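As a rough check on what the standard error measures, the sketch below computes the value from the formula and compares it with the spread of sample proportions across many simulated samples of size 360. The simulation assumes, purely for illustration, that the population proportion equals the observed \(\hat{p}\); the seed, the number of simulated samples and the variable names are likewise arbitrary choices, not part of the study.

```python
import math
import random

n = 360            # sample size from the study
p_hat = 61 / n     # observed sample proportion

# Standard error of the sample proportion, from the formula
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Simulate many samples of size n.
# ASSUMPTION (illustration only): the population proportion equals p_hat.
rng = random.Random(1)      # fixed seed so the run is repeatable
num_samples = 5000
sim_p_hats = []
for _ in range(num_samples):
    count = sum(1 for _ in range(n) if rng.random() < p_hat)
    sim_p_hats.append(count / n)

# Standard deviation of the simulated sample proportions
mean_sim = sum(sim_p_hats) / num_samples
sd_sim = math.sqrt(
    sum((x - mean_sim) ** 2 for x in sim_p_hats) / (num_samples - 1)
)

print(f"Standard error from formula:     {se:.5f}")
print(f"Std. dev. of simulated p-hats:   {sd_sim:.5f}")
```

The two printed values should be close: the standard error describes how much \(\hat{p}\) is expected to vary from one sample of 360 students to the next.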
An approximate 95% CI is \(0.1694 \pm (2 \times 0.01977)\), or \(0.1694 \pm 0.03954\). That is, the margin of error is \(0.03954\).
Computing the ‘plus’ and the ‘minus’ bits, the approximate 95% CI is from \(0.1694 - 0.03954 = 0.12986\) to \(0.1694 + 0.03954 = 0.20894\). Rounding appropriately, the approximate 95% CI is from \(0.130\) to \(0.209\).
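The arithmetic above can be reproduced with a few lines of code. This is a minimal sketch, not a library routine: the function name `approx_95_ci` and its arguments are hypothetical, and the multiplier of 2 is the approximate value used in this example.

```python
import math

def approx_95_ci(successes, n, multiplier=2.0):
    """Approximate 95% CI for a proportion: p_hat +/- multiplier * s.e.

    Hypothetical helper for illustration; the multiplier of 2 follows
    the approximate 95% interval used in the text.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * se
    return p_hat, se, margin, (p_hat - margin, p_hat + margin)

# 61 daily coffee drinkers in a sample of 360 students
p_hat, se, margin, (lower, upper) = approx_95_ci(61, 360)

print(f"Sample proportion: {p_hat:.4f}")
print(f"Standard error:    {se:.5f}")
print(f"Margin of error:   {margin:.5f}")
print(f"Approx. 95% CI:    {lower:.3f} to {upper:.3f}")
```

Because the code carries full precision rather than rounded intermediate values, the later decimal places may differ slightly from the hand calculation, but the rounded interval agrees.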
The plausible values for \(p\) that may have led to this value of \(\hat{p} = 0.1694\) are between 0.130 and 0.209. (This CI may or may not contain the true proportion \(p\).)
This CI is statistically valid. We cannot comment on the internal validity: we would need details of how the study was conducted.
The CI is externally valid if the sample is a simple random sample of some population, and the study is internally valid. The CI is approximately externally valid if the sample is somewhat representative of some population, and the study is internally valid.