20.7 Summary: Finding a CI for \(p\)
The procedure for computing a confidence interval (CI) for a proportion is:
- Compute the sample proportion, \(\hat{p}\), and identify the sample size \(n\).
- Compute the standard error, which quantifies how much the value of \(\hat{p}\) varies from one sample to the next:
\[ \text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1-\hat{p})}{n}}. \]
- Find the multiplier: this is \(2\) for an approximate 95% CI, using the 68–95–99.7 rule. The product (multiplier \(\times\) standard error) is called the margin of error.
- Compute the CI:
\[ \hat{p} \pm \left( \text{Multiplier}\times\text{standard error} \right). \]
- Check that the statistical validity conditions are satisfied.
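The five steps above can be collected into a short Python sketch. The function name `approx_ci_proportion` is hypothetical (it is not part of any library). The sketch assumes the multiplier of 2 from the 68–95–99.7 rule, and assumes the validity check is that both the number in the group of interest and the number not in it exceed 5, as in the NHANES example.

```python
import math


def approx_ci_proportion(successes: int, n: int) -> tuple[float, float, bool]:
    """Approximate 95% CI for a population proportion p.

    Hypothetical helper sketching the steps above; returns
    (lower limit, upper limit, validity flag).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")

    # Step 1: sample proportion p-hat (the sample size n is given)
    p_hat = successes / n

    # Step 2: standard error of p-hat
    se = math.sqrt(p_hat * (1 - p_hat) / n)

    # Step 3: multiplier of 2 for an approximate 95% CI (68-95-99.7 rule);
    # multiplier * standard error is the margin of error
    margin = 2 * se

    # Step 4: p-hat plus or minus the margin of error
    lower, upper = p_hat - margin, p_hat + margin

    # Step 5: validity check (assumed here: more than 5 in each group)
    valid = successes > 5 and (n - successes) > 5

    return lower, upper, valid


lower, upper, valid = approx_ci_proportion(1466, 3211)
print(f"{lower:.3f} to {upper:.3f}, valid: {valid}")  # 0.439 to 0.474, valid: True
```

The function returns the validity flag rather than refusing to compute the interval, because the check is a judgement the analyst still needs to make and report.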
Example 20.7 (NHANES data) For the NHANES data, first seen in Sect. 12.10, the unknown parameter is \(p\), the population proportion of Americans who currently smoke.
In the study, 1466 out of the 3211 respondents who reported their smoking status said they currently smoked: \(\hat{p}= 1466\div 3211 = 0.4566\).
What proportion \(p\) of the population currently smokes? We don’t know, and each sample is likely to give a different estimate of \(p\). The standard error is \(\text{s.e.}(\hat{p}) = 0.00879\), so the margin of error is \(2\times 0.00879 = 0.01758\) and the approximate 95% CI for \(p\) is \(0.4566\pm 0.01758\), or from 0.439 to 0.474. (Check the calculations!)
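One way to check the calculations is to repeat the arithmetic in a few lines of Python. This sketch uses only the counts given in the example (1466 smokers out of 3211 respondents) and the multiplier of 2; it keeps full precision and rounds only when printing, which here gives the same values as the hand calculation.

```python
import math

smokers, n = 1466, 3211                    # counts from the NHANES example

p_hat = smokers / n                        # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p-hat
margin = 2 * se                            # margin of error (multiplier 2)
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.4f}")              # p-hat = 0.4566
print(f"s.e.  = {se:.5f}")                 # s.e.  = 0.00879
print(f"margin of error = {margin:.5f}")   # margin of error = 0.01758
print(f"CI: {lower:.3f} to {upper:.3f}")   # CI: 0.439 to 0.474
```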
For the conclusions to be statistically valid, the number of smokers must exceed 5, and the number of non-smokers must exceed 5. Both conditions hold (1466 smokers and \(3211 - 1466 = 1745\) non-smokers), so the CI appears to be statistically valid.
We write:
Based on the sample, we are approximately 95% confident that the interval from 0.439 to 0.474 straddles the population proportion of smokers in the USA.