20.2 Sampling intervals: Known proportion

The possible values of the sample proportion $\hat{p}$ can be described by an approximate normal distribution, as just discussed. This enables the 68–95–99.7 rule to be applied. For example, with sets of 25 rolls, the sample proportion of even rolls will be within one standard deviation of 0.5 (that is, 0.5 give-or-take 0.1) about 68% of the time. So, about 68% of the time, the proportion of even rolls in a set of 25 rolls will be between:

  • 0.5 − 0.1 = 0.4 and
  • 0.5 + 0.1 = 0.6.

Similarly, about 95% of the time, the proportion of even rolls will be between 0.5 give-or-take two standard deviations, or between:

  • 0.5 − (2 × 0.1) = 0.3 and
  • 0.5 + (2 × 0.1) = 0.7.
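The arithmetic in the two lists above can be checked with a short Python sketch. It assumes the standard error comes from the usual formula $\sqrt{p(1-p)/n}$ (taken here to be what Eq. (20.1) gives), with $p = 0.5$ for an even roll and $n = 25$ rolls per set; the helper name `give_or_take` is purely illustrative.

```python
import math

p = 0.5   # probability of an even roll on a fair die
n = 25    # number of rolls in each set

# Standard error of the sample proportion (assumed form of Eq. (20.1))
se = math.sqrt(p * (1 - p) / n)

def give_or_take(centre, k, se):
    """Return (lower, upper) for centre give-or-take k standard errors."""
    return centre - k * se, centre + k * se

lo68, hi68 = give_or_take(p, 1, se)   # about 68% of sets fall in here
lo95, hi95 = give_or_take(p, 2, se)   # about 95% of sets fall in here

print(f"s.e. = {se:.2f}")
print(f"~68%: {lo68:.2f} to {hi68:.2f}")
print(f"~95%: {lo95:.2f} to {hi95:.2f}")
```

Running it prints a standard error of 0.10 and the intervals 0.40 to 0.60 and 0.30 to 0.70, matching the hand calculations.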

This interval tells us what values of $\hat{p}$ are likely to be observed in samples of size 25: most of the time (approximately 95% of the time), the value of $\hat{p}$ is expected to be between 0.30 and 0.70. (For instance, in the animation above, all ten sets of 25 rolls (100%) had a sample proportion between 0.30 and 0.70.)
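Ten sets of rolls is a small check, so a simulation with many more sets gives a better sense of how often $\hat{p}$ lands between 0.30 and 0.70. This is a minimal sketch using Python's `random` module; the seed, the number of simulated sets, and the helper name `even_proportion` are arbitrary choices for illustration.

```python
import random

def even_proportion(rng, n_rolls=25):
    """Roll a fair die n_rolls times; return the proportion of even results."""
    evens = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) % 2 == 0)
    return evens / n_rolls

rng = random.Random(2024)   # fixed seed so the run is repeatable
n_sets = 20_000
props = [even_proportion(rng) for _ in range(n_sets)]

# Fraction of simulated sets whose sample proportion lies in the ~95% interval
inside = sum(0.30 <= ph <= 0.70 for ph in props) / n_sets
print(f"Fraction of sets with p-hat in [0.30, 0.70]: {inside:.3f}")
```

The printed fraction should be close to 0.95. Because the 68–95–99.7 rule is an approximation (and $\hat{p}$ can only move in steps of 1/25 = 0.04 here), an exact match is not expected.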

More formally, the sample proportion $\hat{p}$ is likely to lie within the interval

$$
p \pm \left(\text{multiplier} \times \text{s.e.}(\hat{p})\right),
$$

where $p$ is the known population proportion and $\text{s.e.}(\hat{p})$ is the standard error of the sample proportion (calculated using Eq. (20.1)). The symbol ‘±’ means ‘plus or minus,’ or ‘give-or-take.’
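The general recipe can be written as a small function. This is a sketch, assuming $\text{s.e.}(\hat{p}) = \sqrt{p(1-p)/n}$ as in Eq. (20.1); the name `sampling_interval` is hypothetical and not part of any library.

```python
import math

def sampling_interval(p, n, multiplier):
    """Interval p +/- multiplier * s.e.(p-hat) for samples of size n.

    Assumes a known population proportion p and the standard error
    sqrt(p * (1 - p) / n) for the sample proportion.
    """
    if not 0 <= p <= 1 or n <= 0:
        raise ValueError("need 0 <= p <= 1 and n > 0")
    se = math.sqrt(p * (1 - p) / n)
    margin = multiplier * se
    return p - margin, p + margin

# Example: even rolls in sets of 100 rolls, approximate 95% (multiplier 2)
low, high = sampling_interval(0.5, 100, 2)
print(f"{low:.2f} to {high:.2f}")
```

With sets of 100 rolls the standard error halves to 0.05, so the approximate 95% interval is 0.40 to 0.60, narrower than the 0.30 to 0.70 interval for sets of 25.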

The multiplier depends on how confident we wish to be that the interval contains the value of $\hat{p}$.

For a 95% interval (the most common level of confidence), the multiplier is approximately 2, based on the 68–95–99.7 rule: approximately 95% of observations are within two standard deviations of the value of $p$ (the mean of the normal distribution in Fig. 20.1).

That is, the approximate 95% interval is:

$$
p \pm \left(2 \times \text{s.e.}(\hat{p})\right).
$$

For a 90% interval, the 68–95–99.7 rule isn’t helpful (it only covers 68%, 95% and 99.7%), so either tables or a computer would be used to find the correct multiplier.
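In Python, the standard library's `statistics.NormalDist` can stand in for a table. For a central 90% interval, 5% of the area is left in each tail, so the multiplier is the 95th percentile of the standard normal distribution. A minimal sketch, reusing the dice example ($p = 0.5$, $n = 25$, so the standard error is 0.1); the helper name `multiplier` is illustrative.

```python
import math
from statistics import NormalDist

def multiplier(confidence):
    """Standard-normal multiplier for a central interval with this coverage."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

z90 = multiplier(0.90)
se = math.sqrt(0.5 * (1 - 0.5) / 25)     # 0.1 for sets of 25 rolls
print(f"90% multiplier: {z90:.3f}")
print(f"90% interval: {0.5 - z90 * se:.3f} to {0.5 + z90 * se:.3f}")
```

This gives a multiplier of about 1.645, so the 90% interval for sets of 25 rolls is roughly 0.336 to 0.664.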

In practice, 95% intervals are the most common, and we’ll use a multiplier of 2 to find an approximate 95% interval when computing intervals by hand. Software can be used for any other percentage interval (or for an exact 95% interval).
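To see how close the rule-of-thumb multiplier of 2 is to the exact one, this sketch uses `statistics.NormalDist` again, assuming the dice example ($p = 0.5$, $n = 25$) and $\text{s.e.}(\hat{p}) = \sqrt{p(1-p)/n}$.

```python
import math
from statistics import NormalDist

p, n = 0.5, 25
se = math.sqrt(p * (1 - p) / n)            # standard error, 0.1 here

z_exact = NormalDist().inv_cdf(0.975)      # leaves 2.5% in each tail
for label, z in [("approx. (2)", 2.0), (f"exact ({z_exact:.2f})", z_exact)]:
    print(f"{label:>12}: {p - z * se:.3f} to {p + z * se:.3f}")

# Coverage actually achieved by the rule-of-thumb multiplier of 2
coverage_of_2 = NormalDist().cdf(2) - NormalDist().cdf(-2)
print(f"coverage of +/- 2 s.e.: {coverage_of_2:.4f}")
```

The exact multiplier is about 1.96, giving 0.304 to 0.696 instead of 0.300 to 0.700, and a multiplier of 2 actually covers about 95.45% of a normal distribution. For hand calculations the difference is usually negligible.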

In general, higher confidence means wider intervals (Fig. 20.2), since wider intervals are needed to be more certain that the interval contains $\hat{p}$.


FIGURE 20.2: To have greater confidence that the interval will include the sample proportion, the interval needs to be wider
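The pattern in Fig. 20.2 can also be reproduced numerically. This sketch keeps the same assumptions ($p = 0.5$, $n = 25$, standard error $\sqrt{p(1-p)/n}$) and computes the multiplier and interval for several confidence levels; the levels chosen are arbitrary.

```python
import math
from statistics import NormalDist

p, n = 0.5, 25
se = math.sqrt(p * (1 - p) / n)

widths = {}
for conf in (0.80, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # central-interval multiplier
    widths[conf] = 2 * z * se                   # total width of p +/- z * se
    print(f"{conf:.0%}: multiplier {z:.3f}, "
          f"interval {p - z * se:.3f} to {p + z * se:.3f}")
```

Each step up in confidence increases the multiplier, and so the width of the interval, while the centre stays at $p$.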