31.3 Expected values
Assuming that the odds of having most meals off-campus are the same for both groups (that is, the population OR is one), how would the sample OR be expected to vary from sample to sample just because of sampling variation?
If the population OR was one, the odds are the same in both groups; equivalently, the percentages are the same in both groups. That is, the percentage of students eating most meals off-campus is the same for students living with and not living with their parents.
Let’s consider the implication. From Table 31.1, 157 students out of 183 ate most meals off-campus; that is,
\[ \frac{157}{183} \times 100 = 85.79\% \] of the students in the entire sample ate most of their meals off-campus.
If the percentage of students who eat most of their meals off-campus is the same for those who live with their parents and those who don’t, then we’d expect the percentage in each group to equal this overall value of 85.79%. That is, we would expect
- 85.79% of the 54 students (that is, 46.33) who live with their parents to eat most meals off-campus; and
- 85.79% of the 129 students (that is, 110.67) who don’t live with their parents to eat most meals off-campus.
That is, the percentage (and hence the odds) is the same in each group. Those are the numbers that are expected to appear if the percentage was exactly the same in each group (Table 31.3); that is, if the null hypothesis (the assumption) was true.
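The arithmetic behind these expected counts can be sketched in a few lines of Python. This is only an illustration: the totals (157, 183, 54 and 129) come from the text, while the variable names are assumptions made for this example.

```python
# Totals taken from the text (Table 31.1); the names are illustrative only.
n_off_campus = 157   # students eating most meals off-campus
n_total = 183        # all students in the sample
group_sizes = {"lives with parents": 54, "doesn't live with parents": 129}

# Overall proportion of students eating most meals off-campus.
p_off = n_off_campus / n_total

# Expected counts if that same proportion applied in both groups.
expected = {
    group: {"off_campus": n * p_off, "on_campus": n * (1 - p_off)}
    for group, n in group_sizes.items()
}

print(f"Overall percentage off-campus: {100 * p_off:.2f}%")
for group, counts in expected.items():
    print(f"{group}: {counts['off_campus']:.2f} off-campus, "
          f"{counts['on_campus']:.2f} on-campus")
```

The expected counts need not be whole numbers: they describe what is expected on average, not what any single sample would contain.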
Think 31.1 (OR for expected counts) Consider the expected counts in Table 31.3.
Confirm that the odds of having most meals off-campus are the same for students living with their parents, and for students not living with their parents.
How do those expected values compare to what was observed? For example:
- 46.33 of the 54 students who live with their parents are expected to eat most meals off-campus; yet we observed 52.
- 110.67 of the 129 students who don’t live with their parents are expected to eat most meals off-campus; yet we observed 105.
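The comparison above can be sketched in Python as follows. The observed counts (52 and 105) and group sizes are those quoted in the text; the variable names are assumptions made for this illustration.

```python
# Group sizes and observed off-campus counts quoted in the text.
group_sizes = {"lives with parents": 54, "doesn't live with parents": 129}
observed_off = {"lives with parents": 52, "doesn't live with parents": 105}

# Overall proportion eating most meals off-campus (157 of 183 students).
p_off = 157 / 183

# Observed minus expected off-campus count for each group.
differences = {}
for group, n in group_sizes.items():
    expected_off = n * p_off
    differences[group] = observed_off[group] - expected_off
    print(f"{group}: observed {observed_off[group]}, "
          f"expected {expected_off:.2f}, "
          f"difference {differences[group]:+.2f}")
```

Because the expected counts are built from the same totals as the observed counts, the two differences are equal in size and opposite in sign.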
The observed and expected counts are similar, but not exactly the same. This is no surprise: each sample will produce slightly different observed counts (sampling variation). The difference between the observed and expected counts may be explained by sampling variation (that is, the null hypothesis explanation).
When discussing previous hypothesis tests, the sampling distribution of the sample statistic (in this case, the sampling distribution of the sample odds ratio) was described, and this sampling distribution had an approximate normal distribution (whose standard deviation is called the standard error). However, the sampling distribution of the odds ratio is more involved[14], so will not be presented.
Table 31.3: Expected counts
Meals eaten | Lives with parents | Doesn’t live with parents | Total |
---|---|---|---|
Most off-campus | 46.33 | 110.67 | 157 |
Most on-campus | 7.67 | 18.33 | 26 |
Total | 54.00 | 129.00 | 183 |
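As a rough check on the expected counts in the table, the odds in each group and the resulting odds ratio can be computed directly. This is a minimal sketch using unrounded expected counts built from the totals in the text; the variable names are assumptions made for this example.

```python
import math

# Unrounded expected counts (off-campus, on-campus), built from the totals:
# 157 off-campus and 26 on-campus among 183 students.
expected = {
    "lives with parents": (54 * 157 / 183, 54 * 26 / 183),
    "doesn't live with parents": (129 * 157 / 183, 129 * 26 / 183),
}

# Odds of eating most meals off-campus within each group.
odds = {group: off / on for group, (off, on) in expected.items()}

# Odds ratio comparing the two groups.
odds_ratio = odds["lives with parents"] / odds["doesn't live with parents"]

for group, value in odds.items():
    print(f"Odds ({group}): {value:.3f}")
print(f"Odds ratio: {odds_ratio:.3f}")
```

Using the rounded values printed in the table instead gives odds that differ slightly in the second decimal place, purely because of rounding.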
[14] For those who wish to know: The logarithms of the sample ORs have an approximate normal distribution, with a standard error.