35.2 Linear equations: A review
An example of a regression equation is $\hat{y} = 2 + 3x$.
Here, $x$ refers to the explanatory variable, $y$ refers to the observed response variable, and $\hat{y}$ refers to the predicted values of the response variable.
In general, the equation of a straight line is written as
$$\hat{y} = b_0 + b_1 x,$$
where $b_0$ and $b_1$ are just numbers. Again, $\hat{y}$ refers to the predicted (not observed) values of $y$.
The numbers $b_0$ and $b_1$ are called regression coefficients, where
- $b_0$ is a number called the intercept. It is the predicted value of $y$ when $x = 0$.
- $b_1$ is a number called the slope. It is, on average, how much the value of $y$ changes when the value of $x$ increases by 1.
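A minimal Python sketch of the straight-line equation shows both properties listed above: the prediction at $x = 0$ equals the intercept, and increasing $x$ by 1 changes the prediction by the slope. The coefficients $b_0 = 2$ and $b_1 = 3$ are hypothetical values chosen only for illustration, and the function name `predict` is our own.

```python
def predict(b0: float, b1: float, x: float) -> float:
    """Predicted value y-hat on the straight line y-hat = b0 + b1 * x."""
    return b0 + b1 * x


# Hypothetical coefficients, chosen only for illustration.
b0, b1 = 2.0, 3.0

# The intercept is the prediction when x = 0.
print(predict(b0, b1, 0))  # 2.0

# The slope is the change in the prediction when x increases by 1.
print(predict(b0, b1, 4) - predict(b0, b1, 3))  # 3.0
```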
We will use software to find the values of $b_0$ and $b_1$. However, we can roughly guess the value of the intercept by first drawing what looks like a sensible straight line through the data, and determining what that line predicts for the value of $y$ when $x = 0$.
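A rough guess of the slope can be made using the formula
$$b_1 \approx \frac{\text{change in } y}{\text{change in } x} = \frac{\text{rise}}{\text{run}}.$$
That is, a guess of the slope is the change in the value of $y$ (the ‘rise’) divided by the corresponding change in the value of $x$ (the ‘run’).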
To demonstrate, consider the scatterplot in Fig. 35.1. I have drawn a sensible line on the graph to capture the relationship (your line may look a bit different). When $x = 0$, the regression line predicts the value of $y$ to be about 2, so $b_0$ is approximately 2.
To guess the slope, use the ‘rise over run’ idea. When the value of $x$ increases from 1 to 5 (a change of $5 - 1 = 4$), the corresponding values of $y$ change from 5 to 17 (a change of $17 - 5 = 12$). Then, use the formula:
$$b_1 \approx \frac{\text{rise}}{\text{run}} = \frac{12}{4} = 3.$$
The value of $b_1$ is about 3. The regression line is approximately $\hat{y} = 2 + (3 \times x)$, usually written as $\hat{y} = 2 + 3x$.
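The rise-over-run arithmetic for Fig. 35.1 can be checked with a short Python sketch. The two points $(1, 5)$ and $(5, 17)$ and the intercept of about 2 are the rough readings quoted above; the helper name `rise_over_run` is our own, not part of any library.

```python
def rise_over_run(x1: float, y1: float, x2: float, y2: float) -> float:
    """Rough slope guess from two points read off a hand-drawn line."""
    if x2 == x1:
        raise ValueError("the two points need different x values")
    rise = y2 - y1  # change in y
    run = x2 - x1   # change in x
    return rise / run


b0 = 2  # rough intercept: the line's prediction at x = 0
b1 = rise_over_run(1, 5, 5, 17)  # rise 12, run 4

print(f"y-hat = {b0} + {b1:g}x")  # y-hat = 2 + 3x

# The guessed line passes through both points used for the rise over run.
for x, y in [(1, 5), (5, 17)]:
    assert b0 + b1 * x == y
```

Because this is only a guess from two points read off a hand-drawn line, a different choice of points (or a slightly different line) would give a slightly different slope.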
The intercept has the same measurement units as the response variable. For example, with the red-deer data the intercept is measured in ‘grams,’ the measurement units of the molar weight.
The measurement unit for the slope is the ‘measurement units of the response variable,’ per ‘measurement units of the explanatory variable.’ For example, with the red-deer data the slope has the units of ‘grams per year.’
FIGURE 35.1: An example scatterplot
Example 35.2 (Estimating regression parameters) A study (Dunn and Smyth 2018) examined the number of cyclones ($y$) in the Australian region each year from 1969 to 2005, and the relationship with a climatological index called the Oceanic Niño Index (ONI, $x$); see Fig. 35.2.
When the value of $x$ is zero, the predicted value of $y$ is about 12, so $b_0$ is about 12. (You may get something slightly different.) Notice that the intercept is the predicted value of $y$ when $x = 0$, which is not at the left edge of this graph, because the ONI also takes negative values.
To guess the value of $b_1$, use the ‘rise over run’ idea. Read the predicted values of $y$ off the line at two values of $x$ well apart: toward the left of the graph the predicted value of $y$ is about 17, and toward the right it is about 8. So as $x$ increases between these two points, the value of $y$ changes by $8 - 17 = -9$ (a decrease of about 9). Hence $b_1$ is approximately $-9$ divided by the corresponding change in $x$ (the run). (You may get something slightly different.) Notice that the relationship has a negative direction, so the slope must be negative.
Using these guesses of $b_0$ and $b_1$, the regression line is approximately $\hat{y} = 12 + b_1 x$, where $b_1$ is the negative rise-over-run value just found.
FIGURE 35.2: The number of cyclones in the Australian region each year from 1969 to 2005, and the ONI for October, November, December
In this section, we have seen how to understand a linear regression equation, and how an equation can be used to describe a fitted line. The above method gives a very crude guess of the values of the intercept $b_0$ and the slope $b_1$. In practice, many reasonable lines could be drawn through a scatterplot of data. However, one of those lines is the ‘best fitting line’ in some sense.¹⁸ Software calculates this ‘line of best fit’ for us.
¹⁸ For those who want to know: The ‘line of best fit’ is the line such that the sum of the squared vertical distances between the observations and the line is as small as possible.
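The ‘line of best fit’ described in the footnote is the ordinary least-squares line. A from-scratch Python sketch of it follows, using the standard closed-form formulas for the slope and intercept. The data values and function names are hypothetical, for illustration only, and statistical software may compute the same line by a different numerical route.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line minimizing the sum of squared vertical distances."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-products of deviations over sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(b0, b1, xs, ys):
    """Sum of squared vertical distances between the observations and a line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [5, 7, 12, 14, 16]

b0, b1 = least_squares_line(xs, ys)
print(round(b0, 3), round(b1, 3))  # 2.1 2.9

best = sse(b0, b1, xs, ys)
# Other lines through these data fit worse by this criterion
for other_b0, other_b1 in [(b0 + 0.1, b1), (b0, b1 - 0.1), (2, 3)]:
    assert sse(other_b0, other_b1, xs, ys) > best
```

Nudging either coefficient away from the least-squares values, or using a sensible hand-drawn line such as $\hat{y} = 2 + 3x$, gives a larger sum of squared vertical distances; that is the sense in which this line is ‘best’.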