Level 2, Volume 1, Quantitative methods, Reading 11, Correlation & Regression
(This post was updated with Step 3 on Dec 11, 2010)
3.1 Linear regression with one independent variable
It took me some time to understand the formulas below, but I suspect the rest of the chapter becomes a lot easier once you understand the basic calculation.
Why is linear regression useful?
Linear regression allows us to use one variable to make predictions about another, test hypotheses about the relation between two variables, and quantify the strength of the relationship between the two variables.
The formula
Yi = b0 + b1Xi + ei, i = 1, ..., n
This equation states that the dependent variable, y, is equal to:
- The intercept, b0 (the point where the line crosses the y-axis, i.e. the value of y when x = 0)
- Plus a slope coefficient, b1, times the independent variable, x,
- Plus an error term, e. The error term represents the portion of the dependent variable that cannot be explained by the independent variable.
Types of values
You can use either time-series data or cross-sectional data in a linear regression.
Time series = many observations from different time periods for the same variable: denote t = 1, 2, ..., T
Cross-sectional = many observations on x and y for the same time period: denote i = 1, 2, ..., N
Nb: the intercept b0 and the slope coefficient b1 are together called the regression coefficients (nb: this excludes xi)
How do we calculate the linear regression line?
Step 1: Calculate slope coefficient b1
The slope is the sample covariance of X and Y divided by the sample variance of X. Using data from previous examples, the slope is calculated as follows:
1. Data
Year | 2010 | 2009 | 2008 | 2007 | 2006 |
X: Revenue (US $m) | 52798 | 50211 | 59473 | 47473 | 39099 |
Y: Earnings per ordinary share, diluted (US cents) | 227.8 | 105.4 | 274.8 | 228.9 | 172.4 |
2. Calculation
Year | Revenue (X) | EPS (Y) | Cross product of deviations: (X - mean X) x (Y - mean Y) | Squared deviation of revenue: (X - mean X)^2 |
2010 | 52798 | 227.80 | 77,487.97 | 8,923,363.84 |
2009 | 50211 | 105.40 | -38,603.29 | 160,160.04 |
2008 | 59473 | 274.80 | 704,760.87 | 93,358,108.84 |
2007 | 47473 | 228.90 | -63,214.11 | 5,465,308.84 |
2006 | 39099 | 172.40 | 315,569.63 | 114,742,659.24 |
Mean | 49810.8 | 201.86 | | |
Sum | | | 996,001.06 | 222,649,600.80 |
Covariance of X and Y | Sum of cross products | 996,001.06 |
 | (N-1) | 4 |
 | Answer | 249,000.27 |
Variance of X | Sum of squared deviations | 222,649,600.80 |
 | (N-1) | 4 |
 | Answer | 55,662,400.20 |
1. Covariance of X and Y | 249,000.27 |
2. Variance of X | 55,662,400.20 |
Answer: slope coefficient b1 = (1) / (2) | 0.004473 |
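The arithmetic in Step 1 can be sketched in a few lines of Python. This is just one way to lay it out, using only the standard language and the revenue and EPS figures from the data table; the variable names are illustrative.

```python
# Revenue (X, US$m) and diluted EPS (Y, US cents), 2010 back to 2006
revenue = [52798, 50211, 59473, 47473, 39099]
eps = [227.8, 105.4, 274.8, 228.9, 172.4]

n = len(revenue)
mean_x = sum(revenue) / n
mean_y = sum(eps) / n

# Sum of cross products of deviations, and sum of squared deviations of X
cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(revenue, eps))
squared_dev_x = sum((x - mean_x) ** 2 for x in revenue)

# Sample covariance and sample variance both divide by (n - 1)
covariance_xy = cross_products / (n - 1)
variance_x = squared_dev_x / (n - 1)

# Slope coefficient: covariance of X and Y over variance of X
b1 = covariance_xy / variance_x

print(f"covariance = {covariance_xy:,.2f}")
print(f"variance   = {variance_x:,.2f}")
print(f"b1         = {b1:.6f}")
```

Since both the covariance and the variance divide by (n - 1), that divisor cancels in the ratio, so b1 could equally be computed as the sum of cross products over the sum of squared deviations.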
Step 2: Calculate intercept b0
We calculate b0 based on the fact that in linear regression, the regression line fits through the point corresponding to the means of the dependent and the independent variables.
Using data from above, we calculate b0 as follows:
 | Revenue (X) | EPS (Y) |
Mean | 49810.8 | 201.86 |
Formula | mean Y = b0 + b1 x mean X, so b0 = mean Y - b1 x mean X |
where | mean Y = | 201.86 |
 | b1 = | 0.004473 (rounded) |
 | mean X = | 49810.8 |
Answer b0 (using the unrounded b1) | -20.96370784 |
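As a quick check on Step 2, the intercept can be recomputed from the Step 1 results. The sketch below assumes the covariance, variance and means quoted in this post and keeps b1 unrounded; the variable names are illustrative.

```python
# Figures from Step 1 of this post
covariance_xy = 249000.27     # sample covariance of revenue and EPS
variance_x = 55662400.20      # sample variance of revenue
mean_x = 49810.8              # mean revenue, US$m
mean_y = 201.86               # mean diluted EPS, US cents

b1 = covariance_xy / variance_x   # slope, kept unrounded
b0 = mean_y - b1 * mean_x         # the line passes through (mean_x, mean_y)

print(f"b1 = {b1:.6f}, b0 = {b0:.4f}")
```

Rounding b1 to 0.004473 before this step shifts b0 to roughly -20.94, because the small rounding difference gets multiplied by a mean revenue of almost 50,000.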
Step 3: Calculate Yi based on the regression formula
The regression formula is: Yi = b0 + b1Xi + ei. The predicted value leaves out the error term: predicted Yi = b0 + b1Xi.
Using this formula we can calculate the predicted value of the dependent variable, Earnings per share, for each year.
Year | Actual earnings (for info) | Calculated earnings per regression model (Yi = b0 + b1 x Xi) | b0 (calculated in Step 2) | b1 (calculated in Step 1) | Xi (Revenue, independent variable) |
2010 | 227.8 | 215.22 | -20.9637 | 0.004473 | 52798 |
2009 | 105.4 | 203.65 | -20.9637 | 0.004473 | 50211 |
2008 | 274.8 | 245.08 | -20.9637 | 0.004473 | 59473 |
2007 | 228.9 | 191.40 | -20.9637 | 0.004473 | 47473 |
2006 | 172.4 | 153.94 | -20.9637 | 0.004473 | 39099 |
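All three steps can be strung together in one short script. This is only a sketch in plain Python, recomputing b1 and b0 at full precision from the revenue and EPS data in Step 1 and then listing the calculated earnings next to the actual ones; the residual column (actual minus calculated) is an addition for illustration and is not in the table above.

```python
years = [2010, 2009, 2008, 2007, 2006]
revenue = [52798, 50211, 59473, 47473, 39099]   # X, US$m
eps = [227.8, 105.4, 274.8, 228.9, 172.4]       # Y, diluted EPS, US cents

n = len(revenue)
mean_x = sum(revenue) / n
mean_y = sum(eps) / n

# Steps 1 and 2: slope and intercept (the n - 1 divisors cancel in the ratio)
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(revenue, eps))
      / sum((x - mean_x) ** 2 for x in revenue))
b0 = mean_y - b1 * mean_x

# Step 3: calculated (fitted) earnings, and residuals = actual - fitted
fitted = [b0 + b1 * x for x in revenue]
residuals = [y - f for y, f in zip(eps, fitted)]

for year, y, f, e in zip(years, eps, fitted, residuals):
    print(f"{year}: actual {y:.2f}, fitted {f:.2f}, residual {e:.2f}")
```

Because the line is fitted with an intercept, the residuals should add up to zero apart from floating-point noise, which makes a handy sanity check on the calculation.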