Wednesday, 8 December 2010

Quantitative Methods, Introduction to Linear Regression

Level 2, Volume 1, Quantitative methods, Reading 11, Correlation & Regression


(This post was updated with Step 3 on Dec 11, 2010)

3.1 Linear regression with one independent variable
Note: the learning outcome does not require you to calculate a linear regression, only to distinguish between the dependent and independent variables in a linear regression.
It took me some time to understand the formulas below, but I suspect the rest of the chapter is a lot easier once you understand the basic calculation.

Why is linear regression useful?
Linear regression allows us to use one variable to make predictions about another, test hypotheses about the relation between two variables, and quantify the strength of the relationship between the two variables.

The formula
Yi = b0 + b1Xi + ei ,   i = 1, . . ., n
This equation states that the dependent variable, Y, is equal to:
  • The intercept, b0 (the point where the line crosses the y-axis, i.e. the value of Y when X is zero),
  • Plus a slope coefficient, b1, times the independent variable, X,
  • Plus an error term, e. The error term represents the portion of the dependent variable that cannot be explained by the independent variable.
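
To make the three parts of the formula concrete, here is a minimal sketch in Python. The coefficients and the single observation are made-up numbers chosen only for illustration (they are not from the reading), and `fitted_value` is just a helper name I picked.

```python
def fitted_value(b0: float, b1: float, x: float) -> float:
    """Explained part of Y: intercept plus slope times X."""
    return b0 + b1 * x


# Hypothetical coefficients and one hypothetical observation (illustration only).
b0, b1 = 2.0, 0.5
x_obs, y_obs = 10.0, 8.0

y_hat = fitted_value(b0, b1, x_obs)  # the part of Y explained by X
error = y_obs - y_hat                # the error term e: what X cannot explain

# When X is zero the fitted value is just the intercept,
# which is why b0 is where the line crosses the y-axis.
intercept_check = fitted_value(b0, b1, 0.0)

print(y_hat, error, intercept_check)
```

The observed value is always the fitted value plus the error term, so `y_hat + error` gives back `y_obs`.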

Types of values
You can use either time-series data or cross-sectional data in a linear regression.
Time series = many observations of the same variable from different time periods: denote t = 1, 2, ..., T
Cross-sectional = many observations on X and Y for the same time period: denote i = 1, 2, ..., n

Note: the intercept b0 and the slope coefficient b1 are together called the regression coefficients (Xi is the independent variable, not a coefficient).

How do we calculate the regression line?
Step 1: Calculate the slope coefficient b1
The formula for the slope coefficient is b1 = Covariance (X,Y) / Variance (X)
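
Written out in full, the slope formula looks like this. I am assuming the sample versions of covariance and variance (dividing by N - 1), which is what the calculation in this post uses; the N - 1 terms cancel, so only the two sums matter.

```latex
b_1 = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)}
    = \frac{\dfrac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}
           {\dfrac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar{X})^{2}}
    = \frac{\sum_{i=1}^{N}(X_i-\bar{X})(Y_i-\bar{Y})}
           {\sum_{i=1}^{N}(X_i-\bar{X})^{2}}
```

Here X-bar and Y-bar are the sample means of X and Y.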

Using data from previous examples, the slope is calculated as follows:
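1. Data

| Year | X: Revenue (US $m) | Y: Earnings per ordinary share, diluted (US cents) |
|------|--------------------|-----------------------------------------------------|
| 2010 | 52798              | 227.8                                               |
| 2009 | 50211              | 105.4                                               |
| 2008 | 59473              | 274.8                                               |
| 2007 | 47473              | 228.9                                               |
| 2006 | 39099              | 172.4                                               |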

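2. Calculation
The cross product for each year is (Xi - mean of X) times (Yi - mean of Y); the squared deviation is (Xi - mean of X) squared.

| Year    | Revenue (X) | EPS (Y) | Cross product | Squared deviation of Revenue (X) |
|---------|-------------|---------|---------------|----------------------------------|
| 2010    | 52798       | 227.80  | 77,487.97     | 8,923,363.84                     |
| 2009    | 50211       | 105.40  | -38,603.29    | 160,160.04                       |
| 2008    | 59473       | 274.80  | 704,760.87    | 93,358,108.84                    |
| 2007    | 47473       | 228.90  | -63,214.11    | 5,465,308.84                     |
| 2006    | 39099       | 172.40  | 315,569.63    | 114,742,659.24                   |
| Average | 49810.8     | 201.86  |               |                                  |
| Sum     |             |         | 996,001.06    | 222,649,600.80                   |

Covariance = Sum of cross products / (N-1) = 996,001.06 / 4 = 249,000.27
Variance of X = Sum of squared deviations / (N-1) = 222,649,600.80 / 4 = 55,662,400.20
Answer: Slope coefficient b1 = Covariance / Variance of X = 249,000.27 / 55,662,400.20 = 0.004473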
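
If you would rather check the slope with a few lines of code than with a spreadsheet, here is a minimal Python sketch using the revenue and EPS figures from this post. The variable names are my own, and it uses the sample (N - 1) covariance and variance.

```python
from statistics import mean

# Data from this post, ordered 2010, 2009, 2008, 2007, 2006.
revenue = [52798, 50211, 59473, 47473, 39099]  # X, US $m
eps = [227.8, 105.4, 274.8, 228.9, 172.4]      # Y, US cents

n = len(revenue)
x_bar = mean(revenue)
y_bar = mean(eps)

# Per-year cross products and squared deviations of X.
cross_products = [(x - x_bar) * (y - y_bar) for x, y in zip(revenue, eps)]
squared_devs = [(x - x_bar) ** 2 for x in revenue]

covariance = sum(cross_products) / (n - 1)  # sample covariance of X and Y
variance_x = sum(squared_devs) / (n - 1)    # sample variance of X
b1 = covariance / variance_x                # slope coefficient

print(f"covariance = {covariance:,.2f}")
print(f"variance   = {variance_x:,.2f}")
print(f"slope b1   = {b1:.6f}")
```

The (N - 1) divisor appears in both the covariance and the variance, so it cancels in the slope; dividing by N instead would give the same b1.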


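Step 2: Calculate the intercept b0
We calculate b0 based on the fact that in linear regression, the regression line passes through the point corresponding to the means of the dependent and the independent variables. At that point the error term drops out, so:
Mean of Y = b0 + b1 x Mean of X, which rearranges to b0 = Mean of Y - b1 x Mean of X
Using data from above, we calculate b0 as follows:

|      | Revenue (X) | EPS (Y) |
|------|-------------|---------|
| Mean | 49810.8     | 201.86  |

where
Mean of Y = 201.86
b1 = 0.004473
Mean of X = 49810.8
Answer: b0 = -20.9637
(This answer uses the unrounded slope from Step 1. If you plug in the rounded 0.004473 you get about -20.94 instead.)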
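
The intercept calculation can be sketched in Python as well. This is only an illustration with my own variable names; it recomputes the slope from the revenue and EPS data in this post so that the block runs on its own, and it also shows how much the answer moves if the slope is rounded first.

```python
from statistics import mean

# Data from this post, ordered 2010, 2009, 2008, 2007, 2006.
revenue = [52798, 50211, 59473, 47473, 39099]  # X, US $m
eps = [227.8, 105.4, 274.8, 228.9, 172.4]      # Y, US cents

x_bar = mean(revenue)
y_bar = mean(eps)

# Slope: sum of cross products over sum of squared deviations of X.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(revenue, eps)) / sum(
    (x - x_bar) ** 2 for x in revenue
)

# The regression line passes through (x_bar, y_bar), so solve for b0.
b0 = y_bar - b1 * x_bar

# Same calculation with the slope rounded to six decimals first.
b0_rounded_slope = y_bar - round(b1, 6) * x_bar

print(f"b0 (unrounded slope) = {b0:.4f}")
print(f"b0 (rounded slope)   = {b0_rounded_slope:.4f}")
```

Because the mean revenue is a large number, a small rounding of the slope is enough to shift the intercept in the second decimal place, so it is safer to carry the unrounded slope through.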


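Step 3: Calculate Yi based on the regression formula
The regression formula is: Yi = b0 + b1Xi + ei
The error term is not known in advance, so the calculated (fitted) value is simply b0 + b1Xi. Using this we can calculate the dependent variable, being Earnings per share, for each year:

| Year | Actual Earnings (for info) | Calculated Earnings per regression model (Yi) | b0 (calculated in Step 2) | b1 (calculated in Step 1) | Xi (Revenue, independent variable) |
|------|----------------------------|-----------------------------------------------|---------------------------|---------------------------|------------------------------------|
| 2010 | 227.8                      | 215.22                                        | -20.9637                  | 0.004473                  | 52798                              |
| 2009 | 105.4                      | 203.65                                        | -20.9637                  | 0.004473                  | 50211                              |
| 2008 | 274.8                      | 245.08                                        | -20.9637                  | 0.004473                  | 59473                              |
| 2007 | 228.9                      | 191.40                                        | -20.9637                  | 0.004473                  | 47473                              |
| 2006 | 172.4                      | 153.94                                        | -20.9637                  | 0.004473                  | 39099                              |

Calculated Earnings = b0 plus b1 multiplied by Xi. The difference between actual and calculated earnings in each year is the error term ei.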
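
To tie the three steps together, here is a short Python sketch that fits the line on the revenue and EPS data from this post and then computes the calculated earnings and the error term for each year. The names are my own, and this is a hand-rolled illustration rather than a statistics package.

```python
from statistics import mean

years = [2010, 2009, 2008, 2007, 2006]
revenue = [52798, 50211, 59473, 47473, 39099]  # X, US $m
eps = [227.8, 105.4, 274.8, 228.9, 172.4]      # Y, US cents

x_bar = mean(revenue)
y_bar = mean(eps)

# Step 1: slope. Step 2: intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(revenue, eps)) / sum(
    (x - x_bar) ** 2 for x in revenue
)
b0 = y_bar - b1 * x_bar

# Step 3: fitted values, and the error term as actual minus fitted.
fitted = [b0 + b1 * x for x in revenue]
residuals = [y - f for y, f in zip(eps, fitted)]

for year, actual, f, e in zip(years, eps, fitted, residuals):
    print(f"{year}: actual {actual:7.2f}  fitted {f:7.2f}  error {e:8.2f}")
```

The error terms add up to zero (apart from floating-point noise), which is a handy check that the line really passes through the means.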







