Level 2, Volume 1, Quantitative Methods, Reading 12, Multiple Regression
The references refer to the CFA textbook.
LOS 12 a) to c) are closely integrated, but fairly straightforward if one understands the principles of Reading 11. These were discussed in an example in a previous blog post.
The regression statistics, noted below, are used to illustrate the learning outcomes.
(Included a link to a blog explaining how to set up Excel to create your own regression tables.)
a.1) formulate a multiple regression equation to describe the relation between a dependent variable and several independent variables
The multiple regression equation is similar to the equation that uses only one independent variable.
The only difference is that instead of one independent variable there is more than one,
so the equation contains the terms b1X1i + b2X2i etc.
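Written out in general form for k independent variables and n observations, the equation looks like this. The symbols b_0 (intercept) and \varepsilon_i (error term) follow the usual textbook notation and are an assumption here, since the post only spells out the slope terms:

```latex
Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i, \qquad i = 1, \dots, n
```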
In the BHP example above the dependent variable is the BHP share price, and the two independent variables selected are Revenue US$ per share and Net Operating cashflow per share. (There is a strong argument to be made that these two measures are linearly related to each other, contravening one of the assumptions of the model. This is ignored for now.)
The Coefficients from the Summary output indicate the following:
Intercept = 27.321
Revenue US $m per share = -4.269
Net operating cashflow per share = 15.828
The multiple regression equation is therefore:
Y = 27.321 - 4.269(X1) + 15.828(X2), where X1 is revenue per share and X2 is net operating cashflow per share.
a.2) determine the statistical significance of each independent variable
As you might recall, a result is called statistically significant if it is unlikely to have occurred by chance.
It is suggested that one first consider the overall significance of the regression, before considering the coefficients.
This is done by reviewing the F score and Significance F.
The Significance F entry of 0.51 is the p-value of the F-test: the regression as a whole would only be significant at the 0.51 level, far above the usual 0.05.
This implies that we cannot reject the hypothesis that all the slope coefficients are zero; there is a high likelihood that the result occurred by chance.
Although it is suggested that one does not proceed once the F-test fails, the individual coefficients also support the F-test conclusion.
The t-statistic is used to determine the statistical significance.
a.3) interpret the estimated coefficients and their p-values.
The t-statistics for the coefficients are relatively low, suggesting that it is highly likely that the null hypothesis will not be rejected. To evaluate the significance of a t-statistic we need to determine a quantity called degrees of freedom. This has been discussed in previous readings.
The p-value is the smallest level of significance at which the null hypothesis can be rejected. Its benefit is that it lets the reader decide whether he or she is willing to accept that level of significance. A 95% confidence level (a significance level of 0.05) is generally accepted as the norm.
You will note that the p-values for the coefficients are all in excess of 0.33, well above 0.05, so none of the coefficients is statistically significant and there is a great likelihood that the results are due to chance.
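To see where a p-value of this size comes from, here is a minimal sketch for the net operating cashflow coefficient. It assumes 2 degrees of freedom (5 observations less 3 estimated parameters) and takes the standard error of 12.797 from the confidence interval calculation later in this post. For exactly 2 degrees of freedom the t-distribution has a simple closed form, so no statistics library is needed; the function name is just an illustration.

```python
import math

def two_tailed_p_df2(t_stat: float) -> float:
    """Two-tailed p-value for a t-statistic with exactly 2 degrees of freedom.

    Uses the closed-form CDF of Student's t with 2 df:
    F(t) = 0.5 + t / (2 * sqrt(2 + t^2)).
    This shortcut is only valid for df = 2.
    """
    return 1.0 - abs(t_stat) / math.sqrt(2.0 + t_stat ** 2)

# Values for net operating cashflow per share, as quoted in this post.
coefficient = 15.828
standard_error = 12.797

# The t-statistic is the coefficient divided by its standard error.
t_stat = coefficient / standard_error
p_value = two_tailed_p_df2(t_stat)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

The resulting p-value is a little above 0.33, in line with the p-values quoted from the output.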
b.1) formulate a null and an alternative hypothesis about the population value of a regression coefficient
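H0: b1 = 0
Ha: b1 <> 0
AND
H0: b2 = 0
Ha: b2 <> 0
The hypotheses are about the population slope coefficients b1 and b2 themselves. Because the alternative is "not equal to", we are using a two-tailed test!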
b.2) calculate the value of the test statistic
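We will assume that we are testing at the 95% confidence level (5% significance), using a two-tailed test.
The test statistic for each coefficient is the estimated coefficient divided by its standard error. For net operating cashflow per share, with the standard error of 12.797 from the output, t = 15.828 / 12.797 = 1.237.
This is compared with a critical value: with a df of 2 (5 observations - 3 estimated parameters), using the tables, we obtain a value of 4.303.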
b.3) determine whether to reject the null hypothesis at a given level of significance by using a one-tailed or two-tailed test
The critical value has been determined as 4.303.
We fail to reject the null hypothesis if the t-statistic lies between -4.303 and 4.303.
The t-statistics of all the coefficients fall within this range, indicating that we cannot reject the null hypothesis.
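The decision rule can be sketched in a few lines of Python. This is an illustration only: the helper names are made up for this post, the closed-form table value is valid only for 2 degrees of freedom, and the t-statistic is taken as the net operating cashflow coefficient divided by its standard error of 12.797 (the figure used in the confidence interval calculation in c.1).

```python
import math

def critical_t_df2(alpha: float) -> float:
    """Two-tailed table value of t for exactly 2 degrees of freedom (closed form)."""
    q = 1.0 - alpha
    return q * math.sqrt(2.0 / (1.0 - q * q))

def reject_null(t_stat: float, t_crit: float) -> bool:
    """Reject H0 only if the t-statistic falls outside the range -t_crit to +t_crit."""
    return abs(t_stat) > t_crit

# 5% significance, two-tailed, df = 2: should agree with the 4.303 from the tables.
t_crit = critical_t_df2(0.05)

# t-statistic for net operating cashflow per share (coefficient / standard error).
t_cashflow = 15.828 / 12.797

print(round(t_crit, 3))
print(reject_null(t_cashflow, t_crit))
```

With only 2 degrees of freedom the table value is very large, so a coefficient needs a t-statistic well above 4 in absolute terms before the null hypothesis can be rejected.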
b.4) interpret the results of the test
At the 95% confidence level we cannot conclude that either coefficient is different from zero; in other words, we cannot rule out that the results are due to chance.
c.1) calculate and interpret a confidence interval for the population value of a regression coefficient
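The output has already calculated this. Specifically, for the net operating cashflow per share, the 95% interval is reported as -39.189 to 70.845. Because the interval includes zero, we cannot reject the hypothesis that this coefficient is zero.
This interval is derived as follows:
15.828 +/- (4.303)(12.797), that is, coefficient +/- (critical t-value)(standard error). With these rounded inputs the result is approximately -39.24 to 70.89, marginally different from the reported -39.189 and 70.845.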
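The same interval calculation as a short Python sketch, using the rounded figures quoted in this post (the function name is illustrative). Because the inputs are rounded, the result differs slightly from the figures in the regression output.

```python
def confidence_interval(coefficient: float, standard_error: float, t_crit: float):
    """Return (lower, upper) as coefficient +/- t_crit * standard_error."""
    margin = t_crit * standard_error
    return coefficient - margin, coefficient + margin

# Net operating cashflow per share: coefficient, standard error and the
# two-tailed 5% table value of t for 2 degrees of freedom.
lower, upper = confidence_interval(15.828, 12.797, 4.303)

print(f"{lower:.3f} to {upper:.3f}")
```

An interval this wide, with zero well inside it, is another way of seeing that the coefficient is not statistically significant.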
c.2) calculate and interpret a predicted value for the dependent variable, given an estimated regression model and assumed values for the independent variables.
We make use of the estimated regression model, being
Y = 27.321 -4.269(X1) + 15.828(X2)
Let’s assume that revenue per share will increase by 10% on the 2010 numbers (to $10.44), and net operating cashflow per share by 20% (to $3.86).
Using this, the formula will be
Y = 27.321 -4.269(10.44) + 15.828(3.86)
Y = 43.85 (rounded), the predicted value for the share price.
It is VERY IMPORTANT to remember that each slope coefficient is interpreted on the assumption that the other independent variable remains constant.
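For anyone who wants to try other assumptions, the estimated equation can be wrapped in a small Python function. The function and parameter names are made up for illustration; the coefficients are the ones from the Summary output quoted earlier.

```python
def predict_share_price(revenue_per_share: float, cashflow_per_share: float) -> float:
    """Predicted BHP share price from the estimated regression equation."""
    intercept = 27.321
    b_revenue = -4.269    # coefficient on revenue per share
    b_cashflow = 15.828   # coefficient on net operating cashflow per share
    return intercept + b_revenue * revenue_per_share + b_cashflow * cashflow_per_share

# Assumed inputs from the example: $10.44 revenue and $3.86 cashflow per share.
predicted = predict_share_price(10.44, 3.86)

print(f"{predicted:.2f}")
```

Keep in mind that neither coefficient was statistically significant, so a point prediction like this should be treated with great caution.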
In the reading on correlation and regression, we presented procedures for constructing a prediction interval for linear regression with one independent variable.
For multiple regression, however, computing a prediction interval to properly incorporate both types of uncertainty requires matrix algebra, which is outside the scope of this reading.