Reading 12: Multiple Regression and Issues in Regression Analy

UID: 137525
Posts: 5724
Threads: 885
Registered: 2009-7-1
Last login: 2011-3-29

17#

Posted 2011-3-3 14:31

John Rains, CFA, is a professor of finance at a large university located in the Eastern United States. He is actively involved with his local chapter of the Society of Financial Analysts. Recently, he was asked to teach one session of a Society-sponsored CFA review course, specifically teaching the class addressing the topic of quantitative analysis. Based upon his familiarity with the CFA exam, he decides that the first part of the session should be a review of the basic elements of quantitative analysis, such as hypothesis testing, regression and multiple regression analysis. He would like to devote the second half of the review session to the practical application of the topics he covered in the first half.

Rains decides to construct a sample regression analysis case study for his students in order to demonstrate a “real-life” application of the concepts. He begins by compiling financial information on a fictitious company called Big Rig, Inc. According to the case study, Big Rig is the primary producer of the equipment used in the exploration for and drilling of new oil and gas wells in the United States. Rains has based the information in the problem on an actual equity holding in his personal portfolio, but has simplified the data for the purposes of the review course.

Rains constructs a basic regression model for Big Rig in order to estimate its profitability (in millions), using two independent variables: the number of new wells drilled in the U.S. (WLS) and the number of new competitors (COMP) entering the market:

Profits = b₀ + b₁WLS – b₂COMP + ε

Based on the model, the estimated regression equation is:

Profits = 22.5 + 0.98(WLS) – 0.35(COMP)

Using the past 5 years of quarterly data, he calculated the following regression estimates for Big Rig, Inc:

Variable     Coefficient   Standard Error
Intercept    22.5          2.465
WLS          0.98          0.683
COMP         0.35          0.186

Using the information presented, the t-statistic for the number of new competitors (COMP) coefficient is:

A)	1.882.

B)	1.435.

C)	9.128.

To test whether a coefficient is statistically significant, the null hypothesis is that the slope coefficient is zero. The t-statistic for the COMP coefficient is calculated as follows:

(0.35 – 0.0) / 0.186 = 1.882
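The same calculation can be sketched in Python for all three rows of the table. This is only an illustration: the dictionary layout and the helper name `t_statistic` are assumptions made here, and the hypothesized value of zero comes from the null hypothesis stated above.

```python
# Coefficient estimates and standard errors from the Big Rig table.
estimates = {
    "Intercept": (22.5, 2.465),
    "WLS": (0.98, 0.683),
    "COMP": (0.35, 0.186),
}


def t_statistic(coefficient, standard_error, hypothesized=0.0):
    """t = (estimate - hypothesized value) / standard error."""
    return (coefficient - hypothesized) / standard_error


# Hypothetical helper structure: one t-statistic per row of the table.
t_stats = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```

The other two answer choices, 1.435 and 9.128, are the t-statistics for WLS and for the intercept.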

(Study Session 3, LOS 11.g)

Rains asks his students to test the null hypothesis that states for every new well drilled, profits will be increased by the given multiple of the coefficient, all other factors remaining constant. The appropriate hypotheses for this two-tailed test can best be stated as:

A)	H₀: b₁ ≤ 0.98 versus H_a: b₁ > 0.98.

B)	H₀: b₁ = 0.98 versus H_a: b₁ ≠ 0.98.

C)	H₀: b₁ = 0.35 versus H_a: b₁ ≠ 0.35.

The coefficient given in the above table for the number of new wells drilled (WLS) is 0.98. The hypothesis should test to see whether the coefficient is indeed equal to 0.98 or is equal to some other value. Note that hypotheses with the “greater than” or “less than” symbol are used with one-tailed tests. (Study Session 3, LOS 11.g)

Continuing with the analysis of Big Rig, Rains asks his students to calculate the mean squared error (MSE). Assume that the sum of squared errors (SSE) for the regression model is 359.

A)	21.118.

B)	17.956.

C)	18.896.

The MSE is calculated as SSE / (n – k – 1). Recall that there are twenty observations (five years of quarterly data) and two independent variables. Therefore, the MSE in this instance = 359 / (20 – 2 – 1) = 21.118. (Study Session 3, LOS 11.i)
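A minimal sketch of this arithmetic, using only the figures given in the question. The variable names are illustrative. The standard error of estimate (SEE) is included because it is easily confused with the MSE: it is the square root of the MSE, not the same quantity.

```python
import math

sse = 359.0   # sum of squared errors, given in the question
n = 20        # 5 years of quarterly observations
k = 2         # independent variables: WLS and COMP

mse = sse / (n - k - 1)   # mean squared error: SSE over its degrees of freedom
see = math.sqrt(mse)      # standard error of estimate: square root of the MSE

print(f"MSE = {mse:.3f}")
print(f"SEE = {see:.3f}")
```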

Rains now wants to test the students’ knowledge of the use of the F-test and the interpretation of the F-statistic. Which of the following statements regarding the F-test and the F-statistic is the most correct?

A)	The F-test is usually formulated as a two-tailed test.

B)	The F-statistic is almost always formulated to test each independent variable separately, in order to identify which variable is the most statistically significant.

C)	The F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.

An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. It tests all independent variables as a group, and is always a one-tailed test. The decision rule is to reject the null hypothesis if the calculated F-value is greater than the critical F-value. (Study Session 3, LOS 11.i)
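The decision rule can be sketched as follows. The case gives SSE, the number of observations and the number of independent variables, but it does not give the regression sum of squares (RSS), so the RSS value below is purely hypothetical and is there only to make the example run. The function names are also assumptions of this sketch.

```python
def f_statistic(rss, sse, n, k):
    """F = (RSS / k) / (SSE / (n - k - 1)), testing all slope coefficients jointly."""
    msr = rss / k               # mean regression sum of squares
    mse = sse / (n - k - 1)     # mean squared error
    return msr / mse


def reject_null(f_value, f_critical):
    """One-tailed rule: reject H0 (all slopes equal zero) only if F exceeds the critical F."""
    return f_value > f_critical


# SSE, n and k come from the Big Rig case; RSS is a HYPOTHETICAL value.
rss_hypothetical = 718.0
f_value = f_statistic(rss_hypothetical, sse=359.0, n=20, k=2)
print(f"F = {f_value:.2f} with 2 and 17 degrees of freedom")
```

The critical value passed to `reject_null` would be read from an F-table for k and n – k – 1 degrees of freedom at the chosen significance level.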

One of the main assumptions of a multiple regression model is that the variance of the residuals is constant across all observations in the sample. A violation of the assumption is known as:

A)	robust standard errors.

B)	positive serial correlation.

C)	heteroskedasticity.

Heteroskedasticity is present when the variance of the residuals is not the same across all observations in the sample, and there are sub-samples that are more spread out than the rest of the sample. (Study Session 3, LOS 12.i)

Rains reminds his students that a common condition that can distort the results of a regression analysis is referred to as serial correlation. The presence of serial correlation can be detected through the use of:

A)	the Breusch-Pagan test.

B)	the Durbin-Watson statistic.

C)	the Hansen method.

The Durbin-Watson test (DW ≈ 2(1 – r)) can detect serial correlation. Another commonly used method is to visually inspect a scatter plot of residuals over time. The Hansen method does not detect serial correlation, but can be used to remedy the situation. Note that the Breusch-Pagan test is used to detect heteroskedasticity. (Study Session 3, LOS 12.i)


16#

Posted 2011-3-3 14:30

An analyst is estimating whether company sales are related to three economic variables. The regression exhibits conditional heteroskedasticity, serial correlation, and multicollinearity. The analyst uses Hansen’s procedure to adjust the standard errors. Which of the following is most accurate? The:

A)	regression will still exhibit heteroskedasticity and multicollinearity, but the serial correlation problem will be solved.

B)	regression will still exhibit serial correlation and multicollinearity, but the heteroskedasticity problem will be solved.

C)	regression will still exhibit multicollinearity, but the heteroskedasticity and serial correlation problems will be solved.

The Hansen procedure adjusts the coefficient standard errors for both conditional heteroskedasticity and serial correlation at the same time. It does nothing about multicollinearity.


15#

Posted 2011-3-3 14:30

Which of the following is least likely a method of detecting serial correlation?

A)	The Durbin-Watson test.

B)	A scatter plot of the residuals over time.

C)	The Breusch-Pagan test.

The Breusch-Pagan test is a test for heteroskedasticity, not for serial correlation.


14#

Posted 2011-3-3 14:30

Which of the following is least accurate regarding the Durbin-Watson (DW) test statistic?

A)	If the residuals have negative serial correlation, the DW statistic will be greater than 2.

B)	In tests of serial correlation using the DW statistic, there is a rejection region, a region over which the test can fail to reject the null, and an inconclusive region.

C)	If the residuals have positive serial correlation, the DW statistic will be greater than 2.

A DW value of 2 indicates no serial correlation, a value greater than 2 indicates negative serial correlation, and a value less than 2 indicates positive serial correlation. There is a range of values in which the DW test is inconclusive.
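A short sketch can make the direction of the statistic concrete. It uses the standard definition of the statistic, DW = Σ(e_t – e_{t-1})² / Σe_t², which the post does not spell out, and two hypothetical residual series invented here only to make the pattern obvious.

```python
def durbin_watson(residuals):
    """DW = sum of squared changes in the residuals / sum of squared residuals."""
    numerator = sum((curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# HYPOTHETICAL residual series, not taken from any question in this thread.
persistent = [1, 1, 1, -1, -1, -1]    # errors tend to keep their sign: positive serial correlation
alternating = [1, -1, 1, -1, 1, -1]   # errors flip sign each period: negative serial correlation

dw_persistent = durbin_watson(persistent)     # 4 / 6, below 2
dw_alternating = durbin_watson(alternating)   # 20 / 6, above 2

print(f"persistent residuals:  DW = {dw_persistent:.2f}")
print(f"alternating residuals: DW = {dw_alternating:.2f}")
```

Residuals that keep their sign change little from period to period, so the numerator is small and DW falls below 2. Residuals that alternate produce large period-to-period changes and push DW above 2.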


13#

Posted 2011-3-3 14:30

An analyst is estimating whether a fund’s excess return for a month is dependent on interest rates and whether the S&P 500 has increased or decreased during the month. The analyst collects 90 monthly return premia (the return on the fund minus the return on the S&P 500 benchmark), 90 monthly interest rates, and 90 monthly S&P 500 index returns from July 1999 to December 2006. After estimating the regression equation, the analyst finds that the correlation between the regression residuals from one period and the residuals from the previous period is 0.199. Which of the following is most accurate at a 0.05 level of significance, based solely on the information provided? The analyst:

A)	cannot conclude that the regression exhibits either serial correlation or multicollinearity.

B)	can conclude that the regression exhibits multicollinearity, but cannot conclude that the regression exhibits serial correlation.

C)	can conclude that the regression exhibits serial correlation, but cannot conclude that the regression exhibits multicollinearity.

The Durbin-Watson statistic tests for serial correlation. For large samples, it is approximately 2 × (1 – r), where r is the sample correlation between the regression residuals from one period and the residuals from the previous period. Here that gives 2 × (1 – 0.199) = 1.602, which is less than the lower Durbin-Watson critical value of 1.61 (with 2 variables and 90 observations). That means the hypothesis of no serial correlation is rejected. There is no information on whether the regression exhibits multicollinearity.
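The decision rule used in this explanation can be sketched as a small function. The function names and return strings are assumptions of this sketch. The critical values 1.61 and 1.70 are the lower and upper values quoted in this thread for 2 variables and 90 observations, and the rule shown covers only the test for positive serial correlation.

```python
def dw_from_correlation(r):
    """Large-sample approximation: DW is roughly 2 * (1 - r)."""
    return 2 * (1 - r)


def classify_dw(dw, d_lower, d_upper):
    """Test of H0: no positive serial correlation, using tabulated critical values."""
    if dw < d_lower:
        return "reject"            # evidence of positive serial correlation
    if dw > d_upper:
        return "fail to reject"    # no evidence of positive serial correlation
    return "inconclusive"          # between the lower and upper critical values


dw = dw_from_correlation(0.199)
decision = classify_dw(dw, d_lower=1.61, d_upper=1.70)
print(f"DW = {dw:.3f}: {decision}")
```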


12#

Posted 2011-3-3 14:30

An analyst is estimating whether a fund’s excess return for a month is dependent on interest rates and whether the S&P 500 has increased or decreased during the month. The analyst collects 90 monthly return premia (the return on the fund minus the return on the S&P 500 benchmark), 90 monthly interest rates, and 90 monthly S&P 500 index returns from July 1999 to December 2006. After estimating the regression equation, the analyst finds that the correlation between the regression residuals from one period and the residuals from the previous period is 0.145. Which of the following is most accurate at a 0.05 level of significance, based solely on the information provided? The analyst:

A)	cannot conclude that the regression exhibits either serial correlation or heteroskedasticity.

B)	can conclude that the regression exhibits serial correlation, but cannot conclude that the regression exhibits heteroskedasticity.

C)	can conclude that the regression exhibits heteroskedasticity, but cannot conclude that the regression exhibits serial correlation.

The Durbin-Watson statistic tests for serial correlation. For large samples, it is approximately 2 × (1 – r), where r is the sample correlation between the regression residuals from one period and the residuals from the previous period. Here that gives 2 × (1 – 0.145) = 1.71, which is higher than the upper Durbin-Watson critical value of 1.70 (with 2 variables and 90 observations). That means the hypothesis of no serial correlation cannot be rejected. There is no information on whether the regression exhibits heteroskedasticity.
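Another way to read this result is to invert the approximation and ask which residual correlations correspond to the two critical values. The sketch below does that under the same large-sample approximation. The helper name `correlation_cutoff` is an assumption made here, and 1.61 is taken to be the lower critical value that accompanies the upper value of 1.70 for 2 variables and 90 observations.

```python
def correlation_cutoff(dw_critical):
    """Invert DW ~ 2 * (1 - r): the residual correlation implied by a critical DW value."""
    return 1 - dw_critical / 2


r_reject_above = correlation_cutoff(1.61)        # r above this puts DW below the lower value
r_fail_to_reject_below = correlation_cutoff(1.70)  # r below this puts DW above the upper value
r_observed = 0.145

print(f"reject if r > {r_reject_above:.3f}")
print(f"fail to reject if r < {r_fail_to_reject_below:.3f}")
print(f"observed r = {r_observed}")
```

The observed correlation of 0.145 lies just below the 0.15 cutoff, so the null cannot be rejected. A correlation between 0.15 and 0.195 would land in the inconclusive region.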


11#

Posted 2011-3-3 14:29

During the course of a multiple regression analysis, an analyst has observed several items that she believes may render incorrect conclusions. For example, the coefficient standard errors are too small, although the estimated coefficients are accurate. She believes that these small standard error terms will result in the computed t-statistics being too big, resulting in too many Type I errors. The analyst has most likely observed which of the following assumption violations in her regression analysis?

A)	Multicollinearity.

B)	Homoskedasticity.

C)	Positive serial correlation.

Positive serial correlation is the condition where a positive regression error in one time period increases the likelihood of having a positive regression error in the next time period. The residual terms are correlated with one another, leading to coefficient standard errors that are too small.


10#

Posted 2011-3-3 14:29

Alex Wade, CFA, is analyzing the result of a regression analysis comparing the performance of gold stocks versus a broad equity market index. Wade believes that serial correlation may be present. To test this belief, which of the following methods should he use to detect its presence?

A)	The Breusch-Pagan test.

B)	The Durbin-Watson statistic.

C)	The Hansen method.

The Durbin-Watson statistic is the most commonly used method for the detection of serial correlation, although residual plots can also be utilized. For a large sample size, DW ≈ 2(1 – r), where r is the correlation coefficient between residuals from one period and those from the previous period. The DW statistic is then compared to a table of DW statistics that gives upper and lower critical values for various sample sizes, levels of significance and numbers of degrees of freedom to detect the presence or absence of serial correlation.