
Title: Reading 12: Multiple Regression and Issues in Regression Analysis

Author: 土豆妮    Time: 2011-3-3 14:22    Title: [2011] Session 3 Reading 12: Multiple Regression and Issues in Regression Analysis

Session 3: Quantitative Methods for Valuation
Reading 12: Multiple Regression and Issues in Regression Analysis

LOS i: Discuss the types of heteroskedasticity and the effects of heteroskedasticity and serial correlation on statistical inference.

 

 

An analyst is trying to estimate the beta for a fund. The analyst estimates a regression equation in which the fund returns are the dependent variable and the Wilshire 5000 is the independent variable, using monthly data over the past five years. The analyst finds that the correlation between the square of the residuals of the regression and the Wilshire 5000 is 0.2. Which of the following is most accurate, assuming a 0.05 level of significance? There is:

A)
no evidence that there is conditional heteroskedasticity or serial correlation in the regression equation.
B)
evidence of serial correlation but not conditional heteroskedasticity in the regression equation.
C)
evidence of conditional heteroskedasticity but not serial correlation in the regression equation.


 

The test for conditional heteroskedasticity (the Breusch-Pagan test) involves regressing the square of the residuals on the independent variables of the regression and creating a test statistic equal to n × R², where n is the number of observations and R² is from the squared-residual regression. The test statistic follows a chi-squared distribution with degrees of freedom equal to the number of independent variables. With a single independent variable, the R² is equal to the square of the correlation; so in this case, the test statistic is 60 × 0.2² = 2.4, which is less than the chi-squared value (with one degree of freedom) of 3.84 for a p-value of 0.05. There is no indication of serial correlation.
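
As a quick numeric check, here is a minimal Python sketch of the calculation above (the variable names are illustrative, not from the question; it assumes scipy is available):

```python
from scipy.stats import chi2

n = 60      # 5 years of monthly observations
r = 0.2     # correlation between squared residuals and the regressor
k = 1       # one independent variable (the Wilshire 5000)

r_squared = r ** 2                 # with one regressor, R-squared = r^2 = 0.04
bp_stat = n * r_squared            # Breusch-Pagan statistic: 60 * 0.04 = 2.4
critical = chi2.ppf(0.95, df=k)    # chi-squared critical value, 1 df, ~3.84

print(bp_stat < critical)          # True: fail to reject homoskedasticity
```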


Author: 土豆妮    Time: 2011-3-3 14:22

Which of the following is least likely a method used to detect heteroskedasticity?

A)
Durbin-Watson test.
B)
Test of the variances.
C)
Breusch-Pagan test.


The Durbin-Watson test is used to detect serial correlation. The Breusch-Pagan test is used to detect heteroskedasticity.


Author: 土豆妮    Time: 2011-3-3 14:22

Consider the following graph of residuals and the regression line from a time-series regression:

These residuals exhibit the regression problem of:


A)
heteroskedasticity.
B)
autocorrelation.
C)
homoskedasticity.


The residuals appear to come from two different distributions over time; the model fits rather well in the earlier periods compared to the later periods. Non-constant residual variance of this kind is heteroskedasticity.


Author: 土豆妮    Time: 2011-3-3 14:23

Which of the following statements regarding heteroskedasticity is least accurate?

A)
Heteroskedasticity only occurs in cross-sectional regressions.
B)
Multicollinearity is a potential problem only in multiple regressions, not simple regressions.
C)
The presence of heteroskedastic error terms results in a variance of the residuals that is too large.


If there are shifting regimes in a time series (e.g., a change in regulation or the economic environment), it is possible to have heteroskedasticity in a time series as well, so heteroskedasticity is not limited to cross-sectional regressions.


Author: 土豆妮    Time: 2011-3-3 14:23

Which of the following statements regarding heteroskedasticity is least accurate?

A)
The assumption of linear regression is that the residuals are heteroskedastic.
B)
Heteroskedasticity results in an estimated variance that is too large and, therefore, affects statistical inference.
C)
Heteroskedasticity may occur in cross-section or time-series analyses.


The assumption of regression is that the residuals are homoskedastic (i.e., the residuals are drawn from the same distribution).


Author: 土豆妮    Time: 2011-3-3 14:23

Which of the following conditions will least likely affect the statistical inference about regression parameters by itself?

A)
Unconditional heteroskedasticity.
B)
Multicollinearity.
C)
Conditional heteroskedasticity.


Unconditional heteroskedasticity does not impact the statistical inference concerning the parameters.


Author: 土豆妮    Time: 2011-3-3 14:28

George Smith, an analyst with Great Lakes Investments, has created a comprehensive report on the pharmaceutical industry at the request of his boss. The Great Lakes portfolio currently has a significant exposure to the pharmaceutical industry through its large equity position in the top two pharmaceutical manufacturers. His boss requested that Smith determine a way to accurately forecast pharmaceutical sales in order for Great Lakes to identify further investment opportunities in the industry as well as to minimize its exposure to downturns in the market. Smith realized that many factors could have an impact on sales, and he must identify a method that can quantify their effect. Smith used a multiple regression analysis with five independent variables to predict industry sales. His goal is to identify relationships that are not only statistically significant, but economically significant as well. The assumptions of his model are fairly standard: a linear relationship exists between the dependent and independent variables, the independent variables are not random, and the expected value of the error term is zero.

Smith is confident with the results presented in his report. He has already done some hypothesis testing for statistical significance, including calculating a t-statistic and conducting a two-tailed test where the null hypothesis is that the regression coefficient is equal to zero versus the alternative that it is not. He feels that he has done a thorough job on the report and is ready to answer any questions posed by his boss.

However, Smith’s boss, John Sutter, is concerned that in his analysis, Smith has ignored several potential problems with the regression model that may affect his conclusions. He knows that when any of the basic assumptions of a regression model are violated, any results drawn from the model are questionable. He asks Smith to go back and carefully examine the effects of heteroskedasticity, multicollinearity, and serial correlation on his model. Specifically, he wants Smith to make suggestions regarding how to detect these errors and to correct the problems that he encounters.

Suppose that there is evidence that the residual terms in the regression are positively correlated. The most likely effect on the statistical inferences drawn from the regression results is for Smith to commit a:

A)
Type I error by incorrectly rejecting the null hypotheses that the regression parameters are equal to zero.
B)
Type II error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.
C)
Type I error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.


One problem with positive autocorrelation (also known as positive serial correlation) is that the standard errors of the parameter estimates will be too small and the t-statistics too large. This may lead Smith to incorrectly reject the null hypothesis that the parameters are equal to zero. In other words, Smith will incorrectly conclude that the parameters are statistically significant when in fact they are not. This is an example of a Type I error: incorrectly rejecting the null hypothesis when it should not be rejected. (Study Session 3, LOS 12.i)


Sutter has detected the presence of conditional heteroskedasticity in Smith’s report. This is evidence that:

A)
the variance of the error term is correlated with the values of the independent variables.
B)
two or more of the independent variables are highly correlated with each other.
C)
the error terms are correlated with each other.


Conditional heteroskedasticity exists when the variance of the error term is correlated with the values of the independent variables.

Multicollinearity, on the other hand, occurs when two or more of the independent variables are highly correlated with each other. Serial correlation exists when the error terms are correlated with each other. (Study Session 3, LOS 12.i)


Suppose there is evidence that the variance of the error term is correlated with the values of the independent variables. The most likely effect on the statistical inferences Smith can make from the regression results is to commit a:

A)
Type II error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.
B)
Type I error by incorrectly rejecting the null hypotheses that the regression parameters are equal to zero.
C)
Type I error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.


One problem with heteroskedasticity is that the standard errors of the parameter estimates will be too small and the t-statistics too large. This will lead Smith to incorrectly reject the null hypothesis that the parameters are equal to zero. In other words, Smith will incorrectly conclude that the parameters are statistically significant when in fact they are not. This is an example of a Type I error: incorrectly rejecting the null hypothesis when it should not be rejected. (Study Session 3, LOS 12.i)


Which of the following is most likely to indicate that two or more of the independent variables, or linear combinations of independent variables, may be highly correlated with each other? Unless otherwise noted, significant and insignificant mean significantly different from zero and not significantly different from zero, respectively.

A)
The R² is low, the F-statistic is insignificant, and the Durbin-Watson statistic is significant.
B)
The R² is high, the F-statistic is significant, and the t-statistics on the individual slope coefficients are insignificant.
C)
The R² is high, the F-statistic is significant, and the t-statistics on the individual slope coefficients are significant.


Multicollinearity occurs when two or more of the independent variables, or linear combinations of independent variables, are highly correlated with each other. In the classic effect of multicollinearity, the R² is high and the F-statistic is significant, but the t-statistics on the individual slope coefficients are insignificant. (Study Session 3, LOS 12.j)
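
To see this signature in action, here is a small simulation sketch (all names and parameter values below are invented for illustration; it assumes numpy and statsmodels are available):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)      # x2 nearly duplicates x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Classic multicollinearity signature: R-squared is high and the F-test is
# significant, yet the individual slope t-statistics are small because the
# coefficient standard errors are inflated.
print(fit.rsquared, fit.f_pvalue)
print(fit.tvalues[1:])
```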


Suppose there is evidence that two or more of the independent variables, or linear combinations of independent variables, may be highly correlated with each other. The most likely effect on the statistical inferences Smith can make from the regression results is to commit a:

A)
Type II error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.
B)
Type I error by incorrectly rejecting the null hypothesis that the regression parameters are equal to zero.
C)
Type I error by incorrectly failing to reject the null hypothesis that the regression parameters are equal to zero.


One problem with multicollinearity is that the standard errors of the parameter estimates will be too large and the t-statistics too small. This will lead Smith to incorrectly fail to reject the null hypothesis that the parameters are statistically insignificant. In other words, Smith will incorrectly conclude that the parameters are not statistically significant when in fact they are. This is an example of a Type II error: incorrectly failing to reject the null hypothesis when it should be rejected. (Study Session 3, LOS 12.j)


Using the Durbin-Watson test statistic, Smith rejects the null hypothesis suggested by the test. This is evidence that:

A)
two or more of the independent variables are highly correlated with each other.
B)
the error term is normally distributed.
C)
the error terms are correlated with each other.


Serial correlation (also called autocorrelation) exists when the error terms are correlated with each other.

Multicollinearity, on the other hand, occurs when two or more of the independent variables are highly correlated with each other. One assumption of multiple regression is that the error term is normally distributed. (Study Session 3, LOS 12.i)



Author: 土豆妮    Time: 2011-3-3 14:28

An analyst is estimating whether a fund’s excess return for a quarter is related to interest rates and last quarter’s excess return. The regression equation is found to have unconditional heteroskedasticity and serial correlation. Which of the following is most accurate? Parameter estimates will be:

A)
inaccurate and statistical inference about the parameters will not be valid.
B)
accurate but statistical inference about the parameters will not be valid.
C)
inaccurate but statistical inference about the parameters will be valid.


One of the independent variables is a lagged value of the dependent variable. This means that serial correlation will cause an inaccurate parameter estimate. Serial correlation always impacts the statistical inference about the parameters. Unconditional heteroskedasticity never impacts statistical inference or parameter accuracy.


Author: 土豆妮    Time: 2011-3-3 14:29

During the course of a multiple regression analysis, an analyst has observed several items that she believes may render incorrect conclusions. For example, the coefficient standard errors are too small, although the estimated coefficients are accurate. She believes that these small standard error terms will result in the computed t-statistics being too big, resulting in too many Type I errors. The analyst has most likely observed which of the following assumption violations in her regression analysis?

A)
Multicollinearity.
B)
Homoskedasticity.
C)
Positive serial correlation.


Positive serial correlation is the condition where a positive regression error in one time period increases the likelihood of a positive regression error in the next time period. The residual terms are correlated with one another, leading to coefficient standard errors that are too small.


Author: 土豆妮    Time: 2011-3-3 14:29

Alex Wade, CFA, is analyzing the results of a regression analysis comparing the performance of gold stocks with a broad equity market index. Wade believes that serial correlation may be present. Which of the following methods should he use to detect its presence?

A)
The Breusch-Pagan test.
B)
The Durbin-Watson statistic.
C)
The Hansen method.


The Durbin-Watson statistic is the most commonly used method for the detection of serial correlation, although residual plots can also be utilized. For a large sample size, DW ≈ 2(1-r), where r is the correlation coefficient between residuals from one period and those from a previous period. The DW statistic is then compared to a table of DW statistics that gives upper and lower critical values for various sample sizes, levels of significance and numbers of degrees of freedom to detect the presence or absence of serial correlation.
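
For reference, a minimal sketch of how the statistic could be computed from a residual series (the function names are my own, not from any cited library):

```python
import numpy as np

def durbin_watson(resid):
    """Exact DW statistic: sum of squared first differences of the
    residuals over the sum of squared residuals."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def dw_approx(resid):
    """Large-sample shortcut quoted above: DW ~ 2 * (1 - r), where r is
    the correlation between each residual and the previous one."""
    resid = np.asarray(resid, dtype=float)
    r = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    return 2.0 * (1.0 - r)
```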


Author: 土豆妮    Time: 2011-3-3 14:29

Which of the following statements regarding serial correlation that might be encountered in regression analysis is least accurate?

A)
Negative serial correlation causes a failure to reject the null hypothesis when it is actually false.
B)
Positive serial correlation typically has the same effect as heteroskedasticity.
C)
Serial correlation occurs least often with time series data.


Serial correlation, which is sometimes referred to as autocorrelation, occurs when the residual terms are correlated with one another, and is most frequently encountered with time series data.


Author: 土豆妮    Time: 2011-3-3 14:30

An analyst is estimating whether company sales are related to three economic variables. The regression exhibits conditional heteroskedasticity, serial correlation, and multicollinearity. The analyst uses Hansen’s procedure to adjust the standard errors. Which of the following is most accurate? The:

A)
regression will still exhibit heteroskedasticity and multicollinearity, but the serial correlation problem will be solved.
B)
regression will still exhibit serial correlation and multicollinearity, but the heteroskedasticity problem will be solved.
C)
regression will still exhibit multicollinearity, but the heteroskedasticity and serial correlation problems will be solved.


The Hansen procedure simultaneously corrects the standard errors for both heteroskedasticity and serial correlation; it does not address multicollinearity.


Author: 土豆妮    Time: 2011-3-3 14:30

Which of the following is least likely a method of detecting serial correlations?

A)
The Durbin-Watson test.
B)
A scatter plot of the residuals over time.
C)
The Breusch-Pagan test.


The Breusch-Pagan test is a test for heteroskedasticity, not for serial correlation.


Author: 土豆妮    Time: 2011-3-3 14:30

Which of the following is least accurate regarding the Durbin-Watson (DW) test statistic?

A)
If the residuals have negative serial correlation, the DW statistic will be greater than 2.
B)
In tests of serial correlation using the DW statistic, there is a rejection region, a region over which the test can fail to reject the null, and an inconclusive region.
C)
If the residuals have positive serial correlation, the DW statistic will be greater than 2.


A value of 2 indicates no correlation, a value greater than 2 indicates negative correlation, and a value less than 2 indicates a positive correlation. There is a range of values in which the DW test is inconclusive.


Author: 土豆妮    Time: 2011-3-3 14:30

An analyst is estimating whether a fund’s excess return for a month is dependent on interest rates and whether the S&P 500 has increased or decreased during the month. The analyst collects 90 monthly return premia (the return on the fund minus the return on the S&P 500 benchmark), 90 monthly interest rates, and 90 monthly S&P 500 index returns from July 1999 to December 2006. After estimating the regression equation, the analyst finds that the correlation between the regression’s residuals from one period and the residuals from the previous period is 0.199. Which of the following is most accurate at a 0.05 level of significance, based solely on the information provided? The analyst:

A)
cannot conclude that the regression exhibits either serial correlation or multicollinearity.
B)
can conclude that the regression exhibits multicollinearity, but cannot conclude that the regression exhibits serial correlation.
C)
can conclude that the regression exhibits serial correlation, but cannot conclude that the regression exhibits multicollinearity.


The Durbin-Watson statistic tests for serial correlation. For large samples, the Durbin-Watson statistic is approximately equal to two multiplied by the difference between one and the sample correlation between the regression’s residuals from one period and the residuals from the previous period: 2 × (1 − 0.199) = 1.602. This is less than the lower critical Durbin-Watson value (with 2 independent variables and 90 observations) of 1.61, so the hypothesis of no serial correlation is rejected. There is no information on whether the regression exhibits multicollinearity.
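
The decision rule applied here, and in the next question, can be written out explicitly; a sketch (the critical values 1.61 and 1.70 are the tabulated lower and upper values quoted in these explanations, not computed):

```python
def dw_decision(dw, d_lower, d_upper):
    """Classify a DW statistic against tabulated critical values in a
    test for positive serial correlation."""
    if dw < d_lower:
        return "reject H0: positive serial correlation"
    if dw > d_upper:
        return "fail to reject H0"
    return "inconclusive"

# This question: r = 0.199, DW = 2 * (1 - 0.199) = 1.602 < 1.61
print(dw_decision(2 * (1 - 0.199), 1.61, 1.70))   # rejects H0
# Following question: r = 0.145, DW = 2 * (1 - 0.145) = 1.71 > 1.70
print(dw_decision(2 * (1 - 0.145), 1.61, 1.70))   # fails to reject
```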


Author: 土豆妮    Time: 2011-3-3 14:30

An analyst is estimating whether a fund’s excess return for a month is dependent on interest rates and whether the S&P 500 has increased or decreased during the month. The analyst collects 90 monthly return premia (the return on the fund minus the return on the S&P 500 benchmark), 90 monthly interest rates, and 90 monthly S&P 500 index returns from July 1999 to December 2006. After estimating the regression equation, the analyst finds that the correlation between the regression’s residuals from one period and the residuals from the previous period is 0.145. Which of the following is most accurate at a 0.05 level of significance, based solely on the information provided? The analyst:

A)
cannot conclude that the regression exhibits either serial correlation or heteroskedasticity.
B)
can conclude that the regression exhibits serial correlation, but cannot conclude that the regression exhibits heteroskedasticity.
C)
can conclude that the regression exhibits heteroskedasticity, but cannot conclude that the regression exhibits serial correlation.


The Durbin-Watson statistic tests for serial correlation. For large samples, the Durbin-Watson statistic is approximately equal to two multiplied by the difference between one and the sample correlation between the regression’s residuals from one period and the residuals from the previous period: 2 × (1 − 0.145) = 1.71. This is higher than the upper critical Durbin-Watson value (with 2 independent variables and 90 observations) of 1.70, so the hypothesis of no serial correlation cannot be rejected. There is no information on whether the regression exhibits heteroskedasticity.


Author: 土豆妮    Time: 2011-3-3 14:31

John Rains, CFA, is a professor of finance at a large university located in the Eastern United States. He is actively involved with his local chapter of the Society of Financial Analysts. Recently, he was asked to teach one session of a Society-sponsored CFA review course, specifically teaching the class addressing the topic of quantitative analysis. Based upon his familiarity with the CFA exam, he decides that the first part of the session should be a review of the basic elements of quantitative analysis, such as hypothesis testing, regression and multiple regression analysis. He would like to devote the second half of the review session to the practical application of the topics he covered in the first half.

Rains decides to construct a sample regression analysis case study for his students in order to demonstrate a “real-life” application of the concepts. He begins by compiling financial information on a fictitious company called Big Rig, Inc. According to the case study, Big Rig is the primary producer of the equipment used in the exploration for and drilling of new oil and gas wells in the United States. Rains has based the information in the problem on an actual equity holding in his personal portfolio, but has simplified the data for the purposes of the review course.

Rains constructs a basic regression model for Big Rig in order to estimate its profitability (in millions), using two independent variables: the number of new wells drilled in the U.S. (WLS) and the number of new competitors (COMP) entering the market:

Profits = b0 + b1WLS – b2COMP + ε

Based on the model, the estimated regression equation is:

Profits = 22.5 + 0.98(WLS) − 0.35(COMP)

Using the past 5 years of quarterly data, he calculated the following regression estimates for Big Rig, Inc:

              Coefficient     Standard Error

Intercept         22.5             2.465
WLS               0.98             0.683
COMP              0.35             0.186

Using the information presented, the t-statistic for the number of new competitors (COMP) coefficient is:

A)
1.882.
B)
1.435.
C)
9.128.


To test whether a coefficient is statistically significant, the null hypothesis is that the slope coefficient is zero. The t-statistic for the COMP coefficient is calculated as follows:

(0.35 – 0.0) / 0.186 = 1.882

(Study Session 3, LOS 11.g)
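
The two distractors are simply the t-statistics of the other coefficients in the table; a quick sketch that reproduces all three:

```python
# coefficient estimate and standard error from the Big Rig table
table = {"Intercept": (22.5, 2.465),
         "WLS":       (0.98, 0.683),
         "COMP":      (0.35, 0.186)}

# t = (estimate - hypothesized value) / standard error, with H0: b = 0
for name, (est, se) in table.items():
    print(f"{name}: t = {est / se:.3f}")
# Intercept: 9.128, WLS: 1.435, COMP: 1.882 -- the three answer choices
```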


Rains asks his students to test the null hypothesis that states for every new well drilled, profits will be increased by the given multiple of the coefficient, all other factors remaining constant. The appropriate hypotheses for this two-tailed test can best be stated as:

A)
H0: b1 ≤ 0.98 versus Ha: b1 > 0.98.
B)
H0: b1 = 0.98 versus Ha: b1 ≠ 0.98.
C)
H0: b1 = 0.35 versus Ha: b1 ≠ 0.35.


The coefficient given in the above table for the number of new wells drilled (WLS) is 0.98. The hypothesis should test to see whether the coefficient is indeed equal to 0.98 or is equal to some other value. Note that hypotheses with the “greater than” or “less than” symbol are used with one-tailed tests. (Study Session 3, LOS 11.g)


Continuing with the analysis of Big Rig, Rains asks his students to calculate the mean squared error (MSE). Assume that the sum of squared errors (SSE) for the regression model is 359.

A)
21.118.
B)
17.956.
C)
18.896.


The MSE is calculated as SSE / (n − k − 1). Recall that there are twenty observations and two independent variables. Therefore, the MSE in this instance = 359 / (20 − 2 − 1) = 21.118. (Study Session 3, LOS 11.i)
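
A one-line check of the arithmetic (note that the standard error of estimate, SEE, is the square root of the MSE, which may explain the easy mix-up between the two):

```python
import math

sse, n, k = 359, 20, 2
mse = sse / (n - k - 1)   # 359 / 17 = 21.118
see = math.sqrt(mse)      # SEE = sqrt(MSE), about 4.595
print(round(mse, 3), round(see, 3))
```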


Rains now wants to test the students’ knowledge of the use of the F-test and the interpretation of the F-statistic. Which of the following statements regarding the F-test and the F-statistic is the most correct?

A)
The F-test is usually formulated as a two-tailed test.
B)
The F-statistic is almost always formulated to test each independent variable separately, in order to identify which variable is the most statistically significant.
C)
The F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.


An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. It tests all independent variables as a group and is always a one-tailed test. The decision rule is to reject the null hypothesis if the calculated F-value is greater than the critical F-value. (Study Session 3, LOS 11.i)


One of the main assumptions of a multiple regression model is that the variance of the residuals is constant across all observations in the sample. A violation of the assumption is known as:

A)
robust standard errors.
B)
positive serial correlation.
C)
heteroskedasticity.


Heteroskedasticity is present when the variance of the residuals is not the same across all observations in the sample, and there are sub-samples that are more spread out than the rest of the sample. (Study Session 3, LOS 12.i)


Rains reminds his students that a common condition that can distort the results of a regression analysis is referred to as serial correlation. The presence of serial correlation can be detected through the use of:

A)
the Breusch-Pagan test.
B)
the Durbin-Watson statistic.
C)
the Hansen method.


The Durbin-Watson test (DW ≈ 2(1 − r)) can detect serial correlation. Another commonly used method is to visually inspect a scatter plot of residuals over time. The Hansen method does not detect serial correlation, but can be used to remedy the situation. Note that the Breusch-Pagan test is used to detect heteroskedasticity. (Study Session 3, LOS 12.i)



Author: bun789    Time: 2012-6-1 12:09

Thank you ^^



