CFA Forum - Powered by Discuz! Board

Title: 12: Multiple Regression and Issues in Regression Ana

Author: 土豆妮 Time: 2010-4-8 13:58 Title: [2010]Session 3:-Reading 12: Multiple Regression and Issues in Regression Ana

Session 3: Quantitative Methods: Quantitative Methods for Valuation
Reading 12: Multiple Regression and Issues in Regression Analysis

LOS a, (Part 1): Formulate a multiple regression equation to describe the relation between a dependent variable and several independent variables, and determine the statistical significance of each independent variable.

Which of the following statements regarding the results of a regression analysis is FALSE? The:

A)

slope coefficient in a multiple regression is the change in the dependent variable for a one-unit change in the independent variable, holding all other variables constant.

B)

slope coefficient in a multiple regression is the value of the dependent variable for a given value of the independent variable.

C)

slope coefficients in the multiple regression are referred to as partial betas.

Author: 土豆妮 Time: 2010-4-8 13:58

Which of the following statements regarding the results of a regression analysis is FALSE? The:

The slope coefficient is the change in the dependent variable for a one-unit change in the independent variable.

Author: 土豆妮 Time: 2010-4-8 14:00

William Brent, CFA, is the chief financial officer for Mega Flowers, one of the largest producers of flowers and bedding plants in the Western United States. Mega Flowers grows its plants in three large nursery facilities located in California. Its products are sold in its company-owned retail nurseries as well as in large, home and garden “super centers”. For its retail stores, Mega Flowers has designed and implemented marketing plans each season that are aimed at its consumers in order to generate additional sales for certain high-margin products. To fully implement the marketing plan, additional contract salespeople are seasonally employed.

For the past several years, these marketing plans seemed to be successful, providing a significant boost in sales to those specific products highlighted by the marketing efforts. However, for the past year, revenues have been flat, even though marketing expenditures increased slightly. Brent is concerned that the expensive seasonal marketing campaigns are simply no longer generating the desired returns, and should either be significantly modified or eliminated altogether. He proposes that the company hire additional, permanent salespeople to focus on selling Mega Flowers’ high-margin products all year long. The chief operating officer, David Johnson, disagrees with Brent. He believes that although last year’s results were disappointing, the marketing campaign has demonstrated impressive results for the past five years, and should be continued. His belief is that the prior years’ performance can be used as a gauge for future results, and that a simple increase in the sales force will not bring about the desired results.

Brent gathers information regarding quarterly sales revenue and marketing expenditures for the past five years. Based upon historical data, Brent derives the following regression equation for Mega Flowers (stated in millions of dollars):

Expected Sales = 12.6 + 1.6 (Marketing Expenditures) + 1.2 (# of Salespeople)

Brent shows the equation to Johnson and tells him, “This equation shows that a $1 million increase in marketing expenditures will increase the independent variable by $1.6 million, all other factors being equal.” Johnson replies, “It also appears that sales will equal $12.6 million if all independent variables are equal to zero.”

In regard to their conversation about the regression equation:

A)
Brent’s statement is correct; Johnson’s statement is correct.

B)
Brent’s statement is incorrect; Johnson’s statement is correct.

C)
Brent’s statement is correct; Johnson’s statement is incorrect.

Expected sales is the dependent variable in the equation, while expenditures for marketing and salespeople are the independent variables. Therefore, a $1 million increase in marketing expenditures will increase the dependent variable (expected sales) by $1.6 million. Brent’s statement is incorrect.

Johnson’s statement is correct. 12.6 is the intercept in the equation, which means that if all independent variables are equal to zero, expected sales will be $12.6 million. (Study Session 3, LOS 12.a)

Using data from the past 20 quarters, Brent calculates the t-statistic for marketing expenditures to be 3.68 and the t-statistic for salespeople to be 2.19. At a 5% significance level, the two-tailed critical values are t_c = ±2.127. This most likely indicates that:

A)
the t-statistic has 18 degrees of freedom.

B)
both independent variables are statistically significant.

C)
the null hypothesis should not be rejected.

Using a 5% significance level with degrees of freedom (df) of 17 (20-2-1), both independent variables are significant and contribute to the level of expected sales. (Study Session 3, LOS 12.a)

Brent calculated that the sum of squared errors (SSE) for the variables is 267. The mean squared error (MSE) would be:

A)
14.831.

B)
15.706.

C)
14.055.

The MSE is calculated as SSE / (n – k – 1). Recall that there are twenty observations and two independent variables. Therefore, the MSE in this instance [267 / (20 – 2 - 1)] = 15.706. (Study Session 3, LOS 11.i)
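The MSE arithmetic can be checked with a minimal Python sketch. The function name `mean_squared_error` is only illustrative (it does not come from the reading); the inputs are the SSE, the number of observations, and the number of independent variables given in the question.

```python
def mean_squared_error(sse: float, n: int, k: int) -> float:
    """MSE = SSE / (n - k - 1), where k is the number of independent variables."""
    return sse / (n - k - 1)


# Mega Flowers: SSE = 267, 20 quarterly observations, 2 independent variables
mse = mean_squared_error(267, n=20, k=2)
print(round(mse, 3))  # 15.706
```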

Brent is trying to explain the concept of the standard error of estimate (SEE) to Johnson. In his explanation, Brent makes three points about the SEE:

Point 1: The SEE is the standard deviation of the differences between the estimated values for the independent variables and the actual observations for the independent variable.
Point 2: Any violation of the basic assumptions of a multiple regression model is going to affect the SEE.
Point 3: If there is a strong relationship between the variables and the SSE is small, the individual estimation errors will also be small.

How many of Brent’s points are most accurate?

A)
1 of Brent’s points is correct.

B)
All 3 of Brent’s points are correct.

C)
2 of Brent’s points are correct.

Points 2 and 3 are correct: any violation of the basic assumptions of a multiple regression model is going to affect the SEE, and if there is a strong relationship between the variables and the SSE is small, the individual estimation errors will also be small.

The SEE is the standard deviation of the differences between the estimated values for the dependent variables (not independent) and the actual observations for the dependent variable. Brent’s Point 1 is incorrect.

Therefore, 2 of Brent’s points are correct. (Study Session 3, LOS 11.f)

Assuming that next year’s marketing expenditures are $3,500,000 and there are five salespeople, predicted sales for Mega Flowers will be:

A)
$11,600,000.

B)
$2,400,000.

C)
$24,200,000.

Using the regression equation from above, expected sales equals 12.6 + (1.6 x 3.5) + (1.2 x 5) = $24.2 million. Remember to check the details – i.e. this equation is denominated in millions of dollars. (Study Session 3, LOS 12.c)
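A minimal Python sketch of the same prediction, assuming the inputs are entered in the equation's own units (marketing expenditures in $ millions, salespeople as a head count). The function name `predict_sales` is hypothetical.

```python
def predict_sales(marketing_musd: float, salespeople: int) -> float:
    """Expected sales in $ millions from Brent's estimated equation."""
    return 12.6 + 1.6 * marketing_musd + 1.2 * salespeople


# $3,500,000 of marketing is 3.5 in the equation's units ($ millions)
predicted = predict_sales(marketing_musd=3.5, salespeople=5)
print(f"{predicted:.1f}")  # 24.2
```

Entering 3,500,000 instead of 3.5 would overstate the forecast by several orders of magnitude, which is the unit trap the explanation warns about.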

Brent would like to further investigate whether at least one of the independent variables can explain a significant portion of the variation of the dependent variable. Which of the following methods would be best for Brent to use?

A)
The multiple coefficient of determination.

B)
The F-statistic.

C)
An ANOVA table.

To determine whether at least one of the coefficients is statistically significant, the calculated F-statistic is compared with the critical F-value at the appropriate level of significance. (Study Session 3, LOS 12.e)

Author: 土豆妮 Time: 2010-4-8 14:01

Consider the following estimated regression equation, with the standard errors of the slope coefficients as noted:

Sales_i = 10.0 + 1.25 R&D_i + 1.0 ADV_i – 2.0 COMP_i + 8.0 CAP_i

where the standard error for the estimated coefficient on R&D is 0.45, the standard error for the estimated coefficient on ADV is 2.2, the standard error for the estimated coefficient on COMP is 0.63, and the standard error for the estimated coefficient on CAP is 2.5.

The equation was estimated over 40 companies. Using a 5% level of significance, which of the estimated coefficients are significantly different from zero?

A)

R&D, ADV, COMP, and CAP.

B)

ADV and CAP only.

C)

R&D, COMP, and CAP only.

Author: 土豆妮 Time: 2010-4-8 14:02

Consider the following estimated regression equation, with the standard errors of the slope coefficients as noted:

Sales_i = 10.0 + 1.25 R&D_i + 1.0 ADV_i – 2.0 COMP_i + 8.0 CAP_i

where the standard error for the estimated coefficient on R&D is 0.45, the standard error for the estimated coefficient on ADV is 2.2, the standard error for the estimated coefficient on COMP is 0.63, and the standard error for the estimated coefficient on CAP is 2.5.

The equation was estimated over 40 companies. Using a 5% level of significance, which of the estimated coefficients are significantly different from zero?

A)

R&D, ADV, COMP, and CAP.

B)

ADV and CAP only.

C)

R&D, COMP, and CAP only.

The critical t-values for 40-4-1 = 35 degrees of freedom and a 5% level of significance are ± 2.03.

The calculated t-values are:
t for R&D = 1.25 / 0.45 = 2.778
t for ADV = 1.0 / 2.2 = 0.455
t for COMP = -2.0 / 0.63 = -3.175
t for CAP = 8.0 / 2.5 = 3.2
Therefore, R&D, COMP, and CAP are statistically significant.
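The same significance check can be sketched in a few lines of Python. The coefficients, standard errors, and the critical value of 2.03 are taken from the question and answer; the variable names are illustrative only.

```python
# Estimated slope coefficients and their standard errors (from the question)
coefficients = {"R&D": 1.25, "ADV": 1.0, "COMP": -2.0, "CAP": 8.0}
std_errors = {"R&D": 0.45, "ADV": 2.2, "COMP": 0.63, "CAP": 2.5}

T_CRITICAL = 2.03  # two-tailed, 5% level, 35 degrees of freedom (given above)

# t = estimated coefficient / standard error (null hypothesis: coefficient = 0)
t_stats = {name: coefficients[name] / std_errors[name] for name in coefficients}

# A coefficient is significant when |t| exceeds the critical value
significant = [name for name, t in t_stats.items() if abs(t) > T_CRITICAL]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
print(significant)  # ['R&D', 'COMP', 'CAP']
```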

Author: 土豆妮 Time: 2010-4-8 14:02

Consider the following regression equation:

Sales_i = 10.0 + 1.25 R&D_i + 1.0 ADV_i – 2.0 COMP_i + 8.0 CAP_i

where Sales is dollar sales in millions, R&D is research and development expenditures in millions, ADV is dollar amount spent on advertising in millions, COMP is the number of competitors in the industry, and CAP is the capital expenditures for the period in millions of dollars.

Which of the following is NOT a correct interpretation of this regression information?

A)

If a company spends $1 million more on capital expenditures (holding everything else constant), Sales are expected to increase by $8.0 million.

B)

If R&D and advertising expenditures are $1 million each, there are 5 competitors, and capital expenditures are $2 million, expected Sales are $8.25 million.

C)

One more competitor will mean $2 million less in Sales (holding everything else constant).

Author: 土豆妮 Time: 2010-4-8 14:09

Consider the following regression equation:

Sales_i = 10.0 + 1.25 R&D_i + 1.0 ADV_i – 2.0 COMP_i + 8.0 CAP_i

where Sales is dollar sales in millions, R&D is research and development expenditures in millions, ADV is dollar amount spent on advertising in millions, COMP is the number of competitors in the industry, and CAP is the capital expenditures for the period in millions of dollars.

Which of the following is NOT a correct interpretation of this regression information?

Predicted sales = $10 + 1.25 + 1 – 10 + 16 = $18.25 million.
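A short Python sketch of this calculation, assuming all dollar inputs are already expressed in millions. The helper name `predict` and the dictionary layout are hypothetical.

```python
# Coefficients from the estimated regression (Sales in $ millions)
INTERCEPT = 10.0
COEFS = {"R&D": 1.25, "ADV": 1.0, "COMP": -2.0, "CAP": 8.0}


def predict(inputs: dict) -> float:
    """Intercept plus the sum of coefficient * value for each variable."""
    return INTERCEPT + sum(COEFS[name] * value for name, value in inputs.items())


# Scenario from choice B: $1 million R&D, $1 million ADV, 5 competitors, $2 million CAP
scenario_b = {"R&D": 1.0, "ADV": 1.0, "COMP": 5, "CAP": 2.0}
predicted = predict(scenario_b)
print(predicted)  # 18.25
```

Note that the slope terms alone sum to 8.25, so the figure in choice B is what results when the intercept of 10.0 is left out.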

Author: 土豆妮 Time: 2010-4-8 14:10

Consider the following regression equation:

Sales_i= 20.5 + 1.5 R&D_i + 2.5 ADV_i – 3.0 COMP_i

where Sales is dollar sales in millions, R&D is research and development expenditures in millions, ADV is dollar amount spent on advertising in millions, and COMP is the number of competitors in the industry.

Which of the following is NOT a correct interpretation of this regression information?

A)

One more competitor will mean $3 million less in sales (holding everything else constant).

B)

If a company spends $1 more on R&D (holding everything else constant), sales are expected to increase by $1.5 million.

C)

If R&D and advertising expenditures are $1 million each and there are 5 competitors, expected sales are $9.5 million.

Author: 土豆妮 Time: 2010-4-8 14:10

Consider the following regression equation:

Sales_i= 20.5 + 1.5 R&D_i + 2.5 ADV_i – 3.0 COMP_i

where Sales is dollar sales in millions, R&D is research and development expenditures in millions, ADV is dollar amount spent on advertising in millions, and COMP is the number of competitors in the industry.

Which of the following is NOT a correct interpretation of this regression information?

If a company spends $1 million more on R&D (holding everything else constant), sales are expected to increase by $1.5 million. Always be aware of the units of measure for the different variables.
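To make the unit issue concrete, here is a small Python sketch that converts a raw dollar change in R&D into the model's units (millions) and back. The function and constant names are illustrative assumptions.

```python
RND_COEF = 1.5                   # change in sales ($ millions) per $1 million of R&D
DOLLARS_PER_MILLION = 1_000_000


def sales_change_in_dollars(extra_rnd_dollars: float) -> float:
    """Expected change in sales, in dollars, for a given dollar change in R&D."""
    extra_rnd_millions = extra_rnd_dollars / DOLLARS_PER_MILLION
    return RND_COEF * extra_rnd_millions * DOLLARS_PER_MILLION


print(f"${sales_change_in_dollars(1):,.2f}")          # $1.50
print(f"${sales_change_in_dollars(1_000_000):,.2f}")  # $1,500,000.00
```

A $1 increase in R&D therefore corresponds to roughly $1.50 of additional expected sales, not $1.5 million.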

Author: 土豆妮 Time: 2010-4-8 14:11

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years. Which of the following regression equations correctly represents Hilton’s hypothesis?

A)
SALES = α x β₁ POP x β₂ INCOME x β₃ ADV x ε.

B)
SALES = α + β₁ POP + β₂ INCOME + β₃ ADV + ε.

C)
INCOME = α + β₁ POP + β₂ SALES + β₃ ADV + ε.

Author: 土豆妮 Time: 2010-4-8 14:11

A)
SALES = α x β₁ POP x β₂ INCOME x β₃ ADV x ε.

B)
SALES = α + β₁ POP + β₂ INCOME + β₃ ADV + ε.

C)
INCOME = α + β₁ POP + β₂ SALES + β₃ ADV + ε.

SALES is the dependent variable. POP, INCOME, and ADV should be the independent variables (on the right hand side) of the equation (in any order). Regression equations are additive.

Author: 土豆妮 Time: 2010-4-8 14:11

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years and estimates the following equation (standard errors in parentheses):

SALES = α + 0.004 POP + 1.031 INCOME + 2.002 ADV
(0.005)	(0.337)	(2.312)

The critical t-statistic for a 95% confidence level is 2.120. Which of the independent variables is statistically different from zero at the 95% confidence level?

A)
ADV only.

B)
INCOME only.

C)
INCOME and ADV.

Author: 土豆妮 Time: 2010-4-8 14:11

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years and estimates the following equation (standard errors in parentheses):

SALES = α + 0.004 POP + 1.031 INCOME + 2.002 ADV
(0.005)	(0.337)	(2.312)

The critical t-statistic for a 95% confidence level is 2.120. Which of the independent variables is statistically different from zero at the 95% confidence level?

A)
ADV only.

B)
INCOME only.

C)
INCOME and ADV.

The calculated test statistic is coefficient/standard error. Hence, the t-stats are 0.8 for POP, 3.059 for INCOME, and 0.866 for ADV. Since the t-stat for INCOME is the only one greater than the critical t-value of 2.120, only INCOME is significantly different from zero.
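A quick Python check of these t-statistics, using the coefficients, standard errors, and critical value given in the question. The names are illustrative only.

```python
coefficients = {"POP": 0.004, "INCOME": 1.031, "ADV": 2.002}
std_errors = {"POP": 0.005, "INCOME": 0.337, "ADV": 2.312}

T_CRITICAL = 2.120  # given in the question for the 95% confidence level

# Test statistic for H0: coefficient = 0 is coefficient / standard error
t_stats = {name: coefficients[name] / std_errors[name] for name in coefficients}
significant = [name for name, t in t_stats.items() if abs(t) > T_CRITICAL]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
print(significant)  # ['INCOME']
```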

Author: 土豆妮 Time: 2010-4-8 14:12

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years and estimates the following equation (standard errors in parentheses):

SALES = 0.000 + 0.004 POP + 1.031 INCOME + 2.002 ADV
(0.113)	(0.005)	(0.337)	(2.312)

For next year, Hilton estimates the following parameters: (1) the population under 20 will be 120 million, (2) disposable income will be $300,000,000, and (3) advertising expenditures will be $100,000,000. Based on these estimates and the regression equation, what are predicted sales for the industry for next year?

A)
$557,143,000.

B)
$656,991,000.

C)
$509,980,000.

Author: 土豆妮 Time: 2010-4-8 14:12

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years and estimates the following equation (standard errors in parentheses):

SALES = 0.000 + 0.004 POP + 1.031 INCOME + 2.002 ADV
(0.113)	(0.005)	(0.337)	(2.312)

A)
$557,143,000.

B)
$656,991,000.

C)
$509,980,000.

Predicted sales for next year are:

SALES = α + 0.004 (120) + 1.031 (300) + 2.002 (100) = 0.000 + 0.48 + 309.30 + 200.20 = 509.98 (in millions), or $509,980,000.
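The forecast can be reproduced with a short Python sketch, assuming every input is entered in millions as the question states. The variable names are hypothetical.

```python
INTERCEPT = 0.000
COEFS = {"POP": 0.004, "INCOME": 1.031, "ADV": 2.002}

# Hilton's estimates for next year, all expressed in millions
forecast_inputs = {"POP": 120, "INCOME": 300, "ADV": 100}

sales_millions = INTERCEPT + sum(COEFS[name] * forecast_inputs[name] for name in COEFS)
sales_dollars = sales_millions * 1_000_000  # convert back to dollars

print(f"{sales_millions:.2f} million")  # 509.98 million
print(f"${sales_dollars:,.0f}")         # $509,980,000
```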

Author: 土豆妮 Time: 2010-4-8 14:14

In a recent analysis of salaries (in $1,000) of financial analysts, a regression of salaries on education, experience, and gender is run. Gender equals one for men and zero for women. The regression results from a sample of 230 financial analysts are presented below, with t-statistics in parentheses.

Salaries = 34.98 + 1.2 Education + 0.5 Experience + 6.3 Gender

(29.11) (8.93) (2.98) (1.58)

What is the expected salary (in $1,000) of a woman with 16 years of education and 10 years of experience?

A)
59.18.

B)
54.98.

C)
65.48.

34.98 + 1.2(16) + 0.5(10) = 59.18

Holding everything else constant, do men get paid more than women? Use a 5% level of significance. No, since the t-value:

A)
does not exceed the critical value of 1.96.

B)
does not exceed the critical value of 1.65.

C)
exceeds the critical value of 1.96.

H₀: b_gender ≤ 0
H_a: b_gender > 0

t-value of 1.58 < 1.65 (critical value)
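Both parts of this question can be sketched in Python: the expected salary with the gender dummy set to zero, and the one-tailed decision rule. The t-statistic of 1.58 and the critical value of 1.65 come from the post above; the function and variable names are illustrative.

```python
COEFS = {"intercept": 34.98, "education": 1.2, "experience": 0.5, "gender": 6.3}


def expected_salary(education_years: float, experience_years: float, is_male: bool) -> float:
    """Expected salary in $1,000; the gender dummy is 1 for men and 0 for women."""
    gender = 1 if is_male else 0
    return (COEFS["intercept"]
            + COEFS["education"] * education_years
            + COEFS["experience"] * experience_years
            + COEFS["gender"] * gender)


salary = expected_salary(16, 10, is_male=False)
print(f"{salary:.2f}")  # 59.18

# One-tailed test of H0: b_gender <= 0 against Ha: b_gender > 0 at the 5% level
T_GENDER = 1.58
T_CRITICAL_ONE_TAILED = 1.65
reject_null = T_GENDER > T_CRITICAL_ONE_TAILED
print(reject_null)  # False
```

Because the null hypothesis is not rejected, the data do not support the claim that men are paid more, holding education and experience constant.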

Author: 土豆妮 Time: 2010-4-8 14:14

Werner Baltz, CFA, has regressed 30 years of data to forecast future sales for National Motor Company based on the percent change in gross domestic product (GDP) and the change in price of a U.S. gallon of fuel at retail. The results are presented below. Note: results must be multiplied by $1,000,000:

Coefficient Estimates

Predictor	Coefficient	Standard Error of the Coefficient
Intercept	78	13.710
Change in GDP (%)	30.22	12.120
Change in fuel price ($)	-412.39	183.981

Analysis of Variance Table (ANOVA)

Source	Degrees of Freedom	Sum of Squares	Mean Square
Regression		291.30	145.65
Error	27	132.12	
Total	29	423.42	

In 2002, if GDP rises 2.2% and the price of fuel falls $0.15, Baltz’s model will predict Company sales in 2002 to be (in $ millions) closest to:

A)
$206.

B)
$128.

C)
$82.

Sales will be closest to $78 + ($30.22 × 2.2) + [(-412.39) × (-$0.15)] = $206.34 million.

Baltz proceeds to test the hypothesis that none of the independent variables has significant explanatory power. He concludes that, at a 5% level of significance:

A)
at least one of the independent variables has explanatory power, because the calculated F-statistic exceeds its critical value.

B)
none of the independent variables has explanatory power, because the calculated F-statistic does not exceed its critical value.

C)
all of the independent variables have explanatory power, because the calculated F-statistic exceeds its critical value.

From the ANOVA table, the calculated F-statistic is (mean square regression / mean square error) = 145.65 / 4.89 = 29.7853. From the F distribution table (2 df numerator, 27 df denominator) the F-critical value may be interpolated to be 3.36. Because 29.7853 is greater than 3.36, Baltz rejects the null hypothesis and concludes that at least one of the independent variables has explanatory power.
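A minimal Python sketch of the F-test, using the sums of squares from the ANOVA table with n = 30 observations and k = 2 independent variables. The function name `f_statistic` is hypothetical, and the critical value of 3.36 is the one quoted in the explanation.

```python
def f_statistic(rss: float, sse: float, n: int, k: int) -> float:
    """F = MSR / MSE, with MSR = RSS / k and MSE = SSE / (n - k - 1)."""
    msr = rss / k
    mse = sse / (n - k - 1)
    return msr / mse


f_stat = f_statistic(rss=291.30, sse=132.12, n=30, k=2)
F_CRITICAL = 3.36  # 5% level, 2 and 27 degrees of freedom (from the explanation)

print(round(f_stat, 2))
print(f_stat > F_CRITICAL)  # True
```

Without intermediate rounding the statistic comes out at about 29.76 rather than 29.7853, because the explanation rounds the MSE to 4.89 before dividing. The conclusion is unchanged.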

Baltz then tests the individual variables, at a 5% level of significance, to determine whether sales are explained by individual changes in GDP and fuel prices. Baltz concludes that:

A)
neither GDP nor fuel price changes explain changes in sales.

B)
both GDP and fuel price changes explain changes in sales.

C)
only GDP changes explain changes in sales.

From the coefficient estimates table, the calculated t-statistics are (30.22 / 12.12) = 2.49 for GDP and (-412.39 / 183.981) = -2.24 for fuel prices. These values are both outside the t-critical value at 27 degrees of freedom of ±2.052. Therefore, Baltz is able to reject the null hypothesis that these coefficients are equal to zero, and concludes that each variable is important in explaining sales.

Author: 土豆妮 Time: 2010-4-8 14:17

Milky Way, Inc. is a large manufacturer of children’s toys and games based in the United States. Their products have high name brand recognition, and have been sold in retail outlets throughout the United States for nearly fifty years. The founding management team was bought out by a group of investors five years ago. The new management team, led by Russell Stepp, decided that Milky Way should try to expand its sales into the Western European market, which had never been tapped by the former owners. Under Stepp’s leadership, additional personnel are hired in the Research and Development department, and a new marketing plan specific to the European market is implemented. Being a new player in the European market, Stepp knows that it will take several years for Milky Way to establish its brand name in the marketplace, and is willing to make the expenditures now in exchange for increased future profitability.

Now, five years after entering the European market, Stepp is reviewing the results of his plan. Sales in Europe have slowly but steadily increased since Milky Way’s entrance into the market, but profitability seems to have leveled out. Stepp decides to hire a consultant, Ann Hays, CFA, to review and evaluate their European strategy. One of Hays’ first tasks on the job is to perform a regression analysis on Milky Way’s European sales. She is seeking to determine whether the additional expenditures on research and development and marketing for the European market should be continued in the future.

Hays begins by establishing a relationship between the European sales of Milky Way (in millions of dollars) and the two independent variables, the number of dollars (in millions) spent on research and development (R&D) and marketing (MKTG). Based upon five years of monthly data, Hays constructs the following estimated regression equation:

Estimated Sales = 54.82 + 5.97 (MKTG) + 1.45 (R&D)

Additionally, Hays calculates the following regression estimates:

	Coefficient	Standard Error
Intercept	54.82	3.165
MKTG	5.97	1.825
R&D	1.45	0.987

Hays begins the analysis by determining if both of the independent variables are statistically significant. To test whether a coefficient is statistically significant means to test whether it is statistically significantly different from:

A)
the upper tail critical value.

B)
zero.

C)
slope coefficient.

The magnitude of the coefficient reveals nothing about the importance of the independent variable in explaining the dependent variable. Therefore, it must be determined if each independent variable is statistically significant. The null hypothesis is that the slope coefficient for each independent variable equals zero. (Study Session 3, LOS 11.a)

The t-statistic for the marketing variable is calculated to be:

A)
3.271.

B)
17.321.

C)
1.886.

The t-statistic for the marketing coefficient is calculated as follows: (5.97 – 0.0) / 1.825 = 3.271. (Study Session 3, LOS 11.g)

Hays formulates a test structure where the decision rule is to reject the null hypothesis if the calculated test statistic is either larger than the upper tail critical value or lower than the lower tail critical value. At a 5% significance level with 57 degrees of freedom, assume that the two-tailed critical t-values are t_c = ±2.004. Based on this information, Hays makes the following conclusions:

Point 1: The intercept term is statistically significant.
Point 2: Both independent variables contribute to explaining sales for Milky Way, Inc.
Point 3: If an F-test were being used, the null hypothesis would be rejected.

Which of Hays’ conclusions are CORRECT?

A)
Points 1 and 2.

B)
Points 1 and 3.

C)
Points 2 and 3.

Hays’ Point 1 is correct. The t-statistic for the intercept term is (54.82 – 0) / 3.165 = 17.32, which is greater than the critical value of 2.004, so we can conclude that the intercept term is statistically significant.

Hays’ Point 2 is incorrect. The t-statistic for the R&D term is (1.45 – 0) / 0.987 = 1.469, which is not greater than the critical value of 2.004. This means that only MKTG can be said to contribute to explaining sales for Milky Way, Inc.

Hays’ Point 3 is correct. An F-test tests whether at least one of the independent variables is significantly different from zero, where the null hypothesis is that none of the independent variables are significant. Since we know that MKTG is a significant variable (t-statistic of 3.271), we can reject the hypothesis that none of the variables are significant. (Study Session 3, LOS 11.i)

Hays is aware that part, but not all, of the total variation in expected sales can be explained by the regression equation. Which of the following statements correctly reflects this relationship?

A)
SST = RSS + SSE.

B)
MSE = RSS + SSE.

C)
SST = RSS + SSE + MSE.

RSS (Regression sum of squares) is the portion of the total variation in Y that is explained by the regression equation. The SSE (Sum of squared errors), is the portion of the total variation in Y that is not explained by the regression. The SST is the total variation of Y around its average value. Therefore, SST = RSS + SSE. These sums of squares will always be calculated for you on the exam, so focus on understanding the interpretation of each. (Study Session 3, LOS 11.i)

Hays decides to test the overall effectiveness of both independent variables in explaining sales for Milky Way. Assuming that the total sum of squares is 389.14, the sum of squared errors is 146.85 and the mean squared error is 2.576, calculate and interpret the R².

A)
The R² equals 0.623, indicating that the two independent variables account for 62.3% of the variation in monthly sales.

B)
The R² equals 0.242, indicating that the two independent variables account for 24.2% of the variation in monthly sales.

C)
The R² equals 0.623, indicating that the two independent variables account for 37.7% of the variation in monthly sales.

The R² is calculated as (SST – SSE) / SST. In this example, R² equals (389.14–146.85) / 389.14 = .623 or 62.3%. This indicates that the two independent variables together explain 62.3% of the variation in monthly sales. The value for mean squared error is not used in this calculation. (Study Session 3, LOS 11.i)
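As a check on the arithmetic, a short Python sketch of the R² calculation. The function name `r_squared` is illustrative. The second calculation confirms that the stated MSE is consistent with 60 monthly observations and two independent variables, even though it plays no part in R².

```python
def r_squared(sst: float, sse: float) -> float:
    """R^2 = (SST - SSE) / SST, the share of total variation that is explained."""
    return (sst - sse) / sst


r2 = r_squared(sst=389.14, sse=146.85)
print(round(r2, 3))  # 0.623

# Consistency check only: MSE = SSE / (n - k - 1) with n = 60 months, k = 2
mse = 146.85 / (60 - 2 - 1)
print(round(mse, 3))  # 2.576
```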

Stepp is concerned about the validity of Hays’ regression analysis and asks Hays if she can test for the presence of heteroskedasticity. Hays complies with Stepp’s request, and detects the presence of unconditional heteroskedasticity. Which of the following statements regarding heteroskedasticity is most correct?

A)
Unconditional heteroskedasticity usually causes no major problems with the regression.

B)
Heteroskedasticity can be detected either by examining scatter plots of the residual or by using the Durbin-Watson test.

C)
Unconditional heteroskedasticity does create significant problems for statistical inference.

Unconditional heteroskedasticity occurs when the heteroskedasticity is not related to the level of the independent variables. This means that it does not systematically increase or decrease with changes in the independent variable(s), so it usually causes no major problems with the regression. Note that heteroskedasticity occurs when the variance of the residuals is not constant across all observations in the sample, and it can be detected either by examining scatter plots or by using a Breusch-Pagan test. (Study Session 3, LOS 12.g)

Author: 土豆妮 Time: 2010-4-8 14:18

John Rains, CFA, is a professor of finance at a large university located in the Eastern United States. He is actively involved with his local chapter of the Society of Financial Analysts. Recently, he was asked to teach one session of a Society-sponsored CFA review course, specifically teaching the class addressing the topic of quantitative analysis. Based upon his familiarity with the CFA exam, he decides that the first part of the session should be a review of the basic elements of quantitative analysis, such as hypothesis testing, regression and multiple regression analysis. He would like to devote the second half of the review session to the practical application of the topics he covered in the first half.

Rains decides to construct a sample regression analysis case study for his students in order to demonstrate a “real-life” application of the concepts. He begins by compiling financial information on a fictitious company called Big Rig, Inc. According to the case study, Big Rig is the primary producer of the equipment used in the exploration for and drilling of new oil and gas wells in the United States. Rains has based the information in the problem on an actual equity holding in his personal portfolio, but has simplified the data for the purposes of the review course.

Rains constructs a basic regression model for Big Rig in order to estimate its profitability (in millions), using two independent variables: the number of new wells drilled in the U.S. (WLS) and the number of new competitors (COMP) entering the market:

Profits = b₀ + b₁WLS – b₂COMP + ε

Based on the model, the estimated regression equation is:

Profits = 22.5 + 0.98(WLS) – 0.35(COMP)

Using the past 5 years of quarterly data, he calculated the following regression estimates for Big Rig, Inc:

	Coefficient	Standard Error
Intercept	22.5	2.465
WLS	0.98	0.683
COMP	0.35	0.186

Using the information presented, the t-statistic for the number of new competitors (COMP) coefficient is:

A)
1.435.

B)
9.128.

C)
1.882.

To test whether a coefficient is statistically significant, the null hypothesis is that the slope coefficient is zero. The t-statistic for the COMP coefficient is calculated as follows:

(0.35 – 0.0) / 0.186 = 1.882

(Study Session 3, LOS 11.g)

Rains asks his students to test the null hypothesis that, for every new well drilled, profits increase by the amount of the estimated coefficient, all other factors remaining constant. The appropriate hypotheses for this two-tailed test can best be stated as:

A)
H₀: b₁ ≤ 0.98 versus H_a: b₁ > 0.98.

B)
H₀: b₁ = 0.98 versus H_a: b₁ ≠ 0.98.

C)
H₀: b₁ = 0.35 versus H_a: b₁ ≠ 0.35.

The coefficient given in the above table for the number of new wells drilled (WLS) is 0.98. The hypothesis should test to see whether the coefficient is indeed equal to 0.98 or is equal to some other value. Note that hypotheses with the “greater than” or “less than” symbol are used with one-tailed tests. (Study Session 3, LOS 11.g)

Continuing with the analysis of Big Rig, Rains asks his students to calculate the mean squared error (MSE). Assume that the sum of squared errors (SSE) for the regression model is 359.

A)
17.956.

B)
18.896.

C)
21.118.

The MSE is calculated as SSE / (n – k – 1). Recall that there are twenty observations and two independent variables. Therefore, the MSE in this instance = 359 / (20 – 2 – 1) = 21.118. (Study Session 3, LOS 11.i)

Rains now wants to test the students’ knowledge of the use of the F-test and the interpretation of the F-statistic. Which of the following statements regarding the F-test and the F-statistic is the most correct?

A)
The F-test is usually formulated as a two-tailed test.

B)
The F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.

C)
The F-statistic is almost always formulated to test each independent variable separately, in order to identify which variable is the most statistically significant.

An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. It tests all independent variables as a group, and is always a one-tailed test. The decision rule is to reject the null hypothesis if the calculated F-value is greater than the critical F-value. (Study Session 3, LOS 11.i)

One of the main assumptions of a multiple regression model is that the variance of the residuals is constant across all observations in the sample. A violation of the assumption is known as:

A)
robust standard errors.

B)
heteroskedasticity.

C)
positive serial correlation.

Heteroskedasticity is present when the variance of the residuals is not the same across all observations in the sample, and there are sub-samples that are more spread out than the rest of the sample. (Study Session 3, LOS 12.g)

Rains reminds his students that a common condition that can distort the results of a regression analysis is referred to as serial correlation. The presence of serial correlation can be detected through the use of:

A)
the Breusch-Pagan test.

B)
the Hansen method.

C)
the Durbin-Watson statistic.

The Durbin-Watson test (DW ≈ 2(1 − r)) can detect serial correlation. Another commonly used method is to visually inspect a scatter plot of residuals over time. The Hansen method does not detect serial correlation, but can be used to remedy the situation. Note that the Breusch-Pagan test is used to detect heteroskedasticity. (Study Session 3, LOS 12.g)
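As a rough illustration of how the Durbin-Watson statistic is computed, the sketch below applies the standard definition (sum of squared changes in the residuals divided by the sum of squared residuals). The residual series is hypothetical and is not taken from the Big Rig regression.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for a sequence of residuals."""
    # Numerator: squared differences between consecutive residuals
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    # Denominator: sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals that drift slowly from positive to negative,
# the pattern associated with positive serial correlation
residuals = [1.0, 0.8, 0.6, 0.3, -0.1, -0.4, -0.6, -0.5, -0.2, 0.1]

dw = durbin_watson(residuals)
print(f"DW = {dw:.3f}")
```

Because DW ≈ 2(1 − r), a value near 2 corresponds to r near 0 (no serial correlation), a value well below 2 points to positive serial correlation, and a value well above 2 points to negative serial correlation. A formal conclusion still requires comparing DW with the tabulated lower and upper critical values.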

Author: 土豆妮 Time: 2010-4-8 14:21

Based on her regression results in Exhibit 2, using a 5% level of significance, Smith should conclude that:

A)
stimulus packages have significant effects on foreclosure percentages, but housing crises do not have significant effects on foreclosure percentages.

B)
stimulus packages do not have significant effects on foreclosure percentages, but housing crises do have significant effects on foreclosure percentages.

C)
both stimulus packages and housing crises have significant effects on foreclosure percentages.

The appropriate test statistic for tests of significance on individual slope coefficient estimates is the t-statistic, which is provided in Exhibit 2 for each regression coefficient estimate. The reported t-statistic equals -2.10 for the STIM slope estimate and equals 2.35 for the CRISIS slope estimate. The two-tailed critical t-statistic at the 5% level of significance with 16 degrees of freedom equals 2.12.

Therefore, the slope estimate for STIM is not statistically significant (the absolute value of the reported t-statistic, 2.10, is below the critical value of 2.12). In contrast, the slope estimate for CRISIS is statistically significant (the reported t-statistic, 2.35, exceeds the critical value of 2.12). (Study Session 3, LOS 12.a)
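The decision rule can be written out as a small Python sketch. The t-statistics and the critical value are the ones quoted in the explanation; the dictionary layout is just an illustrative choice.

```python
# Two-tailed 5% critical value with 16 degrees of freedom (from the explanation)
critical_t = 2.12

# t-statistics reported in Exhibit 2, as quoted in the explanation
t_stats = {"STIM": -2.10, "CRISIS": 2.35}

# A slope is significant when the absolute t-statistic exceeds the critical value
significant = {name: abs(t) > critical_t for name, t in t_stats.items()}

for name, flag in significant.items():
    print(name, "significant" if flag else "not significant")
```

Taking the absolute value matters here: the sign of the STIM t-statistic only reflects the direction of the effect, not its significance.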

The standard error of estimate for Smith’s regression is closest to:

A)
0.53

B)
0.56

C)
0.16

The formula for the Standard Error of the Estimate (SEE) is:

SEE = √[SSE / (n – k – 1)] = √MSE

The SEE equals the standard deviation of the regression residuals. A low SEE implies a high R². (Study Session 3, LOS 12.e)

Is Smith correct or incorrect regarding Concerns 1 and 2?

A)
Only correct on one concern and incorrect on the other.

B)
Incorrect on both Concerns.

C)
Correct on both Concerns.

Smith’s Concern 1 is incorrect. Heteroskedasticity is a violation of a regression assumption, and refers to regression error variance that is not constant over all observations in the regression. Conditional heteroskedasticity is a case in which the error variance is related to the magnitudes of the independent variables (the error variance is “conditional” on the independent variables). The consequence of conditional heteroskedasticity is that the standard errors will be too low, which, in turn, causes the t-statistics to be too high. Smith’s Concern 2 also is not correct. Multicollinearity refers to independent variables that are correlated with each other. Multicollinearity causes standard errors for the regression coefficients to be too high, which, in turn, causes the t-statistics to be too low. However, contrary to Smith’s concern, multicollinearity has no effect on the F-statistic. (Study Session 3, LOS 12.g)

The most recent change in foreclosure share was +1 percent. Smith decides to base her analysis on the data and methods provided in Exhibits 4 and 5, and determines that the two-step ahead forecast for the change in foreclosure share (in percent) is 0.125, and that the mean reverting value for the change in foreclosure share (in percent) is 0.071. Is Smith correct?

A)
Smith is correct on the two-step ahead forecast for change in foreclosure share only.

B)
Smith is correct on the mean-reverting level for forecast of change in foreclosure share only.

C)
Smith is correct on both the forecast and the mean reverting level.

Forecasts are derived by substituting the appropriate value for the period t-1 lagged value. With the most recent change equal to +1 percent:

ΔForeclosure Share_t = 0.05 + 0.25(1.00) = 0.30

So, the one-step ahead forecast equals 0.30%. The two-step ahead (%) forecast is derived by substituting 0.30 into the equation.

ΔForeclosure Share_t+1 = 0.05 + 0.25(0.30) = 0.125

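Therefore, the two-step ahead forecast equals 0.125%. The mean-reverting level is b₀ / (1 – b₁) = 0.05 / (1 – 0.25) = 0.067, not 0.071, so Smith is correct on the two-step ahead forecast only.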

(Study Session 3, LOS 13.d)
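The chained forecast can be sketched in Python as follows. The coefficients 0.05 and 0.25 are read off the forecast equation in the explanation; the function and variable names are hypothetical.

```python
b0, b1 = 0.05, 0.25    # intercept and lag coefficient from the explanation
latest_change = 1.0    # most recent change in foreclosure share, in percent

def forecast(previous_value):
    """One-step-ahead AR(1) forecast given the previous value."""
    return b0 + b1 * previous_value

# Each forecast feeds the next one as its lagged value
one_step = forecast(latest_change)
two_step = forecast(one_step)

# Level at which the forecast equals the lagged value: x = b0 + b1 * x
mean_reverting_level = b0 / (1 - b1)

print(f"one-step forecast:    {one_step:.3f}")
print(f"two-step forecast:    {two_step:.3f}")
print(f"mean-reverting level: {mean_reverting_level:.3f}")
```

Under these coefficients the mean-reverting level works out to about 0.067 rather than 0.071, which is why only the two-step ahead forecast in the question matches.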

Assume for this question that Smith finds that the foreclosure share series has a unit root. Under these conditions, she can most reliably regress foreclosure share against the change in interest rates (ΔINT) if:

A)
ΔINT has a unit root and is cointegrated with foreclosure share.

B)
ΔINT does not have a unit root.

C)
ΔINT has a unit root and is not cointegrated with foreclosure share.

The error terms in the regressions for choices B and C will be nonstationary. Therefore, some of the regression assumptions will be violated and the regression results are unreliable. If, however, both series are nonstationary (which will happen if each has a unit root), but cointegrated, then the error term will be covariance stationary and the regression results are reliable. (Study Session 3, LOS 13.j)

Author: maxsimax Time: 2010-4-14 16:32

thanks

Author: luqian55 Time: 2010-5-29 19:49

thanks


Analysis of Variance Table (ANOVA)

Source	Degrees of Freedom	Sum of Squares	Mean Square
Regression		291.30	145.65
Error	27	132.12
Total	29	423.42
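The blank cells and the usual summary statistics can be derived from the figures in this ANOVA table. The sketch below applies the standard relationships (degrees of freedom add up, mean square = sum of squares / degrees of freedom, F = MSR / MSE, R² = SSR / SST). The question this table belongs to is not shown here, so the variable names and the choice of which quantities to compute are assumptions.

```python
import math

# Figures taken from the ANOVA table
ss_regression = 291.30
ss_error = 132.12
ss_total = 423.42
df_error = 27
df_total = 29
ms_regression = 145.65

# Regression degrees of freedom = total minus error (number of slope coefficients)
df_regression = df_total - df_error

# Mean squared error = SSE / error degrees of freedom
ms_error = ss_error / df_error

# F-statistic for the joint significance of all slope coefficients
f_stat = ms_regression / ms_error

# Share of total variation explained by the regression
r_squared = ss_regression / ss_total

# Standard error of estimate is the square root of the MSE
see = math.sqrt(ms_error)

print(f"df regression = {df_regression}")
print(f"MSE = {ms_error:.3f}, SEE = {see:.3f}")
print(f"F = {f_stat:.2f}, R-squared = {r_squared:.3f}")
```

The F-statistic would then be compared with a one-tailed critical F-value with 2 and 27 degrees of freedom, which this sketch does not look up.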