12: Multiple Regression and Issues in Regression Ana - Quantitative Methods - [Selected CFA Level II Exam Questions] - CFA Forum

Rank: 4

UID: 137525
Posts: 5724
Threads: 885
Registered: 2009-7-1
Last login: 2011-3-29

11^#

Posted 2010-4-8 14:11

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years. Which of the following regression equations correctly represents Hilton’s hypothesis?

A)	SALES = α x β₁ POP x β₂ INCOME x β₃ ADV x ε.

B)	SALES = α + β₁ POP + β₂ INCOME + β₃ ADV + ε.

C)	INCOME = α + β₁ POP + β₂ SALES + β₃ ADV + ε.

SALES is the dependent variable, so it belongs on the left-hand side of the equation. POP, INCOME, and ADV are the independent variables and belong on the right-hand side (in any order). Regression equations are additive, so B is the correct specification.

12^#

Posted 2010-4-8 14:11

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years and estimates the following equation (standard errors in parentheses):

SALES = α + 0.004 POP + 1.031 INCOME + 2.002 ADV
(0.005)	(0.337)	(2.312)

The critical t-statistic for a 95% confidence level is 2.120. Which of the independent variables is statistically different from zero at the 95% confidence level?

A)	ADV only.

B)	INCOME only.

C)	INCOME and ADV.

13^#

Posted 2010-4-8 14:11

The calculated test statistic is coefficient/standard error. Hence, the t-stats are 0.8 for POP, 3.059 for INCOME, and 0.866 for ADV. Since the t-stat for INCOME is the only one greater than the critical t-value of 2.120, only INCOME is significantly different from zero.
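
The calculation above can be checked with a short script. This is only a sketch: the coefficient and standard error values are the ones quoted in the question, and the variable and function names are made up for illustration.

```python
# Coefficients and standard errors as quoted in the question.
coefficients = {"POP": 0.004, "INCOME": 1.031, "ADV": 2.002}
standard_errors = {"POP": 0.005, "INCOME": 0.337, "ADV": 2.312}
T_CRITICAL = 2.120  # two-tailed critical value given in the question


def t_statistics(coefs, ses):
    """Return coefficient / standard error for each variable (H0: coefficient = 0)."""
    return {name: coefs[name] / ses[name] for name in coefs}


t_stats = t_statistics(coefficients, standard_errors)

# A coefficient is significant when |t| exceeds the critical value.
significant = [name for name, t in t_stats.items() if abs(t) > T_CRITICAL]

for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
print("Significant:", significant)
```

Comparing the absolute value of each t-statistic with the critical value keeps the check valid for negative coefficients as well.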

14^#

Posted 2010-4-8 14:12

Henry Hilton, CFA, is undertaking an analysis of the bicycle industry. He hypothesizes that bicycle sales (SALES) are a function of three factors: the population under 20 (POP), the level of disposable income (INCOME), and the number of dollars spent on advertising (ADV). All data are measured in millions of units. Hilton gathers data for the last 20 years and estimates the following equation (standard errors in parentheses):

SALES = 0.000 + 0.004 POP + 1.031 INCOME + 2.002 ADV
(0.113)	(0.005)	(0.337)	(2.312)

For next year, Hilton estimates the following parameters: (1) the population under 20 will be 120 million, (2) disposable income will be $300,000,000, and (3) advertising expenditures will be $100,000,000. Based on these estimates and the regression equation, what are predicted sales for the industry for next year?

A)	$557,143,000.

B)	$656,991,000.

C)	$509,980,000.

15^#

Posted 2010-4-8 14:12

Predicted sales for next year are:

SALES = 0.000 + 0.004 (120) + 1.031 (300) + 2.002 (100) = 509.98 (in millions), i.e. $509,980,000.
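
A minimal sketch of the same prediction, assuming all inputs are entered in millions as the question states. The names `predict` and `forecast_inputs` are illustrative only.

```python
# Estimated regression from the question (all data in millions).
intercept = 0.000
coefficients = {"POP": 0.004, "INCOME": 1.031, "ADV": 2.002}

# Hilton's estimates for next year, in millions.
forecast_inputs = {"POP": 120, "INCOME": 300, "ADV": 100}


def predict(intercept, coefs, inputs):
    """Plug the forecast values into the estimated regression equation."""
    return intercept + sum(coefs[name] * inputs[name] for name in coefs)


predicted_sales = predict(intercept, coefficients, forecast_inputs)
print(f"Predicted sales: {predicted_sales:.2f} million")
```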

16^#

Posted 2010-4-8 14:14

In a recent analysis of salaries (in $1,000) of financial analysts, a regression of salaries on education, experience, and gender is run. Gender equals one for men and zero for women. The regression results from a sample of 230 financial analysts are presented below, with t-statistics in parentheses.

Salaries = 34.98 + 1.2 Education + 0.5 Experience + 6.3 Gender

(29.11) (8.93) (2.98) (1.58)

What is the expected salary (in $1,000) of a woman with 16 years of education and 10 years of experience?

A)	59.18.

B)	54.98.

C)	65.48.

34.98 + 1.2(16) + 0.5(10) + 6.3(0) = 59.18

Holding everything else constant, do men get paid more than women? Use a 5% level of significance. No, since the t-value:

A)	does not exceed the critical value of 1.96.

B)	does not exceed the critical value of 1.65.

C)	exceeds the critical value of 1.96.

H₀: b_gender ≤ 0
H_a: b_gender > 0

The t-value of 1.58 is less than the one-tailed critical value of 1.65, so the null hypothesis cannot be rejected.
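
The two parts of this question can be sketched as follows. The coefficients, the gender t-statistic, and the critical value are those quoted in the post; the function name `expected_salary` is hypothetical.

```python
# Estimated regression from the question (salaries in $1,000).
INTERCEPT = 34.98
COEFS = {"education": 1.2, "experience": 0.5, "gender": 6.3}


def expected_salary(education, experience, gender):
    """gender is the dummy variable: 1 for men, 0 for women."""
    return (INTERCEPT
            + COEFS["education"] * education
            + COEFS["experience"] * experience
            + COEFS["gender"] * gender)


woman = expected_salary(16, 10, gender=0)
man = expected_salary(16, 10, gender=1)

# One-tailed test of H0: b_gender <= 0 at the 5% level.
T_GENDER = 1.58             # t-statistic reported for the gender coefficient
ONE_TAILED_CRITICAL = 1.65  # critical value used in the explanation
reject_null = T_GENDER > ONE_TAILED_CRITICAL

print(f"Woman: {woman:.2f}, man: {man:.2f}, reject H0: {reject_null}")
```

The dummy variable shifts the prediction by 6.3 for men, but since the null is not rejected, that difference is not statistically significant at the 5% level.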

17^#

Posted 2010-4-8 14:14

Werner Baltz, CFA, has regressed 30 years of data to forecast future sales for National Motor Company based on the percent change in gross domestic product (GDP) and the change in price of a U.S. gallon of fuel at retail. The results are presented below. Note: results must be multiplied by $1,000,000:

Coefficient Estimates

Predictor	Coefficient	Standard Error of the Coefficient
Intercept	78	13.710
GDP (% change)	30.22	12.120
Fuel price ($ change)	−412.39	183.981

Analysis of Variance Table (ANOVA)

Source	Degrees of Freedom	Sum of Squares	Mean Square
Regression		291.30	145.65
Error	27	132.12
Total	29	423.42

In 2002, if GDP rises 2.2% and the price of fuel falls $0.15, Baltz’s model will predict Company sales in 2002 to be (in $ millions) closest to:

A)	$206.

B)	$128.

C)	$82.

Sales will be closest to $78 + ($30.22 × 2.2) + [(−412.39) × (−$0.15)] = $206.34 million.

Baltz proceeds to test the hypothesis that none of the independent variables has significant explanatory power. He concludes that, at a 5% level of significance:

A)	at least one of the independent variables has explanatory power, because the calculated F-statistic exceeds its critical value.

B)	none of the independent variables has explanatory power, because the calculated F-statistic does not exceed its critical value.

C)	all of the independent variables have explanatory power, because the calculated F-statistic exceeds its critical value.

From the ANOVA table, the mean square error is (sum of squared errors / error degrees of freedom) = 132.12 / 27 = 4.89, so the calculated F-statistic is (mean square regression / mean square error) = 145.65 / 4.89 = 29.7853. From the F distribution table (2 df numerator, 27 df denominator) the F-critical value may be interpolated to be 3.36. Because 29.7853 is greater than 3.36, Baltz rejects the null hypothesis and concludes that at least one of the independent variables has explanatory power.
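
As a rough check, the F-statistic can be rebuilt from the sums of squares in the ANOVA table. This sketch assumes n = 30 observations and k = 2 independent variables (consistent with the 27 error and 29 total degrees of freedom), and the critical value is simply the 3.36 quoted above. Without rounding the mean square error to 4.89 the statistic comes out slightly lower, about 29.77, which does not change the conclusion.

```python
# Sums of squares from the ANOVA table.
rss = 291.30   # regression (explained) sum of squares
sse = 132.12   # error (unexplained) sum of squares
n, k = 30, 2   # assumed: 30 annual observations, 2 independent variables

msr = rss / k             # mean square regression
mse = sse / (n - k - 1)   # mean square error
f_stat = msr / mse

F_CRITICAL = 3.36  # value quoted in the explanation (2 and 27 df, 5% level)
reject_null = f_stat > F_CRITICAL  # the F-test is one-tailed

print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.3f}, reject H0: {reject_null}")
```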

Baltz then tests the individual variables, at a 5% level of significance, to determine whether sales are explained by individual changes in GDP and fuel prices. Baltz concludes that:

A)	neither GDP nor fuel price changes explain changes in sales.

B)	both GDP and fuel price changes explain changes in sales.

C)	only GDP changes explain changes in sales.

From the coefficient estimates table, the calculated t-statistics are (30.22 / 12.12) = 2.49 for GDP and (−412.39 / 183.981) = −2.24 for fuel prices. These values are both outside the t-critical value at 27 degrees of freedom of ±2.052. Therefore, Baltz is able to reject the null hypothesis that these coefficients are equal to zero, and concludes that each variable is important in explaining sales.

18^#

Posted 2010-4-8 14:17

Milky Way, Inc. is a large manufacturer of children’s toys and games based in the United States. Their products have high name brand recognition, and have been sold in retail outlets throughout the United States for nearly fifty years. The founding management team was bought out by a group of investors five years ago. The new management team, led by Russell Stepp, decided that Milky Way should try to expand its sales into the Western European market, which had never been tapped by the former owners. Under Stepp’s leadership, additional personnel are hired in the Research and Development department, and a new marketing plan specific to the European market is implemented. Being a new player in the European market, Stepp knows that it will take several years for Milky Way to establish its brand name in the marketplace, and is willing to make the expenditures now in exchange for increased future profitability.

Now, five years after entering the European market, Stepp is reviewing the results of his plan. Sales in Europe have slowly but steadily increased since Milky Way’s entrance into the market, but profitability seems to have leveled out. Stepp decides to hire a consultant, Ann Hays, CFA, to review and evaluate their European strategy. One of Hays’ first tasks on the job is to perform a regression analysis on Milky Way’s European sales. She is seeking to determine whether the additional expenditures on research and development and marketing for the European market should be continued in the future.

Hays begins by establishing a relationship between the European sales of Milky Way (in millions of dollars) and the two independent variables, the number of dollars (in millions) spent on research and development (R&D) and marketing (MKTG). Based upon five years of monthly data, Hays constructs the following estimated regression equation:

Estimated Sales = 54.82 + 5.97 (MKTG) + 1.45 (R&D)

Additionally, Hays calculates the following regression estimates:

Predictor	Coefficient	Standard Error
Intercept	54.82	3.165
MKTG	5.97	1.825
R&D	1.45	0.987

Hays begins the analysis by determining if both of the independent variables are statistically significant. To test whether a coefficient is statistically significant means to test whether it is statistically significantly different from:

A)	the upper tail critical value.

B)	zero.

C)	slope coefficient.

The magnitude of the coefficient reveals nothing about the importance of the independent variable in explaining the dependent variable. Therefore, it must be determined if each independent variable is statistically significant. The null hypothesis is that the slope coefficient for each independent variable equals zero. (Study Session 3, LOS 11.a)

The t-statistic for the marketing variable is calculated to be:

A)	3.271.

B)	17.321.

C)	1.886.

The t-statistic for the marketing coefficient is calculated as follows: (5.97 – 0.0) / 1.825 = 3.271. (Study Session 3, LOS 11.g)

Hays formulates a test structure where the decision rule is to reject the null hypothesis if the calculated test statistic is either larger than the upper tail critical value or lower than the lower tail critical value. At a 5% significance level with 57 degrees of freedom, assume that the two-tailed critical t-values are t_c = ±2.004. Based on this information, Hays makes the following conclusions:

Point 1: The intercept term is statistically significant.
Point 2: Both independent variables contribute to explaining sales for Milky Way, Inc.
Point 3: If an F-test were being used, the null hypothesis would be rejected.

Which of Hays’ conclusions are CORRECT?

A)	Points 1 and 2.

B)	Points 1 and 3.

C)	Points 2 and 3.

Hays’ Point 1 is correct. The t-statistic for the intercept term is (54.82 – 0) / 3.165 = 17.32, which is greater than the critical value of 2.004, so we can conclude that the intercept term is statistically significant.

Hays’ Point 2 is incorrect. The t-statistic for the R&D term is (1.45 – 0) / 0.987 = 1.469, which is not greater than the critical value of 2.004. This means that only MKTG can be said to contribute to explaining sales for Milky Way, Inc.

Hays’ Point 3 is correct. An F-test tests whether at least one of the independent variables is significantly different from zero, where the null hypothesis is that none of the independent variables are significant. Since we know that MKTG is a significant variable (t-statistic of 3.271), we can reject the hypothesis that none of the variables are significant. (Study Session 3, LOS 11.i)

Hays is aware that part, but not all, of the total variation in expected sales can be explained by the regression equation. Which of the following statements correctly reflects this relationship?

A)	SST = RSS + SSE.

B)	MSE = RSS + SSE.

C)	SST = RSS + SSE + MSE.

RSS (regression sum of squares) is the portion of the total variation in Y that is explained by the regression equation. SSE (sum of squared errors) is the portion of the total variation in Y that is not explained by the regression. SST is the total variation of Y around its average value. Therefore, SST = RSS + SSE. These sums of squares will always be calculated for you on the exam, so focus on understanding the interpretation of each. (Study Session 3, LOS 11.i)

Hays decides to test the overall effectiveness of both independent variables in explaining sales for Milky Way. Assuming that the total sum of squares is 389.14, the sum of squared errors is 146.85 and the mean squared error is 2.576, calculate and interpret the R².

A)	The R² equals 0.623, indicating that the two independent variables account for 62.3% of the variation in monthly sales.

B)	The R² equals 0.242, indicating that the two independent variables account for 24.2% of the variation in monthly sales.

C)	The R² equals 0.623, indicating that the two independent variables account for 37.7% of the variation in monthly sales.

The R² is calculated as (SST – SSE) / SST. In this example, R² equals (389.14–146.85) / 389.14 = .623 or 62.3%. This indicates that the two independent variables together explain 62.3% of the variation in monthly sales. The value for mean squared error is not used in this calculation. (Study Session 3, LOS 11.i)
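
A short sketch of the R² calculation, using only the figures given in the question. The last lines are a side check, assuming 60 monthly observations (five years) and two independent variables, that the quoted mean squared error is consistent with the SSE even though R² does not need it.

```python
sst = 389.14  # total sum of squares
sse = 146.85  # sum of squared errors (unexplained variation)

rss = sst - sse          # regression (explained) sum of squares
r_squared = rss / sst    # share of total variation explained

# Side check: MSE = SSE / (n - k - 1), assuming n = 60 and k = 2.
n, k = 60, 2
mse = sse / (n - k - 1)

print(f"R^2 = {r_squared:.3f}, MSE = {mse:.3f}")
```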

Stepp is concerned about the validity of Hays’ regression analysis and asks Hays if she can test for the presence of heteroskedasticity. Hays complies with Stepp’s request, and detects the presence of unconditional heteroskedasticity. Which of the following statements regarding heteroskedasticity is most correct?

A)	Unconditional heteroskedasticity usually causes no major problems with the regression.

B)	Heteroskedasticity can be detected either by examining scatter plots of the residual or by using the Durbin-Watson test.

C)	Unconditional heteroskedasticity does create significant problems for statistical inference.

Unconditional heteroskedasticity occurs when the heteroskedasticity is not related to the level of the independent variables. This means that it does not systematically increase or decrease with changes in the independent variable(s), and so it usually causes no major problems with the regression. Note that heteroskedasticity occurs when the variance of the residuals is not the same across all observations in the sample and can be detected either by examining scatter plots or using a Breusch-Pagan test. (Study Session 3, LOS 12.g)

19^#

Posted 2010-4-8 14:18

John Rains, CFA, is a professor of finance at a large university located in the Eastern United States. He is actively involved with his local chapter of the Society of Financial Analysts. Recently, he was asked to teach one session of a Society-sponsored CFA review course, specifically teaching the class addressing the topic of quantitative analysis. Based upon his familiarity with the CFA exam, he decides that the first part of the session should be a review of the basic elements of quantitative analysis, such as hypothesis testing, regression and multiple regression analysis. He would like to devote the second half of the review session to the practical application of the topics he covered in the first half.

Rains decides to construct a sample regression analysis case study for his students in order to demonstrate a “real-life” application of the concepts. He begins by compiling financial information on a fictitious company called Big Rig, Inc. According to the case study, Big Rig is the primary producer of the equipment used in the exploration for and drilling of new oil and gas wells in the United States. Rains has based the information in the problem on an actual equity holding in his personal portfolio, but has simplified the data for the purposes of the review course.

Rains constructs a basic regression model for Big Rig in order to estimate its profitability (in millions), using two independent variables: the number of new wells drilled in the U.S. (WLS) and the number of new competitors (COMP) entering the market:

Profits = b₀ + b₁WLS – b₂COMP + ε

Based on the model, the estimated regression equation is:

Profits = 22.5 + 0.98(WLS) − 0.35(COMP)

Using the past 5 years of quarterly data, he calculated the following regression estimates for Big Rig, Inc:

Predictor	Coefficient	Standard Error
Intercept	22.5	2.465
WLS	0.98	0.683
COMP	0.35	0.186

Using the information presented, the t-statistic for the number of new competitors (COMP) coefficient is:

A)	1.435.

B)	9.128.

C)	1.882.

To test whether a coefficient is statistically significant, the null hypothesis is that the slope coefficient is zero. The t-statistic for the COMP coefficient is calculated as follows:

(0.35 – 0.0) / 0.186 = 1.882

(Study Session 3, LOS 11.g)

Rains asks his students to test the null hypothesis that states for every new well drilled, profits will be increased by the given multiple of the coefficient, all other factors remaining constant. The appropriate hypotheses for this two-tailed test can best be stated as:

A)	H₀: b₁ ≤ 0.98 versus H_a: b₁ > 0.98.

B)	H₀: b₁ = 0.98 versus H_a: b₁ ≠ 0.98.

C)	H₀: b₁ = 0.35 versus H_a: b₁ ≠ 0.35.

The coefficient given in the above table for the number of new wells drilled (WLS) is 0.98. The hypothesis should test to see whether the coefficient is indeed equal to 0.98 or is equal to some other value. Note that hypotheses with the “greater than” or “less than” symbol are used with one-tailed tests. (Study Session 3, LOS 11.g)

Continuing with the analysis of Big Rig, Rains asks his students to calculate the mean squared error (MSE). Assume that the sum of squared errors (SSE) for the regression model is 359.

A)	17.956.

B)	18.896.

C)	21.118.

The MSE is calculated as SSE / (n – k – 1). Recall that there are twenty observations and two independent variables. Therefore, the MSE in this instance = 359 / (20 – 2 – 1) = 21.118. (Study Session 3, LOS 11.i)
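
The same arithmetic as a small sketch. The standard error of estimate (SEE) is included only because it is the square root of the MSE and the two are easy to mix up; the question itself asks for the MSE.

```python
import math

sse = 359   # sum of squared errors given in the question
n = 20      # five years of quarterly data
k = 2       # independent variables: WLS and COMP

mse = sse / (n - k - 1)   # mean squared error
see = math.sqrt(mse)      # standard error of estimate, not asked for here

print(f"MSE = {mse:.3f}, SEE = {see:.3f}")
```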

Rains now wants to test the students’ knowledge of the use of the F-test and the interpretation of the F-statistic. Which of the following statements regarding the F-test and the F-statistic is the most correct?

A)	The F-test is usually formulated as a two-tailed test.

B)	The F-statistic is used to test whether at least one independent variable in a set of independent variables explains a significant portion of the variation of the dependent variable.

C)	The F-statistic is almost always formulated to test each independent variable separately, in order to identify which variable is the most statistically significant.

An F-test assesses how well a set of independent variables, as a group, explains the variation in the dependent variable. It tests all independent variables as a group, and is always a one-tailed test. The decision rule is to reject the null hypothesis if the calculated F-value is greater than the critical F-value. (Study Session 3, LOS 11.i)

One of the main assumptions of a multiple regression model is that the variance of the residuals is constant across all observations in the sample. A violation of the assumption is known as:

A)	robust standard errors.

B)	heteroskedasticity.

C)	positive serial correlation.

Heteroskedasticity is present when the variance of the residuals is not the same across all observations in the sample, and there are sub-samples that are more spread out than the rest of the sample. (Study Session 3, LOS 12.g)

Rains reminds his students that a common condition that can distort the results of a regression analysis is referred to as serial correlation. The presence of serial correlation can be detected through the use of:

A)	the Breusch-Pagan test.

B)	the Hansen method.

C)	the Durbin-Watson statistic.

The Durbin-Watson test (DW ≈ 2(1 − r)) can detect serial correlation. Another commonly used method is to visually inspect a scatter plot of residuals over time. The Hansen method does not detect serial correlation, but can be used to remedy the situation. Note that the Breusch-Pagan test is used to detect heteroskedasticity. (Study Session 3, LOS 12.g)
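
To illustrate how the Durbin-Watson statistic behaves, here is a minimal sketch using the textbook definition, the sum of squared changes in the residuals divided by the sum of squared residuals. The residual series are invented for illustration and do not come from the Big Rig case.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (positive serial correlation).
trending = [1.0, 0.8, 0.6, 0.1, -0.3, -0.7, -0.9, -0.4, 0.2, 0.7]

# Hypothetical residuals that flip sign every period (negative serial correlation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)

print(f"Trending residuals:    DW = {dw_trending:.3f}")
print(f"Alternating residuals: DW = {dw_alternating:.3f}")
```

A value well below 2 points to positive serial correlation and a value well above 2 to negative serial correlation, in line with DW ≈ 2(1 − r).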

20^#

Posted 2010-4-8 14:21

Based on her regression results in Exhibit 2, using a 5% level of significance, Smith should conclude that:

A)	stimulus packages have significant effects on foreclosure percentages, but housing crises do not have significant effects on foreclosure percentages.

B)	stimulus packages do not have significant effects on foreclosure percentages, but housing crises do have significant effects on foreclosure percentages.

C)	both stimulus packages and housing crises have significant effects on foreclosure percentages.

The appropriate test statistic for tests of significance on individual slope coefficient estimates is the t-statistic, which is provided in Exhibit 2 for each regression coefficient estimate. The reported t-statistic equals -2.10 for the STIM slope estimate and equals 2.35 for the CRISIS slope estimate. The critical t-statistic for the 5% significance level equals 2.12 (16 degrees of freedom, 5% level of significance).

Therefore, the slope estimate for STIM is not statistically significant (the absolute value of its reported t-statistic, 2.10, is below the critical value of 2.12). In contrast, the slope estimate for CRISIS is statistically significant (the reported t-statistic, 2.35, exceeds the 5% significance level critical value). (Study Session 3, LOS 12.a)

The standard error of estimate for Smith’s regression is closest to:

A)	0.53

B)	0.56

C)	0.16

The formula for the Standard Error of the Estimate (SEE) is:

SEE = √[SSE / (n − k − 1)]

The SEE equals the standard deviation of the regression residuals. A low SEE implies a high R². (Study Session 3, LOS 12.e)

Is Smith correct or incorrect regarding Concerns 1 and 2?

A)	Only correct on one concern and incorrect on the other.

B)	Incorrect on both Concerns.

C)	Correct on both Concerns.

Smith’s Concern 1 is incorrect. Heteroskedasticity is a violation of a regression assumption, and refers to regression error variance that is not constant over all observations in the regression. Conditional heteroskedasticity is a case in which the error variance is related to the magnitudes of the independent variables (the error variance is “conditional” on the independent variables). The consequence of conditional heteroskedasticity is that the standard errors will be too low, which, in turn, causes the t-statistics to be too high. Smith’s Concern 2 also is not correct. Multicollinearity refers to independent variables that are correlated with each other. Multicollinearity causes standard errors for the regression coefficients to be too high, which, in turn, causes the t-statistics to be too low. However, contrary to Smith’s concern, multicollinearity has no effect on the F-statistic. (Study Session 3, LOS 12.g)

The most recent change in foreclosure share was +1 percent. Smith decides to base her analysis on the data and methods provided in Exhibits 4 and 5, and determines that the two-step ahead forecast for the change in foreclosure share (in percent) is 0.125, and that the mean reverting value for the change in foreclosure share (in percent) is 0.071. Is Smith correct?

A)	Smith is correct on the two-step ahead forecast for change in foreclosure share only.

B)	Smith is correct on the mean-reverting level for forecast of change in foreclosure share only.

C)	Smith is correct on both the forecast and the mean reverting level.

Forecasts are derived by substituting the appropriate value for the period t-1 lagged value.

ΔForeclosure Share_t = 0.05 + 0.25(1.0) = 0.30

So, the one-step ahead forecast equals 0.30%. The two-step ahead (%) forecast is derived by substituting 0.30 into the equation.

ΔForeclosure Share_t+1 = 0.05 + 0.25(0.30) = 0.125

Therefore, the two-step ahead forecast equals 0.125%.

(Study Session 3, LOS 13.d)
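
The chained forecast can be sketched as below. Exhibits 4 and 5 are not reproduced in this thread, so the intercept (0.05) and slope (0.25) are assumptions read off the forecast equation in the explanation, and the mean-reverting level uses the standard AR(1) result b0 / (1 − b1).

```python
# Assumed AR(1) coefficients, taken from the forecast equation in the explanation.
B0 = 0.05  # intercept
B1 = 0.25  # coefficient on the lagged change


def ar1_forecast(last_value, steps):
    """Chain one-step forecasts: each forecast becomes the next lagged value."""
    value = last_value
    for _ in range(steps):
        value = B0 + B1 * value
    return value


latest_change = 1.0  # most recent change in foreclosure share, in percent

one_step = ar1_forecast(latest_change, 1)
two_step = ar1_forecast(latest_change, 2)
mean_reverting_level = B0 / (1 - B1)

print(f"One-step: {one_step:.3f}, two-step: {two_step:.3f}, "
      f"mean-reverting level: {mean_reverting_level:.3f}")
```

Under these assumed coefficients the mean-reverting level works out to roughly 0.067 rather than 0.071, which would be consistent with Smith being right about the forecast only.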

Assume for this question that Smith finds that the foreclosure share series has a unit root. Under these conditions, she can most reliably regress foreclosure share against the change in interest rates (ΔINT) if:

A)	ΔINT has a unit root and is cointegrated with foreclosure share.

B)	ΔINT does not have a unit root.

C)	ΔINT has a unit root and is not cointegrated with foreclosure share.

The error terms in the regressions for choices B and C will be nonstationary. Therefore, some of the regression assumptions will be violated and the regression results are unreliable. If, however, both series are nonstationary (which will happen if each has a unit root), but cointegrated, then the error term will be covariance stationary and the regression results are reliable. (Study Session 3, LOS 13.j)