A dependent variable is regressed against three independent variables across 25 observations. The regression sum of squares is 119.25, and the total sum of squares is 294.45. The following are the estimated coefficient values and standard errors of the coefficients.

Coefficient    Value    Standard error
1              2.43     1.4200
2              3.21     1.5500
3              0.18     0.0818


What is the p-value for the test of the hypothesis that all three of the coefficients are equal to zero?
A)
Between 0.025 and 0.05.
B)
Between 0.05 and 0.10.
C)
Lower than 0.025.



This test requires an F-statistic, which is equal to the ratio of the mean regression sum of squares to the mean squared error.
The mean regression sum of squares is the regression sum of squares divided by the number of independent variables, which is 119.25 / 3 = 39.75.
The residual sum of squares is the difference between the total sum of squares and the regression sum of squares, which is 294.45 − 119.25 = 175.20. The denominator degrees of freedom is the number of observations minus the number of independent variables, minus 1, which is 25 − 3 − 1 = 21. The mean squared error is the residual sum of squares divided by the denominator degrees of freedom, which is 175.20 / 21 = 8.34.
The F-statistic is 39.75 / 8.34 = 4.76, which is higher than the F-value (with 3 numerator degrees of freedom and 21 denominator degrees of freedom) of 3.07 at the 5% level of significance and higher than the F-value of 3.82 at the 2.5% level of significance. The conclusion is that the p-value must be lower than 0.025.
Remember the p-value is the probability that lies above the computed test statistic for upper tail tests or below the computed test statistic for lower tail tests.
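
The arithmetic in this explanation can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `f_statistic` is illustrative, and the critical values 3.07 and 3.82 are the table values quoted above rather than values computed by the code.

```python
def f_statistic(ssr, sst, n, k):
    """Return (MSR, MSE, F) for a regression with k independent variables."""
    sse = sst - ssr            # residual (unexplained) sum of squares
    msr = ssr / k              # mean regression sum of squares
    mse = sse / (n - k - 1)    # mean squared error
    return msr, mse, msr / mse


msr, mse, f = f_statistic(ssr=119.25, sst=294.45, n=25, k=3)
print(round(msr, 2), round(mse, 2), round(f, 2))

# Critical F values for (3, 21) degrees of freedom, as quoted in the text.
f_crit_5pct, f_crit_2_5pct = 3.07, 3.82
print(f > f_crit_5pct, f > f_crit_2_5pct)
```

Because the statistic exceeds both critical values, the upper-tail probability must be smaller than 0.025.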


Consider the following analysis of variance (ANOVA) table:
Source        Sum of squares    Degrees of freedom    Mean square
Regression    20                1                     20
Error         80                40                    2
Total         100               41

The F-statistic for the test of the fit of the model is closest to:
A)
0.10.
B)
10.00.
C)
0.25.



The F-statistic is equal to the ratio of the mean squared regression to the mean squared error.
F = MSR/MSE = 20 / 2 = 10.


Which of the following statements about the F-statistic is least accurate?
A)
Rejecting the null hypothesis means that only one of the independent variables is statistically significant.
B)
F = MSR/MSE.
C)
df_numerator = k and df_denominator = n − k − 1.



An F-test assesses how well the set of independent variables, as a group, explains the variation in the dependent variable. That is, the F-statistic is used to test whether at least one of the independent variables explains a significant portion of the variation of the dependent variable.


Toni Williams, CFA, has determined that commercial electric generator sales in the Midwest U.S. for Self-Start Company are a function of several factors in each area: the cost of heating oil, the temperature, snowfall, and housing starts. Using data for the most recent available year, she runs a cross-sectional regression in which she regresses the deviation of sales from the historical average in each area on the deviation of each explanatory variable from the historical average of that variable for that location. She feels this is the most appropriate method since each geographic area will have different average values for the inputs, and the model can show how current conditions explain why generator sales are above or below the historical average in each area. In summary, she regresses current sales for each area minus its respective historical average on the following variables for each area.
  • The difference between the retail price of heating oil and its historical average.
  • The mean number of degrees the temperature is below normal in Chicago.
  • The amount of snowfall above the average.
  • The percentage of housing starts above the average.

Williams used a sample of 26 observations obtained from 26 metropolitan areas in the Midwest U.S. The results are in the tables below. The dependent variable is generator sales in millions of dollars.

Coefficient Estimates Table

Variable           Estimated Coefficient    Standard Error of the Coefficient
Intercept          5.00                     1.850
$ Heating Oil      2.00                     0.827
Low Temperature    3.00                     1.200
Snowfall           10.00                    4.833
Housing Starts     5.00                     2.333


Analysis of Variance Table (ANOVA)

Source        Degrees of Freedom    Sum of Squares    Mean Square
Regression    4                     335.20            83.80
Error         21                    606.40            28.88
Total         25                    941.60



One of her goals is to forecast the sales of the Chicago metropolitan area next year. For that area and for the upcoming year, Williams obtains the following projections: heating oil prices will be $0.10 above average, the temperature in Chicago will be 5 degrees below normal, snowfall will be 3 inches above average, and housing starts will be 3% below average.
In addition to making forecasts and testing the significance of the estimated coefficients, she plans to perform diagnostic tests to verify the validity of the model’s results.

According to the model and the data for the Chicago metropolitan area, the forecast of generator sales is:
A)
$55 million above average.
B)
$35.2 million above the average.
C)
$65 million above the average.



The model uses a multiple regression equation to predict sales: multiply each estimated coefficient by the projected value of its variable and add the intercept to get:
[5 + (2 × 0.10) + (3 × 5) + (10 × 3) + (5 × (−3))] × $1,000,000 = $35.2 million.

(Study Session 3, LOS 12.c)
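
The forecast can be checked with a short sketch. The dictionary keys below are illustrative names chosen for this example; the numbers are the estimated coefficients and the Chicago projections given in the question.

```python
# Estimated coefficients from the Coefficient Estimates Table.
coefficients = {
    "intercept": 5.00,
    "heating_oil": 2.00,
    "low_temperature": 3.00,
    "snowfall": 10.00,
    "housing_starts": 5.00,
}

# Projected deviations from average for Chicago next year.
projections = {
    "heating_oil": 0.10,      # dollars above average
    "low_temperature": 5,     # degrees below normal
    "snowfall": 3,            # inches above average
    "housing_starts": -3,     # percent (below average, hence negative)
}

# Intercept plus the sum of coefficient times projected value.
forecast = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in projections.items()
)
print(round(forecast, 1))  # deviation of sales from average, in $ millions
```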


Williams proceeds to test the hypothesis that none of the independent variables has significant explanatory power. She concludes that, at a 5% level of significance:
A)
none of the independent variables has explanatory power, because the calculated F-statistic does not exceed its critical value.
B)
all of the independent variables have explanatory power, because the calculated F-statistic exceeds its critical value.
C)
at least one of the independent variables has explanatory power, because the calculated F-statistic exceeds its critical value.



From the ANOVA table, the calculated F-statistic is (mean square regression / mean square error) = (83.80 / 28.88) = 2.9017. From the F distribution table (4 df numerator, 21 df denominator) the critical F value is 2.84. Because 2.9017 is greater than 2.84, Williams rejects the null hypothesis and concludes that at least one of the independent variables has explanatory power. (Study Session 3, LOS 12.e)
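
As a rough check, the F-test can be recomputed from the sums of squares and degrees of freedom in the ANOVA table. This sketch assumes the critical value of 2.84 quoted above; it is taken from the text, not computed. Using the unrounded mean squared error gives about 2.902 rather than 2.9017, which does not change the conclusion.

```python
# Values from the ANOVA table.
ss_regression, df_regression = 335.20, 4
ss_error, df_error = 606.40, 21

msr = ss_regression / df_regression   # mean square regression
mse = ss_error / df_error             # mean square error (unrounded)
f_stat = msr / mse

f_critical = 2.84  # F(4, 21) at the 5% level, as quoted in the explanation
reject_null = f_stat > f_critical
print(round(f_stat, 3), reject_null)
```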

With respect to testing the validity of the model’s results, Williams may wish to perform:
A)
both a Durbin-Watson test and a Breusch-Pagan test.
B)
a Durbin-Watson test, but not a Breusch-Pagan test.
C)
a Breusch-Pagan test, but not a Durbin-Watson test.



Because the model is not autoregressive, the Durbin-Watson test is an appropriate test for serial correlation. A Breusch-Pagan test for heteroskedasticity would also be a good idea. (Study Session 3, LOS 12.i)

Williams decides to use two-tailed tests on the individual variables, at a 5% level of significance, to determine whether electric generator sales are explained by each of them individually. Williams concludes that:
A)
all of the variables explain sales.
B)
all of the variables except snowfall explain sales.
C)
all of the variables except snowfall and housing starts explain sales.


The calculated t-statistics are:
  • Heating Oil: (2.00 / 0.827) = 2.4184
  • Low Temperature: (3.00 / 1.200) = 2.5000
  • Snowfall: (10.00 / 4.833) = 2.0691
  • Housing Starts: (5.00 / 2.333) = 2.1432
All of these values are outside the t–critical value (at (26 − 4 − 1) = 21 degrees of freedom) of 2.080, except the change in snowfall. So Williams should reject the null hypothesis for the other variables and conclude that they explain sales, but fail to reject the null hypothesis with respect to snowfall and conclude that increases or decreases in snowfall do not explain sales. (Study Session 3, LOS 12.b)
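
The individual tests can be sketched as a loop over the coefficient table. The critical value 2.080 is the two-tailed 5% value for 21 degrees of freedom quoted above; the variable names are illustrative.

```python
# (estimated coefficient, standard error) from the Coefficient Estimates Table.
estimates = {
    "Heating Oil": (2.00, 0.827),
    "Low Temperature": (3.00, 1.200),
    "Snowfall": (10.00, 4.833),
    "Housing Starts": (5.00, 2.333),
}

t_critical = 2.080  # two-tailed, 5% level, 26 - 4 - 1 = 21 degrees of freedom

significant = {}
for name, (coefficient, std_error) in estimates.items():
    t_stat = coefficient / std_error          # test of H0: coefficient = 0
    significant[name] = abs(t_stat) > t_critical
    print(f"{name}: t = {t_stat:.4f}, significant = {significant[name]}")
```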


When Williams ran the model, the computer said the R2 is 0.233. She examines the other output and concludes that this is the:
A)
neither the unadjusted nor adjusted R2 value, nor the coefficient of correlation.
B)
adjusted R2 value.
C)
unadjusted R2 value.



This can be answered by recognizing that the unadjusted R-square is (335.2 / 941.6) = 0.356. Thus, the reported value must be the adjusted R2. To verify this we see that the adjusted R-squared is: 1 − ((26 − 1) / (26 − 4 − 1)) × (1 − 0.356) = 0.233. Note that in a multiple regression the adjusted R2 is less than R2 (the two are equal only when R2 = 1). (Study Session 3, LOS 12.f)
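
A minimal sketch of the same check, with the helper name `adjusted_r2` chosen only for this example:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Regression and total sums of squares from the ANOVA table.
r2 = 335.20 / 941.60
adj_r2 = adjusted_r2(r2, n=26, k=4)
print(round(r2, 3), round(adj_r2, 3))
```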

In preparing and using this model, Williams has least likely relied on which of the following assumptions?
A)
There is a linear relationship between the independent variables.
B)
A linear relationship exists between the dependent and independent variables.
C)
The disturbance or error term is normally distributed.



Multiple regression models assume that there is no linear relationship between two or more of the independent variables. The other answer choices are both assumptions of multiple regression. (Study Session 3, LOS 12.d)


Manuel Mercado, CFA, has performed the following two regressions on sales data for a given industry. He wants to forecast sales for each quarter of the upcoming year.
Model ONE

Regression Statistics

Multiple R        0.941828
R2                0.887039
Adjusted R2       0.863258
Standard Error    2.543272
Observations      24

Durbin-Watson test statistic = 0.7856

ANOVA
              df    SS           MS          F           Significance F
Regression    4     965.0619     241.2655    37.30006    9.49E−09
Residual      19    122.8964     6.4682
Total         23    1087.9583

             Coefficients    Standard Error    t-Statistic
Intercept    31.40833        1.4866            21.12763
Q1           −3.77798        1.485952          −2.54246
Q2           −2.46310        1.476204          −1.66853
Q3           −0.14821        1.470324          −0.10080
TREND        0.851786        0.075335          11.20848

Model TWO

Regression Statistics

Multiple R        0.941796
R2                0.886979
Adjusted R2       0.870026
Standard Error    2.479538
Observations      24

Durbin-Watson test statistic = 0.7860

ANOVA
              df    SS           MS          F          Significance F
Regression    3     964.9962     321.6654    52.3194    1.19E−09
Residual      20    122.9622     6.14811
Total         23    1087.9584

             Coefficients    Standard Error    t-Statistic
Intercept    31.32888        1.228865          25.49416
Q1           −3.70288        1.253493          −2.95405
Q2           −2.38839        1.244727          −1.91881
TREND        0.85218         0.073991          11.51732

The dependent variable is the level of sales for each quarter, in $ millions, which began with the first quarter of the first year. Q1, Q2, and Q3 are seasonal dummy variables representing each quarter of the year. For the first four observations the dummy variables are as follows: Q1 = (1, 0, 0, 0), Q2 = (0, 1, 0, 0), Q3 = (0, 0, 1, 0). The TREND is a series that begins with one and increases by one each period to end with 24. For all tests, Mercado will use a 5% level of significance. Tests of coefficients will be two-tailed, and all others are one-tailed.

Which model would be a better choice for making a forecast?
A)
Model TWO because serial correlation is not a problem.
B)
Model ONE because it has a higher R2.
C)
Model TWO because it has a higher adjusted R2.



Model TWO has a higher adjusted R2 and thus would produce the more reliable estimates. As is always the case when a variable is removed, R2 for Model TWO is lower. The increase in adjusted R2 indicates that the removed variable, Q3, has very little explanatory power, and removing it should improve the accuracy of the estimates. With respect to the references to autocorrelation, we can compare the Durbin-Watson statistics to the critical values on a Durbin-Watson table. Since the critical DW statistics for Model ONE and TWO respectively are 1.01 (>0.7856) and 1.10 (>0.7860), serial correlation is a problem for both equations. (Study Session 3, LOS 12.f)
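
The comparison can be sketched numerically. The adjusted R2 values are recomputed from the reported R2 values, and the lower critical Durbin-Watson values (1.01 and 1.10) are the ones quoted in the explanation; the helper name and dictionary layout are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Reported statistics; dw_lower is the critical value quoted in the text.
models = {
    "ONE": {"r2": 0.887039, "k": 4, "dw": 0.7856, "dw_lower": 1.01},
    "TWO": {"r2": 0.886979, "k": 3, "dw": 0.7860, "dw_lower": 1.10},
}

for name, m in models.items():
    m["adj_r2"] = adjusted_r2(m["r2"], n=24, k=m["k"])
    # A DW statistic below the lower critical value signals positive
    # serial correlation.
    m["serial_correlation"] = m["dw"] < m["dw_lower"]
    print(name, round(m["adj_r2"], 6), m["serial_correlation"])
```

Dropping Q3 lowers R2 slightly but raises the adjusted R2, while the Durbin-Watson comparison flags serial correlation in both models.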

Using Model ONE, what is the sales forecast for the second quarter of the next year?
A)
$51.09 million.
B)
$56.02 million.
C)
$46.31 million.


The estimate for the second quarter of the following year would be (in millions):
31.40833 + (−2.46310) + (24 + 2) × 0.851786 = 51.091666. (Study Session 3, LOS 12.c)
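
A small sketch of the forecast, using the coefficients from the Model ONE table. The function name `model_one_forecast` is an illustrative assumption; the second quarter of next year is observation 24 + 2 = 26 in the TREND series.

```python
# Coefficients from the Model ONE table.
INTERCEPT = 31.40833
SEASONAL = {1: -3.77798, 2: -2.46310, 3: -0.14821}  # Q4 is the base quarter
TREND_COEF = 0.851786


def model_one_forecast(quarter, trend):
    """Forecast sales ($ millions) for a quarter (1-4) and a TREND value."""
    seasonal_effect = SEASONAL.get(quarter, 0.0)  # 0 for the fourth quarter
    return INTERCEPT + seasonal_effect + TREND_COEF * trend


forecast_q2 = model_one_forecast(quarter=2, trend=24 + 2)
print(round(forecast_q2, 2))
```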



Which of the coefficients that appear in both models are not significant at the 5% level in a two-tailed test?
A)
The coefficients on Q1 and Q2 only.
B)
The coefficient on Q2 only.
C)
The intercept only.



The critical t-values (in absolute value) for Models ONE and TWO are 2.093 and 2.086, respectively. The t-statistics for Q2 in Models ONE and TWO are −1.6685 and −1.9188, respectively, so their absolute values fall below the critical values in both models. (Study Session 3, LOS 12.a)

If it is determined that conditional heteroskedasticity is present in Model ONE, which of the following inferences is most accurate?
A)
Regression coefficients will be biased but standard errors will be unbiased.
B)
Both the regression coefficients and the standard errors will be biased.
C)
Regression coefficients will be unbiased but standard errors will be biased.



Presence of conditional heteroskedasticity will not affect the consistency of regression coefficients but will bias the standard errors leading to incorrect application of t-tests for statistical significance of regression parameters. (Study Session 3, LOS 12.i)

Mercado probably did not include a fourth dummy variable Q4, which would have had 0, 0, 0, 1 as its first four observations because:
A)
it would have lowered the explanatory power of the equation.
B)
the intercept is essentially the dummy for the fourth quarter.
C)
it would not have been significant.


The fourth quarter serves as the base quarter: for the fourth quarter, Q1 = Q2 = Q3 = 0, so the intercept captures its level. Had the model included Q4 as specified, it could not also have had an intercept. In that case, using Model ONE as an example, the estimated coefficient on Q4 would have been 31.40833, and the coefficient on each of the other quarterly dummies would be 31.40833 plus the corresponding dummy estimate from Model ONE. In a model that included Q1, Q2, Q3, and Q4 but no intercept, for example:
Q1 = 31.40833 + (−3.77798) = 27.63035
Such a model would produce the same estimated values for the dependent variable. (Study Session 3, LOS 12.h)
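
The equivalence of the two specifications can be illustrated with the Model ONE estimates. This is only a sketch: the no-intercept coefficients are derived from the reported estimates as described above, not re-estimated from data, and the function names are illustrative.

```python
# Reported Model ONE estimates (intercept plus three seasonal dummies).
intercept = 31.40833
dummies = {1: -3.77798, 2: -2.46310, 3: -0.14821}
trend_coef = 0.851786

# Implied coefficients for a model with Q1..Q4 dummies and no intercept.
quarter_levels = {q: intercept + dummies.get(q, 0.0) for q in (1, 2, 3, 4)}


def fitted_with_intercept(quarter, trend):
    return intercept + dummies.get(quarter, 0.0) + trend_coef * trend


def fitted_without_intercept(quarter, trend):
    return quarter_levels[quarter] + trend_coef * trend


# TREND runs from 1 to 24, with observation 1 in the first quarter.
same = all(
    abs(fitted_with_intercept((t - 1) % 4 + 1, t)
        - fitted_without_intercept((t - 1) % 4 + 1, t)) < 1e-12
    for t in range(1, 25)
)
print(round(quarter_levels[1], 5), round(quarter_levels[4], 5), same)
```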


If Mercado determines that Model TWO is the appropriate specification, then he is essentially saying that for each year, value of sales from quarter three to four is expected to:
A)
remain approximately the same.
B)
grow, but by less than $1,000,000.
C)
grow by more than $1,000,000.



The specification of Model TWO essentially assumes there is no difference attributed to the change of the season from the third to fourth quarter. However, the time trend is significant. The trend effect for moving from one quarter to the next is the coefficient on TREND times $1,000,000, which is $852,182 for Model TWO. (Study Session 3, LOS 13.a)


Which of the following statements regarding the R2 is least accurate?
A)
The adjusted-R2 is greater than the R2 in multiple regression.
B)
The adjusted-R2 is not appropriate to use in simple regression.
C)
It is possible for the adjusted-R2 to decline as more variables are added to the multiple regression.



The adjusted-R2 will always be less than R2 in multiple regression.


Which of the following statements regarding the R2 is least accurate?
A)
The R2 of a regression will be greater than or equal to the adjusted-R2 for the same regression.
B)
The F-statistic for the test of the fit of the model is the ratio of the mean squared regression to the mean squared error.
C)
The R2 is the ratio of the unexplained variation to the explained variation of the dependent variable.



The R2 is the ratio of the explained variation to the total variation.


An analyst is estimating a regression equation with three independent variables, and calculates the R2, the adjusted R2, and the F-statistic. The analyst then decides to add a fourth variable to the equation. Which of the following is most accurate?
A)
The R2 will be higher, but the adjusted R2 and F-statistic could be higher or lower.
B)
The R2 and F-statistic will be higher, but the adjusted R2 could be higher or lower.
C)
The adjusted R2 will be higher, but the R2 and F-statistic could be higher or lower.



The R2 will always increase as the number of variables increases. The adjusted R2 specifically adjusts for the number of variables and might not increase as the number of variables rises. As the number of variables increases, the regression sum of squares will rise and the residual sum of squares will fall, which will tend to make the F-statistic larger. However, the numerator degrees of freedom will also rise and the denominator degrees of freedom will fall, which will tend to make the F-statistic smaller. Consequently, like the adjusted R2, the F-statistic could be higher or lower.
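
A hypothetical illustration may help. Writing the F-statistic in terms of R2 gives F = (R2 / k) / ((1 − R2) / (n − k − 1)). The sample size and R2 values below are invented for this example and do not come from the question.

```python
def f_from_r2(r2, n, k):
    """F-statistic implied by R-squared, n observations, k variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def adjusted_r2(r2, n, k):
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


n = 30  # hypothetical sample size

# Baseline: three variables with a hypothetical R2 of 0.50.
base = (f_from_r2(0.50, n, 3), adjusted_r2(0.50, n, 3))

# Fourth variable adds little: R2 rises only to 0.51.
weak = (f_from_r2(0.51, n, 4), adjusted_r2(0.51, n, 4))

# Fourth variable adds a lot: R2 rises to 0.65.
strong = (f_from_r2(0.65, n, 4), adjusted_r2(0.65, n, 4))

for label, (f_stat, adj) in (("base", base), ("weak", weak), ("strong", strong)):
    print(label, round(f_stat, 2), round(adj, 3))
```

In both cases R2 is higher with the fourth variable, but the F-statistic and adjusted R2 fall in the weak case and rise in the strong case.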


An analyst regresses the return of an S&P 500 index fund against the S&P 500, and also regresses the return of an active manager against the S&P 500. The analyst uses the last five years of data in both regressions. Without making any other assumptions, which of the following is most accurate? The index fund:
A)
regression should have higher sum of squares regression as a ratio to the total sum of squares.
B)
should have a higher coefficient on the independent variable.
C)
should have a lower coefficient of determination.



The index fund regression should provide a higher R2 than the active manager regression. R2 is the sum of squares regression divided by the total sum of squares.


May Jones estimated a regression that produced the following analysis of variance (ANOVA) table:

Source        Sum of squares    Degrees of freedom    Mean square
Regression    20                1                     20
Error         80                40                    2
Total         100               41


The values of R2 and the F-statistic for the fit of the model are:

A)
R2 = 0.25 and F = 0.909.
B)
R2 = 0.20 and F = 10.
C)
R2 = 0.25 and F = 10.



R2 = RSS / SST = 20 / 100 = 0.20, where RSS is the regression sum of squares and SST is the total sum of squares.
The F-statistic is equal to the ratio of the mean squared regression to the mean squared error. F = 20 / 2 = 10
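
Both quantities follow directly from the ANOVA table, as this short sketch shows (variable names are illustrative):

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 20, 1
ss_error, df_error = 80, 40
ss_total = ss_regression + ss_error

r_squared = ss_regression / ss_total       # explained / total variation
msr = ss_regression / df_regression        # mean squared regression
mse = ss_error / df_error                  # mean squared error
f_stat = msr / mse

print(r_squared, f_stat)
```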

