One of the underlying assumptions of a multiple regression is that the variance of the residuals is constant for various levels of the independent variables. This quality is referred to as:
A)
a linear relationship.
B)
homoskedasticity.
C)
a normal distribution.



Homoskedasticity refers to the basic assumption of a multiple regression model that the variance of the error terms is constant.


Which of the following statements least accurately describes one of the fundamental multiple regression assumptions?
A)
The variance of the error terms is not constant (i.e., the errors are heteroskedastic).
B)
The error term is normally distributed.
C)
The independent variables are not random.



The variance of the error term IS assumed to be constant, resulting in errors that are homoskedastic.


Assume that in a particular multiple regression model, it is determined that the error terms are uncorrelated with each other. Which of the following statements is most accurate?
A)
Unconditional heteroskedasticity present in this model should not pose a problem, but can be corrected by using robust standard errors.
B)
This model is in accordance with the basic assumptions of multiple regression analysis because the errors are not serially correlated.
C)
Serial correlation may be present in this multiple regression model, and can be confirmed only through a Durbin-Watson test.



One of the basic assumptions of multiple regression analysis is that the error terms are not correlated with each other. In other words, the error terms are not serially correlated. Multicollinearity and heteroskedasticity are problems in multiple regression that are not related to the correlation of the error terms.


An analyst runs a regression of monthly value-stock returns on five independent variables over 48 months. The total sum of squares is 430, and the sum of squared errors is 170. Test the null hypothesis at the 2.5% and 5% significance levels that the coefficients on all five of the independent variables are equal to zero.
A)
Rejected at 2.5% significance and 5% significance.
B)
Rejected at 5% significance only.
C)
Not rejected at 2.5% or 5.0% significance.



The F-statistic is equal to the ratio of the mean squared regression (MSR) to the mean squared error (MSE).
RSS = SST – SSE = 430 – 170 = 260
MSR = 260 / 5 = 52
MSE = 170 / (48 – 5 – 1) = 4.05
F = 52 / 4.05 = 12.84
The critical F-value for 5 and 42 degrees of freedom at a 5% significance level is approximately 2.44. The critical F-value for 5 and 42 degrees of freedom at a 2.5% significance level is approximately 2.89. Therefore, we can reject the null hypothesis at either level of significance and conclude that at least one of the five independent variables explains a significant portion of the variation of the dependent variable.
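As a quick illustration (not part of the original answer), the same F-test arithmetic can be reproduced in Python with SciPy; the variable names below are purely illustrative.

```python
# Joint F-test that all five slope coefficients are zero (sketch; names are illustrative).
from scipy.stats import f

sst, sse, k, n = 430.0, 170.0, 5, 48   # total SS, error SS, number of slopes, observations
rss = sst - sse                        # regression sum of squares = 260
msr = rss / k                          # mean square regression = 52
mse = sse / (n - k - 1)                # mean square error ~ 4.05
f_stat = msr / mse                     # ~ 12.84

for alpha in (0.05, 0.025):
    critical = f.ppf(1 - alpha, k, n - k - 1)   # critical F with 5 and 42 df
    print(f"alpha={alpha}: F={f_stat:.2f}, critical={critical:.2f}, reject={f_stat > critical}")
```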


Consider the following analysis of variance table:
Source        Sum of Squares    df    Mean Square
Regression    20                1     20
Error         80                20    4
Total         100               21

The F-statistic for a test of the overall significance of the model is closest to:
A)
0.20
B)
5.00
C)
0.05



The F-statistic is equal to the ratio of the mean squared regression to the mean squared error.
F = MSR / MSE = 20 / 4 = 5.


A dependent variable is regressed against three independent variables across 25 observations. The regression sum of squares is 119.25, and the total sum of squares is 294.45. The following are the estimated coefficient values and standard errors of the coefficients.

Coefficient    Value    Standard error
1              2.43     1.4200
2              3.21     1.5500
3              0.18     0.0818

What is the p-value for the test of the hypothesis that all three of the coefficients are equal to zero?
A)
Between 0.025 and 0.05.
B)
Between 0.05 and 0.10.
C)
Lower than 0.025.



This test requires an F-statistic, which is equal to the ratio of the mean regression sum of squares to the mean squared error.
The mean regression sum of squares is the regression sum of squares divided by the number of independent variables, which is 119.25 / 3 = 39.75.
The residual sum of squares is the difference between the total sum of squares and the regression sum of squares, which is 294.45 − 119.25 = 175.20. The denominator degrees of freedom is the number of observations minus the number of independent variables, minus 1, which is 25 − 3 − 1 = 21. The mean squared error is the residual sum of squares divided by the denominator degrees of freedom, which is 175.20 / 21 = 8.34.
The F-statistic is 39.75 / 8.34 = 4.76, which is higher than the F-value (with 3 numerator degrees of freedom and 21 denominator degrees of freedom) of 3.07 at the 5% level of significance and higher than the F-value of 3.82 at the 2.5% level of significance. The conclusion is that the p-value must be lower than 0.025.
Remember, the p-value is the area in the tail beyond the computed test statistic: above it for upper-tail tests and below it for lower-tail tests.
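As a hedged sketch (Python with SciPy, not part of the original solution), the F-statistic and its p-value can be computed directly; the names below are illustrative.

```python
# p-value for the joint test that all three coefficients are zero (sketch).
from scipy.stats import f

rss, sst, k, n = 119.25, 294.45, 3, 25
sse = sst - rss                        # residual sum of squares = 175.20
msr = rss / k                          # 39.75
mse = sse / (n - k - 1)                # 175.20 / 21 ~ 8.34
f_stat = msr / mse                     # ~ 4.76

p_value = f.sf(f_stat, k, n - k - 1)   # upper-tail area beyond the computed F
print(round(f_stat, 2), round(p_value, 3))   # roughly 4.76 and 0.011, i.e. below 0.025
```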


Consider the following analysis of variance (ANOVA) table:
Source        Sum of squares    Degrees of freedom    Mean square
Regression    20                1                     20
Error         80                40                    2
Total         100               41

The F-statistic for the test of the fit of the model is closest to:
A)
0.10.
B)
10.00.
C)
0.25.



The F-statistic is equal to the ratio of the mean squared regression to the mean squared error.
F = MSR/MSE = 20 / 2 = 10.


Which of the following statements about the F-statistic is least accurate?
A)
Rejecting the null hypothesis means that only one of the independent variables is statistically significant.
B)
F = MSR/MSE.
C)
Numerator df = k and denominator df = n − k − 1.



An F-test assesses how well the set of independent variables, as a group, explains the variation in the dependent variable. That is, the F-statistic is used to test whether at least one of the independent variables explains a significant portion of the variation of the dependent variable.


Toni Williams, CFA, has determined that commercial electric generator sales in the Midwest U.S. for Self-Start Company are a function of several factors in each area: the cost of heating oil, the temperature, snowfall, and housing starts. Using data for the most recently available year, she runs a cross-sectional regression in which she regresses the deviation of sales from the historical average in each area on the deviation of each explanatory variable from its historical average for that location. She feels this is the most appropriate method because each geographic area has different average values for the inputs, and the model can explain how current conditions cause generator sales to be higher or lower than the historical average in each area. In summary, she regresses current sales for each area, minus its respective historical average, on the following variables for each area:
  • The difference between the retail price of heating oil and its historical average.
  • The mean number of degrees the temperature is below normal in Chicago.
  • The amount of snowfall above the average.
  • The percentage of housing starts above the average.

Williams used a sample of 26 observations obtained from 26 metropolitan areas in the Midwest U.S. The results are in the tables below. The dependent variable is the sales of generators in millions of dollars.

Coefficient Estimates Table

Variable           Estimated Coefficient    Standard Error of the Coefficient
Intercept          5.00                     1.850
$ Heating Oil      2.00                     0.827
Low Temperature    3.00                     1.200
Snowfall           10.00                    4.833
Housing Starts     5.00                     2.333


Analysis of Variance Table (ANOVA)

Source        Degrees of Freedom    Sum of Squares    Mean Square
Regression    4                     335.20            83.80
Error         21                    606.40            28.88
Total         25                    941.60



One of her goals is to forecast the sales of the Chicago metropolitan area next year. For that area and for the upcoming year, Williams obtains the following projections: heating oil prices will be $0.10 above average, the temperature in Chicago will be 5 degrees below normal, snowfall will be 3 inches above average, and housing starts will be 3% below average.
In addition to making forecasts and testing the significance of the estimated coefficients, she plans to perform diagnostic tests to verify the validity of the model's results.

According to the model and the data for the Chicago metropolitan area, the forecast of generator sales is:
A)
$55 million above average.
B)
$35.2 million above the average.
C)
$65 million above the average.



The model uses a multiple regression equation to predict sales by multiplying each estimated coefficient by the corresponding projected value and adding the intercept:
[5 + (2 × 0.10) + (3 × 5) + (10 × 3) + (5 × (−3))] × $1,000,000 = $35.2 million.

(Study Session 3, LOS 12.c)
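A minimal sketch of the forecast arithmetic in Python (not part of the original solution); the coefficient and input names are illustrative.

```python
# Forecast of Chicago generator sales relative to the historical average ($ millions).
coefficients = {
    "intercept": 5.00,
    "heating_oil": 2.00,
    "low_temperature": 3.00,
    "snowfall": 10.00,
    "housing_starts": 5.00,
}
chicago_inputs = {
    "heating_oil": 0.10,      # $0.10 above average
    "low_temperature": 5.0,   # 5 degrees below normal
    "snowfall": 3.0,          # 3 inches above average
    "housing_starts": -3.0,   # 3% below average
}

forecast = coefficients["intercept"] + sum(
    coefficients[name] * value for name, value in chicago_inputs.items()
)
print(forecast)   # 35.2 -> $35.2 million above the historical average
```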


Williams proceeds to test the hypothesis that none of the independent variables has significant explanatory power. She concludes that, at a 5% level of significance:
A)
none of the independent variables has explanatory power, because the calculated F-statistic does not exceed its critical value.
B)
all of the independent variables have explanatory power, because the calculated F-statistic exceeds its critical value.
C)
at least one of the independent variables has explanatory power, because the calculated F-statistic exceeds its critical value.



From the ANOVA table, the calculated F-statistic is (mean square regression / mean square error) = (83.80 / 28.88) = 2.9017. From the F distribution table (4 df numerator, 21 df denominator) the critical F value is 2.84. Because 2.9017 is greater than 2.84, Williams rejects the null hypothesis and concludes that at least one of the independent variables has explanatory power. (Study Session 3, LOS 12.e)
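The same overall F-test can be sketched in Python with SciPy (not part of the original solution; the names are illustrative).

```python
# Overall significance test for Williams' model.
from scipy.stats import f

msr, mse, k, n = 83.80, 28.88, 4, 26
f_stat = msr / mse                      # ~ 2.9017
critical = f.ppf(0.95, k, n - k - 1)    # F(4, 21) at the 5% level ~ 2.84
print(round(f_stat, 4), round(critical, 2), f_stat > critical)
```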

With respect to testing the validity of the model’s results, Williams may wish to perform:
A)
both a Durbin-Watson test and a Breusch-Pagan test.
B)
a Durbin-Watson test, but not a Breusch-Pagan test.
C)
a Breusch-Pagan test, but not a Durbin-Watson test.



Since this is not an autoregression (there is no lagged dependent variable among the regressors), a test for serial correlation is appropriate, so the Durbin-Watson test would be used. The Breusch-Pagan test for heteroskedasticity would also be a good idea. (Study Session 3, LOS 12.i)

Williams decides to use two-tailed tests on the individual variables, at a 5% level of significance, to determine whether electric generator sales are explained by each of them individually. Williams concludes that:
A)
all of the variables explain sales.
B)
all of the variables except snowfall explain sales.
C)
all of the variables except snowfall and housing starts explain sales.


The calculated t-statistics are:
Heating oil: 2.00 / 0.827 = 2.4184
Low temperature: 3.00 / 1.200 = 2.5000
Snowfall: 10.00 / 4.833 = 2.0691
Housing starts: 5.00 / 2.333 = 2.1432
All of these values exceed the critical t-value of 2.080 (at 26 − 4 − 1 = 21 degrees of freedom) except the t-statistic for snowfall. So Williams should reject the null hypothesis for the other variables and conclude that they explain sales, but fail to reject the null hypothesis with respect to snowfall and conclude that increases or decreases in snowfall do not explain sales. (Study Session 3, LOS 12.b)
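A hedged sketch of the individual t-tests in Python with SciPy (not part of the original solution; variable names are illustrative).

```python
# Two-tailed t-tests of each slope coefficient at the 5% level.
from scipy.stats import t

estimates = {                      # variable: (coefficient, standard error)
    "heating_oil": (2.00, 0.827),
    "low_temperature": (3.00, 1.200),
    "snowfall": (10.00, 4.833),
    "housing_starts": (5.00, 2.333),
}
n, k = 26, 4
critical = t.ppf(1 - 0.05 / 2, n - k - 1)   # 21 df -> ~ 2.080

for name, (b, se) in estimates.items():
    t_stat = b / se
    print(f"{name}: t = {t_stat:.4f}, significant = {abs(t_stat) > critical}")
```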


When Williams ran the model, the computer said the R2 is 0.233. She examines the other output and concludes that this is the:
A)
neither the unadjusted nor adjusted R2 value, nor the coefficient of correlation.
B)
adjusted R2 value.
C)
unadjusted R2 value.



This can be answered by recognizing that the unadjusted R2 is 335.2 / 941.6 = 0.356. Thus, the reported value must be the adjusted R2. To verify this, we see that the adjusted R2 is: 1 − ((26 − 1) / (26 − 4 − 1)) × (1 − 0.356) = 0.233. Note that whenever there is more than one independent variable, the adjusted R2 will always be less than R2. (Study Session 3, LOS 12.f)
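A minimal sketch of the R2 reconciliation in Python (not part of the original solution).

```python
# Unadjusted vs. adjusted R2 from the ANOVA table values.
ss_regression, ss_total = 335.20, 941.60
n, k = 26, 4

r_squared = ss_regression / ss_total                            # ~ 0.356
adj_r_squared = 1 - ((n - 1) / (n - k - 1)) * (1 - r_squared)   # ~ 0.233
print(round(r_squared, 3), round(adj_r_squared, 3))
```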

In preparing and using this model, Williams has least likely relied on which of the following assumptions?
A)
There is a linear relationship between the independent variables.
B)
A linear relationship exists between the dependent and independent variables.
C)
The disturbance or error term is normally distributed.



Multiple regression models assume that there is no exact linear relationship between two or more of the independent variables. The other answer choices are both assumptions of multiple regression. (Study Session 3, LOS 12.d)


Manuel Mercado, CFA, has performed the following two regressions on sales data for a given industry. He wants to forecast sales for each quarter of the upcoming year.
Model ONE

Regression Statistics

Multiple R        0.941828
R2                0.887039
Adjusted R2       0.863258
Standard Error    2.543272
Observations      24

Durbin-Watson test statistic = 0.7856
ANOVA
              df    SS           MS          F           Significance F
Regression    4     965.0619     241.2655    37.30006    9.49E−09
Residual      19    122.8964     6.4682
Total         23    1087.9583

             Coefficients    Standard Error    t-Statistic
Intercept    31.40833        1.4866            21.12763
Q1           −3.77798        1.485952          −2.54246
Q2           −2.46310        1.476204          −1.66853
Q3           −0.14821        1.470324          −0.10080
TREND        0.851786        0.075335          11.20848

Model TWO

Regression Statistics

Multiple R        0.941796
R2                0.886979
Adjusted R2       0.870026
Standard Error    2.479538
Observations      24

Durbin-Watson test statistic = 0.7860
              df    SS           MS          F          Significance F
Regression    3     964.9962     321.6654    52.3194    1.19E−09
Residual      20    122.9622     6.14811
Total         23    1087.9584


             Coefficients    Standard Error    t-Statistic
Intercept    31.32888        1.228865          25.49416
Q1           −3.70288        1.253493          −2.95405
Q2           −2.38839        1.244727          −1.91881
TREND        0.85218         0.073991          11.51732

The dependent variable is the level of sales for each quarter, in $ millions, beginning with the first quarter of the first year. Q1, Q2, and Q3 are seasonal dummy variables representing each quarter of the year; for the first four observations the dummy variables are Q1 = (1,0,0,0), Q2 = (0,1,0,0), and Q3 = (0,0,1,0). TREND is a series that begins with one and increases by one each period to end with 24. For all tests, Mercado will use a 5% level of significance. Tests of coefficients will be two-tailed, and all others are one-tailed.

Which model would be a better choice for making a forecast?
A)
Model TWO because serial correlation is not a problem.
B)
Model ONE because it has a higher R2.
C)
Model TWO because it has a higher adjusted R2.



Model TWO has a higher adjusted R2 and thus would produce the more reliable estimates. As is always the case when a variable is removed, R2 for Model TWO is lower. The increase in adjusted R2 indicates that the removed variable, Q3, has very little explanatory power, and removing it should improve the accuracy of the estimates. With respect to the reference to serial correlation in the answer choices, we can compare the Durbin-Watson statistics to the critical values in a Durbin-Watson table. Since the lower critical DW values for Models ONE and TWO are approximately 1.01 and 1.10, respectively, and both computed statistics (0.7856 and 0.7860) fall below these values, positive serial correlation is a problem for both equations. (Study Session 3, LOS 12.f)
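The Durbin-Watson statistics above come straight from the regression output; as an illustration only, here is how such a statistic could be computed from a model's residuals in Python with NumPy. The residuals array is a placeholder, not Mercado's data.

```python
# Durbin-Watson statistic: sum of squared first differences of the residuals
# divided by the sum of squared residuals. Values well below 2 suggest
# positive serial correlation.
import numpy as np

def durbin_watson(residuals: np.ndarray) -> float:
    diffs = np.diff(residuals)
    return float(np.sum(diffs ** 2) / np.sum(residuals ** 2))

residuals = np.array([0.5, -0.2, 0.8, -1.1, 0.3])   # illustrative values only
print(durbin_watson(residuals))
```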

Using Model ONE, what is the sales forecast for the second quarter of the next year?
A)
$51.09 million.
B)
$56.02 million.
C)
$46.31 million.


The estimate for the second quarter of the following year would be (in millions):
31.4083 + (−2.4631) + (24 + 2) × 0.851786 = 51.091666. (Study Session 3, LOS 12.c)
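A minimal sketch of the Model ONE forecast in Python (not part of the original solution; the names are illustrative).

```python
# Forecast for the second quarter of the year after the sample ($ millions).
intercept, q2_coef, trend_coef = 31.4083, -2.4631, 0.851786
trend = 24 + 2            # the sample ends at TREND = 24; Q2 of the next year is period 26

forecast = intercept + q2_coef + trend_coef * trend   # Q2 dummy = 1, Q1 = Q3 = 0
print(round(forecast, 2))   # ~ 51.09
```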



Which of the coefficients that appear in both models are not significant at the 5% level in a two-tailed test?
A)
The coefficients on Q1 and Q2 only.
B)
The coefficient on Q2 only.
C)
The intercept only.



The absolute values of the critical t-statistics for Models ONE and TWO are 2.093 and 2.086, respectively. The t-statistics for Q2 in Models ONE and TWO are −1.6685 and −1.9188, respectively; their absolute values fall below the critical values in both models, so the coefficient on Q2 is not significant in either model. (Study Session 3, LOS 12.a)
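As a hedged check (Python with SciPy, not part of the original solution), the cited critical t-values follow from the error degrees of freedom of each model.

```python
# Two-tailed 5% critical t-values for Models ONE and TWO.
from scipy.stats import t

print(round(t.ppf(0.975, 24 - 4 - 1), 3))   # Model ONE: 19 df -> ~ 2.093
print(round(t.ppf(0.975, 24 - 3 - 1), 3))   # Model TWO: 20 df -> ~ 2.086
```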

If it is determined that conditional heteroskedasticity is present in Model ONE, which of the following inferences is most accurate?
A)
Regression coefficients will be biased but standard errors will be unbiased.
B)
Both the regression coefficients and the standard errors will be biased.
C)
Regression coefficients will be unbiased but standard errors will be biased.



The presence of conditional heteroskedasticity will not affect the consistency of the regression coefficient estimates, but it will bias the standard errors, leading to incorrect application of t-tests for the statistical significance of the regression parameters. (Study Session 3, LOS 12.i)

Mercado probably did not include a fourth dummy variable Q4, which would have had 0, 0, 0, 1 as its first four observations because:
A)
it would have lowered the explanatory power of the equation.
B)
the intercept is essentially the dummy for the fourth quarter.
C)
it would not have been significant.


The fourth quarter serves as the base quarter, and for the fourth quarter, Q1 = Q2 = Q3 = 0. Had the model included Q4 as specified, it could not also have included an intercept. In that case, for Model ONE for example, the estimated coefficient on Q4 would have been 31.40833, and the coefficients on the dummies for the other quarters would equal 31.40833 plus the corresponding estimated dummy coefficients from Model ONE. In a model that included Q1, Q2, Q3, and Q4 but no intercept, for example:
Q1 = 31.40833 + (−3.77798) = 27.63035
Such a model would produce the same estimated values for the dependent variable. (Study Session 3, LOS 12.h)


If Mercado determines that Model TWO is the appropriate specification, then he is essentially saying that for each year, the value of sales from the third quarter to the fourth quarter is expected to:
A)
remain approximately the same.
B)
grow, but by less than $1,000,000.
C)
grow by more than $1,000,000.



The specification of Model TWO essentially assumes there is no difference attributable to the change of season from the third to the fourth quarter. However, the time trend is significant. The trend effect for moving from one quarter to the next is the coefficient on TREND times $1,000,000, which is $852,182 for Model TWO. (Study Session 3, LOS 13.a)

