
Reading 10: Sampling and Estimation - LOS k: Selected Practice Questions

Session 3: Quantitative Methods: Application
Reading 10: Sampling and Estimation

LOS k: Discuss the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.

 

 

When sampling from a population, the most appropriate sample size:

A)
minimizes the sampling error and the standard deviation of the sample statistic around its population value.
B)
is at least 30.
C)
involves a trade-off between the cost of increasing the sample size and the value of increasing the precision of the estimates.


 

A larger sample reduces the sampling error and the standard deviation of the sample statistic around its population value. However, this does not imply that the sample should be as large as possible, or that the sampling error must be as small as can be achieved. Larger samples might contain observations that come from a different population, in which case they would not necessarily improve the estimates of the population parameters. Cost also increases with the sample size. When the cost of increasing the sample size is greater than the value of the extra precision gained, increasing the sample size is not appropriate.
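To make the precision side of this trade-off concrete, here is a minimal sketch using the standard error of the sample mean, σ/√n, which assumes independent observations drawn from a single population. The value of σ (20%) and the sample sizes are illustrative assumptions, not figures from the question.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for n independent observations."""
    return sigma / math.sqrt(n)

SIGMA = 0.20  # assumed population standard deviation (illustrative only)

# Each quadrupling of the sample size only halves the standard error,
# so the precision gained per extra observation keeps shrinking.
for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: standard error = {standard_error(SIGMA, n):.4f}")
```

Since cost rises with every added observation while the standard error falls only with √n, at some point an extra observation costs more than the precision it buys.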

An analyst has compiled stock returns for the first 10 days of the year for a sample of firms and estimated the correlation between these returns and changes in book value for these firms over the just-ended year. What objection could be raised to such a correlation being used as a trading strategy?

A)
Use of year-end values causes a time-period bias.
B)
Use of year-end values causes a sample selection bias.
C)
The study suffers from look-ahead bias.


The study suffers from look-ahead bias because traders at the beginning of the year would not be able to know the book value changes. Financial statements usually take 60 to 90 days to be completed and released.
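The availability check behind this answer can be sketched as a simple date comparison. The function name `was_public`, the 90-day default lag (the upper end of the 60 to 90 day range mentioned above), and the dates are hypothetical choices for illustration.

```python
from datetime import date, timedelta

def was_public(trade_date: date, fiscal_year_end: date,
               reporting_lag_days: int = 90) -> bool:
    """True if, under the assumed lag, the statements were released by trade_date."""
    release_date = fiscal_year_end + timedelta(days=reporting_lag_days)
    return trade_date >= release_date

# A trader on 10 January cannot yet know book values for the year ended 31 December.
print(was_public(date(2024, 1, 10), date(2023, 12, 31)))  # False
# By mid-April the assumed 90-day lag has passed.
print(was_public(date(2024, 4, 15), date(2023, 12, 31)))  # True
```

A backtest that filters its inputs through a check like this uses only information a trader could actually have had on each date.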


 

Sunil Hameed is a reporter with the weekly periodical The Fun Finance Times. Today, he is scheduled to interview a researcher who claims to have developed a successful technical trading strategy based on trading on the CEO’s birthday (sample was taken from the Fortune 500). After the interview, Hameed summarizes his notes (partial transcript as follows). The researcher:

  • was defensive about the lack of economic theory consistent with his results.

  • used the same database for all his tests and has not tested the trading rule on out-of-sample data.

  • excluded stocks for which he could not determine the CEO’s birthday.

  • used a sample cut-off date of the month before the latest market correction.

Select the choice that best completes the following: Hameed concludes that the research is flawed because the data and process are biased by:

A)
data mining, time-period bias, and look-ahead bias.
B)
data mining, sample selection bias, and time-period bias.
C)
sample selection bias and time-period bias.


Evidence of data mining is that the researcher was defensive about the lack of economic theory consistent with his results and used the same database for all his tests. One way to avoid data mining is to test the trading rule on out-of-sample data. Sample selection bias occurs when some data are systematically excluded from the analysis, usually because they are not available. Here, the researcher excluded stocks for which he could not determine the CEO’s birthday. Time-period bias can result if the time period is too short or too long. Here, the period was likely too short, since the researcher chose a cut-off date of the month before the latest market correction. Note: choosing that cut-off date could also be an additional example of data mining.

We are not given enough information to determine if the researcher is guilty of look-ahead bias (which occurs when the analyst uses historical data that was not publicly available at the time being studied).


An analyst has reviewed market data for returns from 1980–1990 extensively, searching for patterns in the returns. She has found that when the end of the month falls on a Saturday, there are usually positive returns on the following Thursday. She has engaged in:

A)
data snooping.
B)
data mining.
C)
biased selection.


Data mining refers to the extensive review of the same database searching for patterns.


Which of the following is the best method to avoid data mining bias when testing a profitable trading strategy?

A)
Increase the sample size to at least 30 observations per year.
B)
Use a sample free of survivorship bias.
C)
Test the strategy on a different data set than the one used to develop the rules.


The best way to avoid data mining is to test a potentially profitable trading rule on a data set different from the one you used to develop the rule (out-of-sample data). A larger sample size won’t prevent data mining, and you can still data mine a database free of survivorship bias.
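The out-of-sample check described above can be sketched with a small simulation. Everything here is hypothetical: the returns are pure random noise, the "rules" are random long/short signals, and the sample sizes and seed are arbitrary.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_DAYS = 500
N_RULES = 200

# Daily returns that are pure noise: no rule can truly predict them.
returns = [random.gauss(0.0, 0.01) for _ in range(N_DAYS)]

# Each hypothetical "rule" is a random long (+1) / short (-1) signal per day.
rules = [[random.choice((-1, 1)) for _ in range(N_DAYS)] for _ in range(N_RULES)]

def mean_return(rule, start, end):
    """Average daily return of following `rule` over days [start, end)."""
    return sum(rule[t] * returns[t] for t in range(start, end)) / (end - start)

split = N_DAYS // 2
in_sample = [mean_return(rule, 0, split) for rule in rules]

# "Develop" the strategy by picking the rule that looked best in-sample...
best = max(range(N_RULES), key=lambda i: in_sample[i])
# ...then evaluate it on data that played no part in the selection.
out_of_sample = mean_return(rules[best], split, N_DAYS)

print(f"best rule in-sample:     {in_sample[best]:+.5f}")
print(f"same rule out-of-sample: {out_of_sample:+.5f}")
```

Because the returns are noise by construction, the in-sample winner's apparent edge is spurious, and the out-of-sample figure has no reason to match it.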


A scientist working for a pharmaceutical company tries many models using the same data before reporting the one that shows that the given drug has no serious side effects. The scientist is guilty of:

A)
look-ahead bias.
B)
sample selection bias.
C)
data mining.


Data mining is the process where the same data is used with different methods until the desired results are obtained.
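A rough calculation shows why trying many models is dangerous. The sketch below assumes each model is an independent test at a 5% significance level and that there is no real effect; models fitted to the same data are not truly independent, so treat the numbers as an illustration only. The function name is hypothetical.

```python
def prob_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests independent tests looks significant by luck."""
    return 1.0 - (1.0 - alpha) ** n_tests

# The chance of a spurious "finding" grows quickly with the number of models tried.
for k in (1, 5, 20, 100):
    print(f"{k:>3} models tried: {prob_false_positive(k):.1%}")
```

Under these assumptions, 20 attempts already give roughly a 64% chance that at least one model produces the desired result by luck alone.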


The practice of repeatedly using the same database to search for patterns until one is found is called:

A)
data snooping.
B)
sample selection bias.
C)
data mining.


Data mining involves repeatedly analyzing the same data until a pattern is detected, a practice sometimes described as torturing the data until it confesses. A pattern found this way may not replicate in other data sets.


A research paper that reports finding a profitable trading strategy without providing any discussion of an economic theory that makes predictions consistent with the empirical results is most likely evidence of:

A)
data mining.
B)
a sample that is not large enough.
C)
a non-normal population distribution.


Data mining occurs when the analyst continually uses the same database to search for patterns or trading rules until he finds one that works. If you are reading research that suggests a profitable trading strategy, make sure you heed the following warning signs of data mining:

  • Evidence that the author used many variables (most unreported) until he found ones that were significant.

  • The lack of any economic theory that is consistent with the empirical results.


The average mutual fund return calculated from a sample of funds with significant survivorship bias would most likely be:

A)
an unbiased estimate of the mean return of the population of all mutual funds if the sample size was large enough.
B)
larger than the mean return of the population of all mutual funds.
C)
smaller than the mean return of the population of all mutual funds.


If we try to draw any conclusions from an analysis of a mutual fund database with survivorship bias, we overestimate the average mutual fund return, because we don’t include the poorer-performing funds that dropped out. A larger sample size from a database with survivorship bias will still result in a biased estimate.
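A small simulation illustrates the direction of this bias. The fund population, the return distribution, and the -10% survival cut-off are all made-up assumptions for illustration; real fund closures depend on more than one year's return.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000 funds with annual returns drawn from N(5%, 15%).
all_funds = [random.gauss(0.05, 0.15) for _ in range(1000)]

# Assumed survival rule: funds returning less than -10% close and leave the database.
CUTOFF = -0.10
survivors = [r for r in all_funds if r >= CUTOFF]

print(f"population mean: {mean(all_funds):.2%}")
print(f"survivor mean:   {mean(survivors):.2%}")
print(f"funds dropped:   {len(all_funds) - len(survivors)}")
```

Dropping the worst performers can only raise the average, and drawing more funds from the same survivor-only database does not remove the gap.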


A study reports that from 2002 to 2004 the average return on growth stocks was twice as large as that of value stocks. These results most likely reflect:

A)
look-ahead bias.
B)
survivorship bias.
C)
time-period bias.


Time-period bias can arise if the period over which the data are gathered is too short, because the results may reflect phenomena specific to that period, or too long, because a structural change during the period can produce two different return distributions. In this case, the sampled period is probably not long enough to support conclusions about the long-term relative performance of value and growth stocks, even if the sample size within that period is large.

Look-ahead bias occurs when the analyst uses historical data that was not publicly available at the time being studied. Survivorship bias is a form of sample selection bias in which the observations in the sample are biased because the elements of the sample that survived until the sample was taken are different from the elements that dropped out of the population.
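To illustrate the time-period bias in this question, the sketch below simulates a hypothetical yearly growth-minus-value return spread whose true mean is zero and compares short 3-year windows with the full history. The 30-year span, the 10% volatility, and the seed are arbitrary assumptions, not market data.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the run is repeatable

YEARS = 30
WINDOW = 3  # same length as a 2002 to 2004 style study

# Hypothetical yearly growth-minus-value return spread with a true mean of zero.
spread = [random.gauss(0.0, 0.10) for _ in range(YEARS)]

# Average spread within each non-overlapping 3-year window.
window_means = [mean(spread[i:i + WINDOW]) for i in range(0, YEARS, WINDOW)]

print(f"full 30-year mean: {mean(spread):+.2%}")
print(f"best 3-year mean:  {max(window_means):+.2%}")
print(f"worst 3-year mean: {min(window_means):+.2%}")
```

Individual 3-year windows scatter around the long-run average, so a study confined to one of them can show a large gap between growth and value even when none exists over the long term.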

