Sampling Error vs Standard Error of the Sample Mean

Just wanted to make sure I understand and can differentiate between these two concepts:

For, say, a sample/population mean, I understand sampling error to be the difference between the sample mean and the population mean. Essentially, it's the difference that results from inherent differences between the sample and the population.

For the standard error of the sample mean, does this refer to the standard deviation of the sample mean (i.e., with x% confidence and the standard error, you can reject the null hypothesis and state that the sample mean is representative of the population)?

Thanks for your help.

Sampling error is a type of error that comes from the fact that you have a sample rather than the entire population. So if I have 100,000 test takers taking the exam on Saturday and that's the whole population, I can average their heights and get the population mean, which is an exact number representing the average or typical candidate's height. In practice, we almost never get to take the "true average" because it would be way too much work to measure the entire population.

So, instead, we take a random sample of 2000 test takers, rather than all 100k of them. This is more doable. If the sample is chosen randomly, then the EXPECTED average of the sample is the same as the true average of the population. However, since it is a random sample, it could be a little bit different, because each sample leaves out some people and we don't know ahead of time which ones they are. These differences between the sample's mean and the "true mean" are "sampling error" i.e. the error (in using the sample mean as an estimate of the true mean) that comes from the fact that you've chosen a random sample from the population, rather than surveyed the entire population itself.
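To make that concrete, here is a minimal sketch in Python. The heights are simulated, and the mean of 170 cm and SD of 10 cm are made-up numbers chosen purely for illustration, not real data about test takers.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 heights in cm (parameters are assumptions)
population = [random.gauss(170, 10) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# One random sample of 2,000 people, drawn without replacement
sample = random.sample(population, 2_000)
sample_mean = statistics.fmean(sample)

# Sampling error: how far this particular sample's mean landed from the true mean
sampling_error = sample_mean - population_mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:+.2f}")
```

Change the seed and the sampling error changes too, which is the whole point: it depends on which people happened to end up in the sample.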

Sampling error is an important concept for two reasons: 1) it needs to be distinguished from other kinds of error or bias, like sampling bias (we'll get a biased result if our sample of 2000 test-takers comes exclusively from Germany, where people are typically taller than average, as opposed to globally; remember I'm trying to get average height, not average test score).

2) the common statistics such as standard error and confidence intervals ONLY measure the effect of sampling error. If you have other biases in your sampling technique, then the standard errors of your estimates won't capture them, and so you can become overconfident in your statistical tests, which usually means risking too much money. Also, building confidence intervals assumes that the ONLY source of error is sampling error.


For standard error: standard error is essentially the standard deviation of sample means around the population mean. In other words, if you repeated your analysis 1000 times, choosing a new random sample every time, and plotted each sample's mean on a histogram, you'd get something that looks like a normal distribution with a mean equal to the population's mean and a standard deviation equal to the standard error.
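If you want to see this for yourself, a rough simulation along these lines should do it. Again, the population is simulated with made-up parameters (mean 170 cm, SD 10 cm), so treat it as a sketch rather than anything definitive.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of heights in cm (parameters are assumptions)
population = [random.gauss(170, 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 2_000            # size of each sample
repetitions = 1_000  # number of times we repeat the whole sampling exercise

# Draw a fresh random sample each time and record its mean
sample_means = [
    statistics.fmean(random.sample(population, n))
    for _ in range(repetitions)
]

# The SD of those sample means is the empirical standard error
empirical_se = statistics.stdev(sample_means)
# The textbook formula: population SD divided by sqrt(n)
theoretical_se = pop_sd / math.sqrt(n)

print(f"mean of sample means: {statistics.fmean(sample_means):.2f}")
print(f"population mean:      {pop_mean:.2f}")
print(f"empirical SE:         {empirical_se:.3f}")
print(f"SD / sqrt(n):         {theoretical_se:.3f}")
```

The two SE figures should come out close but not identical. Part of the gap is simulation noise, and a small part comes from drawing without replacement from a finite population.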

Most people don't want to take 1000 new samples and plot the histogram, since the whole point of sampling is to reduce the work, so they use a shortcut: they say, I'm going to assume that the SD of this sample is representative of the SD of the population (remember that SE = SD / sqrt(N), where N is the sample size), and then I'll just build a confidence interval around my sample mean, and +/- 2*SE should give me a range that is roughly 95% likely to contain the true mean.
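Here is a small sketch of that shortcut in Python, using only one sample. The sample is simulated (mean 170 cm, SD 10 cm are assumed values for illustration); in practice you would plug in your measured heights instead.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Stand-in for one real sample of 2,000 measured heights in cm (simulated here)
sample = [random.gauss(170, 10) for _ in range(2_000)]

n = len(sample)
sample_mean = statistics.fmean(sample)
sample_sd = statistics.stdev(sample)  # sample SD, used as a proxy for the population SD

# SE = SD / sqrt(N)
se = sample_sd / math.sqrt(n)

# Approximate 95% confidence interval: sample mean +/- 2 * SE
lower = sample_mean - 2 * se
upper = sample_mean + 2 * se

print(f"sample mean: {sample_mean:.2f}")
print(f"SE:          {se:.3f}")
print(f"approx. 95% CI: ({lower:.2f}, {upper:.2f})")
```

Note that this interval only reflects sampling error. If the sample itself was collected in a biased way, the interval will look just as tight and be just as wrong.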


@bchadwick

Great explanation.


thank you so much for clarifying that post bchadwick! you really couldn't have explained it better. thank you again for taking the time to answer my question so thoroughly!!


wow, amazing explanation, thank you

