Statistics Help: Proportions, Probability, Distribution I am confused with these problems. Please show formulas used, so I can understand how...

Friday, January 6, 2017


(a) "What is the smallest sample size we could take that would allow us to determine that the sampling distribution of `\hat{p}` is approximately normal?"

This is somewhat subjective. By the Central Limit Theorem, the distribution of the sample proportion approaches normal as the sample size increases, but it is never exactly normal for any finite sample. All we can really do is adopt a rule of thumb for what counts as a sufficient sample size. Different statisticians will give you different numbers, but `n \ge 30` is the one I tend to use. (Some say 31, others 20, others 40; it depends on how much precision you need.) So let's say a sample of at least 30 students. I'll show in a moment what sort of margin of error that gives us.
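As an illustration of how much these rules of thumb can differ, here is a minimal sketch of one other widely taught check for proportions, the "success/failure" condition, which asks for at least 10 expected successes and 10 expected failures. This is not the `n >= 30` rule used in this answer; the function name `smallest_n` and the threshold of 10 are assumptions for illustration, and `p = 0.45` is the true proportion used in the calculations further down.

```python
def smallest_n(p, threshold=10):
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold.

    Illustrative helper (not from the original answer): it encodes the
    'success/failure' rule of thumb, one of several in common use.
    """
    n = 1
    while n * p < threshold or n * (1 - p) < threshold:
        n += 1
    return n


# p = 0.45 is the true proportion used later in this answer.
print(smallest_n(0.45))      # rule with a threshold of 10
print(smallest_n(0.45, 5))   # a looser variant with a threshold of 5
```

Under this alternative rule the minimum comes out smaller than 30, which fits the point that the cutoff depends on how much precision you need.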

(b) "A simple random sample of 40 students was taken. The proportion of students who work at paying jobs while going to school was found to be 0.80. What is the probability that the sample proportion would be at least 0.80?"

40 is bigger than 30, so we're good with assuming a normal distribution. Then we can get our mean and standard deviation based on that.

The sampling distribution of `\hat{p}` is centered on the true proportion, not on the value we happened to observe (0.80):

`\mu_{\hat{p}} = p = 0.45`

The standard deviation is a function of the true proportion and the sample size:
`\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(0.45)(0.55)}{40}} = \sqrt{0.0061875} \approx 0.07866`
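The arithmetic in that formula can be checked with a short sketch. The function name `standard_error` is just an illustrative label; the inputs `p = 0.45` and `n = 40` are the values from the formula above.

```python
import math


def standard_error(p, n):
    """Standard deviation of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


se = standard_error(0.45, 40)
print(round(se, 5))  # roughly 0.0787
```

Because `n` sits under the square root, quadrupling the sample size only halves the standard error.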

We're given that the true proportion is 0.45, so the z-score of our observed 0.80 relative to it is pretty big:

`z = \frac{0.80 - 0.45}{0.079} \approx 4.43`

The probability can then be read off of a z table, and it's tiny; in fact, so tiny that it's hard to find tables that show it. `p < 4.7 \times 10^{-6}` is an upper bound; that's about 1 in 200,000. In other words, it's very unlikely this result would happen simply by chance.
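Since printed z tables rarely go this far out, the tail area can instead be computed directly. The following is a minimal sketch using only Python's standard library. The helper name `upper_tail` is illustrative, and the code uses the unrounded standard error, so its z-score is slightly larger than the hand-rounded 4.43.

```python
import math


def upper_tail(z):
    """P(Z >= z) for a standard normal variable.

    Uses the complementary error function, which stays accurate
    far out in the tail where 1 - cdf(z) would lose precision.
    """
    return 0.5 * math.erfc(z / math.sqrt(2))


p, n, p_hat = 0.45, 40, 0.80          # true proportion, sample size, observed proportion
se = math.sqrt(p * (1 - p) / n)       # standard error of the sample proportion
z = (p_hat - p) / se                  # unrounded z-score, a little above 4.43

print(f"z = {z:.2f}")
print(f"P(p_hat >= 0.80) = {upper_tail(z):.2e}")
print(f"tail at the rounded z of 4.43 = {upper_tail(4.43):.2e}")
```

Either way the answer is a few in a million, so the conclusion does not depend on how the standard error was rounded.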

(c) "Suppose that a different sample of 40 was taken and this time the sample proportion was found to be 0.42. What is the probability that we would find a sample proportion that is less than 0.42?"

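Same basic process. The sampling distribution itself has not changed, because it depends only on the true proportion and the sample size:

`\mu_{\hat{p}} = 0.45`

`\sigma_{\hat{p}} = \sqrt{\frac{(0.45)(0.55)}{40}} \approx 0.0787`

Our z-score is now a lot more manageable:
`z = \frac{0.42 - 0.45}{0.0787} \approx -0.38`

It says "less than", so we're actually using the left tail of the distribution; but the normal distribution is symmetric, so the area to the left of `-0.38` equals the area to the right of `0.38`, and we can just flip the table around. The result is `p \approx 0.35`. Not only could this result occur by chance, that's actually the most likely cause.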
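The left-tail lookup and the symmetry argument can be sketched in a few lines. This assumes the standard error is computed from the true proportion 0.45 and the sample size 40; a differently rounded standard error would shift the z-score and the tail area slightly. The helper name `left_tail` is illustrative.

```python
import math


def left_tail(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


p, n, p_hat = 0.45, 40, 0.42          # true proportion, sample size, observed proportion
se = math.sqrt(p * (1 - p) / n)       # standard error, assumed recomputed from p and n
z = (p_hat - p) / se                  # negative, since 0.42 is below 0.45

prob = left_tail(z)
print(f"z = {z:.2f}")
print(f"P(p_hat < 0.42) = {prob:.3f}")

# Symmetry: the area left of z equals the area right of -z.
right_of_minus_z = 1 - left_tail(-z)
print(abs(prob - right_of_minus_z) < 1e-12)
```

A probability of roughly one in three is entirely ordinary, which is why this sample is consistent with plain sampling variation.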

(d) "The sample proportions provided in parts (b) and (c) are very different, but can be explained very simply. Give a complete explanation using the correct statistical language and a complete, clear sentence (or sentences)."

Clearly the first sample is not representative of the population. It must be somehow biased. Maybe it was self-selected by the way we gathered our sample. It may be representative of some sub-population, such as students of a particular major or students of a particular socioeconomic status. But it's extremely unlikely that this is just a random sample from the same population of students for which the true proportion is 0.45.
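As a cross-check on how unlikely the first sample is, the probability can also be computed exactly from the binomial distribution rather than from the normal approximation. This is a sketch, not part of the original working: the helper name `binomial_upper_tail` is an assumption, and it treats 0.80 of 40 students as exactly 32 students.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 32 of 40 students is a sample proportion of 0.80; p = 0.45 is the true proportion.
prob = binomial_upper_tail(32, 40, 0.45)
print(f"P(at least 32 of 40) = {prob:.2e}")
```

The exact figure should land in the same range as the normal approximation, a few in a million, so the conclusion about the first sample does not hinge on the approximation.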
