Chapter 6 Sampling Distributions
A statistic, such as the sample mean or the sample standard deviation, is a number computed from a sample. Since a sample is random, every statistic is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. As a random variable it has a mean, a standard deviation, and a probability distribution. The probability distribution of a statistic is called its sampling distribution: the probability distribution of a sample statistic when the statistic is viewed as a random variable. Typically sample statistics are not ends in themselves, but are computed in order to estimate the corresponding population parameters, as illustrated in the grand picture of statistics presented in Figure 1.1 "The Grand Picture of Statistics" in Chapter 1 "Introduction".
This chapter introduces the concepts of the mean, the standard deviation, and the sampling distribution of a sample statistic, with an emphasis on the sample mean $\bar{x}$.
6.1 The Mean and Standard Deviation of the Sample Mean
Learning Objectives
Suppose we wish to estimate the mean $\mu$ of a population. In actual practice we would typically take just one sample. Imagine however that we take sample after sample, all of the same size $n$, and compute the sample mean $\bar{x}$ of each one. We will likely get a different value of $\bar{x}$ each time. The sample mean $\bar{x}$ is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. We will write $\bar{X}$ when the sample mean is thought of as a random variable, and write $\bar{x}$ for the values that it takes. The random variable $\bar{X}$ has a mean (the number about which means computed from samples of the same size center), denoted $\mu_{\bar{X}}$, and a standard deviation (a measure of the variability of means computed from samples of the same size), denoted $\sigma_{\bar{X}}$. Here is an example with such a small population and small sample size that we can actually write down every single sample.
Example 1
A rowing team consists of four rowers who weigh 152, 156, 160, and 164 pounds. Find all possible random samples with replacement of size two and compute the sample mean for each one. Use them to find the probability distribution, the mean, and the standard deviation of the sample mean $\bar{X}$.
Solution
The following table shows all possible samples with replacement of size two, along with the mean of each:
Sample      Mean
152, 152    152
152, 156    154
152, 160    156
152, 164    158
156, 152    154
156, 156    156
156, 160    158
156, 164    160
160, 152    156
160, 156    158
160, 160    160
160, 164    162
164, 152    158
164, 156    160
164, 160    162
164, 164    164
The table shows that there are seven possible values of the sample mean $\bar{X}$. The value $\bar{x} = 152$ happens only one way (the rower weighing 152 pounds must be selected both times), as does the value $\bar{x} = 164$, but the other values happen more than one way, hence are more likely to be observed than 152 and 164 are. Since the 16 samples are equally likely, we obtain the probability distribution of the sample mean just by counting:
$\bar{x}$      152    154    156    158    160    162    164
$P(\bar{x})$   1/16   2/16   3/16   4/16   3/16   2/16   1/16
Now we apply the formulas from Section 4.2.2 "The Mean and Standard Deviation of a Discrete Random Variable" in Chapter 4 "Discrete Random Variables" for the mean and standard deviation of a discrete random variable to $\bar{X}$. For $\mu_{\bar{X}}$ we obtain
$$\mu_{\bar{X}} = \sum \bar{x}\,P(\bar{x}) = 152\left(\tfrac{1}{16}\right) + 154\left(\tfrac{2}{16}\right) + 156\left(\tfrac{3}{16}\right) + 158\left(\tfrac{4}{16}\right) + 160\left(\tfrac{3}{16}\right) + 162\left(\tfrac{2}{16}\right) + 164\left(\tfrac{1}{16}\right) = 158$$
For $\sigma_{\bar{X}}$ we first compute $\sum \bar{x}^2 P(\bar{x})$:
$$152^2\left(\tfrac{1}{16}\right) + 154^2\left(\tfrac{2}{16}\right) + 156^2\left(\tfrac{3}{16}\right) + 158^2\left(\tfrac{4}{16}\right) + 160^2\left(\tfrac{3}{16}\right) + 162^2\left(\tfrac{2}{16}\right) + 164^2\left(\tfrac{1}{16}\right)$$
which is 24,974, so that
$$\sigma_{\bar{X}} = \sqrt{\sum \bar{x}^2 P(\bar{x}) - \mu_{\bar{X}}^2} = \sqrt{24{,}974 - 158^2} = \sqrt{10}$$
The mean and standard deviation of the population {152, 156, 160, 164} in the example are $\mu = 158$ and $\sigma = \sqrt{20}$. The mean of the sample mean $\bar{X}$ that we have just computed is exactly the mean of the population. The standard deviation of the sample mean $\bar{X}$ that we have just computed is the standard deviation of the population divided by the square root of the sample size: $\sqrt{10} = \sqrt{20}/\sqrt{2}$. These relationships are not coincidences, but are illustrations of the following formulas.
Suppose random samples of size $n$ are drawn from a population with mean $\mu$ and standard deviation $\sigma$. The mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$ of the sample mean $\bar{X}$ satisfy
$$\mu_{\bar{X}} = \mu \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
The first formula says that if we could take every possible sample from the population and compute the corresponding sample mean, then those numbers would center at the number we wish to estimate, the population mean $\mu$. The second formula says that averages computed from samples vary less than individual measurements on the population do, and quantifies the relationship.
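The enumeration in Example 1 can be checked with a short script. What follows is a minimal Python sketch and not part of the original text; the variable names (`weights`, `dist`, and so on) are illustrative. It lists every sample of size two drawn with replacement, tallies the distribution of the sample mean with exact fractions, and compares the result with the two formulas above.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

weights = [152, 156, 160, 164]  # the four rowers from Example 1
n = 2                           # sample size

# Every ordered sample of size n drawn with replacement (4**2 = 16 of them).
samples = list(product(weights, repeat=n))

# Tally the probability of each sample mean; all samples are equally likely.
dist = {}
for s in samples:
    mean = Fraction(sum(s), n)
    dist[mean] = dist.get(mean, 0) + Fraction(1, len(samples))

# Mean and variance of the sample mean, from its probability distribution.
mu_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum(x**2 * p for x, p in dist.items()) - mu_xbar**2

# Population mean and variance, for comparison with the formulas.
mu = Fraction(sum(weights), len(weights))
var = sum((w - mu) ** 2 for w in weights) / len(weights)

for x in sorted(dist):
    print(f"{int(x)}: {dist[x]}")
print("mean of sample mean:", mu_xbar)
print("sd of sample mean:", sqrt(var_xbar))
print("sigma / sqrt(n):", sqrt(var / n))
```

Python prints the probabilities as reduced fractions, so 2/16 appears as 1/8 and 4/16 as 1/4; the values are the same as in the table.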
Example 2
The mean and standard deviation of the tax value of all vehicles registered in a certain state are $\mu = \$13{,}525$ and $\sigma = \$4{,}180$. Suppose random samples of size 100 are drawn from the population of vehicles. What are the mean $\mu_{\bar{X}}$ and standard deviation $\sigma_{\bar{X}}$ of the sample mean $\bar{X}$?
Solution
Since $n = 100$, the formulas yield
$$\mu_{\bar{X}} = \mu = \$13{,}525 \quad\text{and}\quad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{\$4{,}180}{\sqrt{100}} = \$418$$
Key Takeaways
Exercises
Answers
6.2 The Sampling Distribution of the Sample Mean
Learning Objectives
The Central Limit Theorem
In Note 6.5 "Example 1" in Section 6.1 "The Mean and Standard Deviation of the Sample Mean" we constructed the probability distribution of the sample mean for samples of size two drawn from the population of four rowers. The probability distribution is:
$\bar{x}$      152    154    156    158    160    162    164
$P(\bar{x})$   1/16   2/16   3/16   4/16   3/16   2/16   1/16
Figure 6.1 "Distribution of a Population and a Sample Mean" shows a side-by-side comparison of a histogram for the original population and a histogram for this distribution. Whereas the distribution of the population is uniform, the sampling distribution of the mean has a shape approaching the shape of the familiar bell curve. This phenomenon of the sampling distribution of the mean taking on a bell shape even though the population distribution is not bell-shaped happens in general. Here is a somewhat more realistic example.
Figure 6.1 Distribution of a Population and a Sample Mean
Suppose we take samples of size 1, 5, 10, or 20 from a population that consists entirely of the numbers 0 and 1, half the population 0, half 1, so that the population mean is 0.5. The sampling distributions are:
n = 1:
$\bar{x}$      0      1
$P(\bar{x})$   0.5    0.5
n = 5:
$\bar{x}$      0      0.2    0.4    0.6    0.8    1
$P(\bar{x})$   0.03   0.16   0.31   0.31   0.16   0.03
n = 10:
$\bar{x}$      0      0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1
$P(\bar{x})$   0.00   0.01   0.04   0.12   0.21   0.25   0.21   0.12   0.04   0.01   0.00
n = 20:
$\bar{x}$      0      0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.45   0.50
$P(\bar{x})$   0.00   0.00   0.00   0.00   0.00   0.01   0.04   0.07   0.12   0.16   0.18
$\bar{x}$      0.55   0.60   0.65   0.70   0.75   0.80   0.85   0.90   0.95   1
$P(\bar{x})$   0.16   0.12   0.07   0.04   0.01   0.00   0.00   0.00   0.00   0.00
Histograms illustrating these distributions are shown in Figure 6.2 "Distributions of the Sample Mean".
Figure 6.2 Distributions of the Sample Mean
As $n$ increases the sampling distribution of $\bar{X}$ evolves in an interesting way: the probabilities on the lower and the upper ends shrink and the probabilities in the middle become larger in relation to them. If we were to continue to increase $n$ then the shape of the sampling distribution would become smoother and more bell-shaped.
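The tables for the 0/1 population can be regenerated numerically. The sketch below is an illustration rather than the text's own method. It assumes the draws are independent (sampling with replacement, or a population large enough that this is a reasonable approximation), so that the number of ones in a sample of size $n$ is binomial. The function name `sample_mean_distribution` is hypothetical.

```python
from math import comb

def sample_mean_distribution(n, p=0.5):
    """Exact distribution of the sample mean for a 0/1 population.

    Assumes independent draws, so the number of ones in a sample of
    size n is binomial(n, p) and
    P(sample mean = k/n) = C(n, k) * p**k * (1 - p)**(n - k).
    """
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Print the distribution for each sample size used in the text.
for n in (1, 5, 10, 20):
    dist = sample_mean_distribution(n)
    print(f"n = {n}")
    for xbar, prob in dist.items():
        print(f"  {xbar:.2f}  {prob:.2f}")
```

Printing to two decimal places, as here, is why many of the tail probabilities for n = 20 show up as 0.00 even though they are small positive numbers.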
What we are seeing in these examples does not depend on the particular population distributions involved. In general, one may start with any distribution and the sampling distribution of the sample mean will increasingly resemble the bell-shaped normal curve as the sample size increases. This is the content of the Central Limit Theorem.
The Central Limit Theorem
For samples of size 30 or more, the sample mean is approximately normally distributed, with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where $n$ is the sample size. The larger the sample size, the better the approximation.
The Central Limit Theorem is illustrated for several common population distributions in Figure 6.3 "Distribution of Populations and Sample Means".
Figure 6.3 Distribution of Populations and Sample Means
The dashed vertical lines in the figures locate the population mean. Regardless of the distribution of the population, as the sample size is increased the shape of the sampling distribution of the sample mean becomes increasingly bell-shaped, centered on the population mean. Typically by the time the sample size is 30 the distribution of the sample mean is practically the same as a normal distribution.
The importance of the Central Limit Theorem is that it allows us to make probability statements about the sample mean, specifically in relation to its value in comparison to the population mean, as we will see in the examples. But to use the result properly we must first realize that there are two separate random variables (and therefore two probability distributions) at play:
Example 3
Let $\bar{X}$ be the mean of a random sample of size 50 drawn from a population with mean 112 and standard deviation 40.
Solution
Note that if in Note 6.11 "Example 3" we had been asked to compute the probability that the value of a single randomly selected element of the population exceeds 113, that is, to compute the number $P(X > 113)$, we would not have been able to do so, since we do not know the distribution of $X$, but only that its mean is 112 and its standard deviation is 40. By contrast we could compute $P(\bar{X} > 113)$ even without complete knowledge of the distribution of $X$ because the Central Limit Theorem guarantees that $\bar{X}$ is approximately normal.
Example 4
The numerical population of grade point averages at a college has mean 2.61 and standard deviation 0.5. If a random sample of size 100 is taken from the population, what is the probability that the sample mean will be between 2.51 and 2.71?
Solution
The sample mean $\bar{X}$ has mean $\mu_{\bar{X}} = \mu = 2.61$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 0.5/10 = 0.05$, so
$$P(2.51 < \bar{X} < 2.71) = P\left(\frac{2.51 - \mu_{\bar{X}}}{\sigma_{\bar{X}}} < Z < \frac{2.71 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) = P\left(\frac{2.51 - 2.61}{0.05} < Z < \frac{2.71 - 2.61}{0.05}\right) = P(-2 < Z < 2)$$
$$= P(Z < 2) - P(Z < -2) = 0.9772 - 0.0228 = 0.9544$$
Normally Distributed Populations
The Central Limit Theorem says that no matter what the distribution of the population is, as long as the sample is "large," meaning of size 30 or more, the sample mean is approximately normally distributed. If the population is normal to begin with then the sample mean also has a normal distribution, regardless of the sample size.
For samples of any size drawn from a normally distributed population, the sample mean is normally distributed, with mean $\mu_{\bar{X}} = \mu$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n}$, where $n$ is the sample size.
The effect of increasing the sample size is shown in Figure 6.4 "Distribution of Sample Means for a Normal Population".
Figure 6.4 Distribution of Sample Means for a Normal Population
Example 5
A prototype automotive tire has a design life of 38,500 miles with a standard deviation of 2,500 miles. Five such tires are manufactured and tested.
On the assumption that the actual population mean is 38,500 miles and the actual population standard deviation is 2,500 miles, find the probability that the sample mean will be less than 36,000 miles. Assume that the distribution of lifetimes of such tires is normal.
Solution
For simplicity we use units of thousands of miles. Then the sample mean $\bar{X}$ has mean $\mu_{\bar{X}} = \mu = 38.5$ and standard deviation $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 2.5/\sqrt{5} = 1.11803$. Since the population is normally distributed, so is $\bar{X}$, hence
$$P(\bar{X} < 36) = P\left(Z < \frac{36 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right) = P\left(Z < \frac{36 - 38.5}{1.11803}\right) = P(Z < -2.24) = 0.0125$$
That is, if the tires perform as designed, there is only about a 1.25% chance that the average of a sample of this size would be so low.
Example 6
An automobile battery manufacturer claims that its midgrade battery has a mean life of 50 months with a standard deviation of 6 months. Suppose the distribution of battery lives of this particular brand is approximately normal.
Solution
Key Takeaways
Exercises
Basic
Applications
Additional Exercises
Answers
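The normal-distribution calculations in Example 4 and Example 5 of this section can also be done without a table. The following is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `sample_mean_dist` is hypothetical and simply wraps the formulas $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Normal model for the sample mean: mean mu, standard deviation sigma/sqrt(n)."""
    return NormalDist(mu, sigma / sqrt(n))

# Example 4: grade point averages, mu = 2.61, sigma = 0.5, n = 100.
gpa = sample_mean_dist(2.61, 0.5, 100)
p_between = gpa.cdf(2.71) - gpa.cdf(2.51)

# Example 5: tire life in thousands of miles, mu = 38.5, sigma = 2.5, n = 5.
tires = sample_mean_dist(38.5, 2.5, 5)
p_low = tires.cdf(36)

print(f"P(2.51 < sample mean < 2.71) = {p_between:.4f}")
print(f"P(sample mean < 36) = {p_low:.4f}")
```

The results differ slightly from the table-based answers (about 0.9545 rather than 0.9544, and about 0.0127 rather than 0.0125) because the script does not round the z-score to two decimal places before looking up the probability.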
6.3 The Sample Proportion
Learning Objectives
Often sampling is done in order to estimate the proportion of a population that has a specific characteristic, such as the proportion of all items coming off an assembly line that are defective or the proportion of all people entering a retail store who make a purchase before leaving. The population proportion is denoted $p$ and the sample proportion is denoted $\hat{p}$. Thus if in reality 43% of people entering a store make a purchase before leaving, $p = 0.43$; if in a sample of 200 people entering the store, 78 make a purchase, $\hat{p} = 78/200 = 0.39$.
The sample proportion is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. Viewed as a random variable it will be written $\hat{P}$. It has a mean $\mu_{\hat{P}}$ (the number about which proportions computed from samples of the same size center) and a standard deviation $\sigma_{\hat{P}}$ (a measure of the variability of proportions computed from samples of the same size). Here are formulas for their values.
Suppose random samples of size $n$ are drawn from a population in which the proportion with a characteristic of interest is $p$. The mean $\mu_{\hat{P}}$ and standard deviation $\sigma_{\hat{P}}$ of the sample proportion $\hat{P}$ satisfy
$$\mu_{\hat{P}} = p \quad\text{and}\quad \sigma_{\hat{P}} = \sqrt{\frac{pq}{n}}$$
where $q = 1 - p$.
The Central Limit Theorem has an analogue for the sample proportion $\hat{P}$. To see how, imagine that every element of the population that has the characteristic of interest is labeled with a 1, and that every element that does not is labeled with a 0. This gives a numerical population consisting entirely of zeros and ones. Clearly the proportion of the population with the special characteristic is the proportion of the numerical population that are ones; in symbols,
$$p = \frac{\text{number of 1s}}{N}$$
But of course the sum of all the zeros and ones is simply the number of ones, so the mean $\mu$ of the numerical population is
$$\mu = \frac{\sum x}{N} = \frac{\text{number of 1s}}{N}$$
Thus the population proportion $p$ is the same as the mean $\mu$ of the corresponding population of zeros and ones.
In the same way the sample proportion $\hat{p}$ is the same as the sample mean $\bar{x}$. Thus the Central Limit Theorem applies to $\hat{P}$. However, the condition that the sample be large is a little more complicated than just being of size at least 30.
The Sampling Distribution of the Sample Proportion
For large samples, the sample proportion is approximately normally distributed, with mean $\mu_{\hat{P}} = p$ and standard deviation $\sigma_{\hat{P}} = \sqrt{pq/n}$. A sample is large if the interval $[p - 3\sigma_{\hat{P}},\ p + 3\sigma_{\hat{P}}]$ lies wholly within the interval $[0, 1]$.
In actual practice $p$ is not known, hence neither is $\sigma_{\hat{P}}$. In that case in order to check that the sample is sufficiently large we substitute the known quantity $\hat{p}$ for $p$. This means checking that the interval
$$\left[\hat{p} - 3\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + 3\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right]$$
lies wholly within the interval $[0, 1]$. This is illustrated in the examples.
Figure 6.5 "Distribution of Sample Proportions" shows that when $p = 0.1$ a sample of size 15 is too small but a sample of size 100 is acceptable. Figure 6.6 "Distribution of Sample Proportions for p = 0.5 and n = 15" shows that when $p = 0.5$ a sample of size 15 is acceptable.
Figure 6.5 Distribution of Sample Proportions
Figure 6.6 Distribution of Sample Proportions for p = 0.5 and n = 15
Example 7
Suppose that in a population of voters in a certain region 38% are in favor of a particular bond issue. Nine hundred randomly selected voters are asked if they favor the bond issue.
Solution
Example 8
An online retailer claims that 90% of all orders are shipped within 12 hours of being received. A consumer group placed 121 orders of different sizes and at different times of day; 102 orders were shipped within 12 hours.
Solution
Key Takeaways
Exercises
Basic
Applications
Additional Exercises
Answers
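The large-sample condition for the sample proportion described in this section, that the interval $[p - 3\sigma_{\hat{P}},\ p + 3\sigma_{\hat{P}}]$ lie wholly within $[0, 1]$, is easy to check by machine. The sketch below is illustrative only; the function names `proportion_interval` and `is_large_sample` are hypothetical. It applies the check to the three cases the text attributes to Figures 6.5 and 6.6.

```python
from math import sqrt

def proportion_interval(p, n):
    """Return (low, high) = p -/+ 3 * sqrt(p * (1 - p) / n)."""
    half_width = 3 * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def is_large_sample(p, n):
    """True when the three-standard-deviation interval lies within [0, 1]."""
    low, high = proportion_interval(p, n)
    return 0 <= low and high <= 1

# The three (p, n) combinations discussed for Figures 6.5 and 6.6.
for p, n in [(0.1, 15), (0.1, 100), (0.5, 15)]:
    low, high = proportion_interval(p, n)
    print(f"p = {p}, n = {n}: [{low:.3f}, {high:.3f}] large = {is_large_sample(p, n)}")
```

When $p$ is unknown, the same functions can be called with the observed $\hat{p}$ in place of $p$, as the text describes.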
How do you find the standard deviation of a sampling distribution?
The standard deviation of the sampling distribution of means equals the standard deviation of the population divided by the square root of the sample size.
How do you calculate the sample standard deviation?
Here's how to calculate sample standard deviation:
Step 1: Calculate the mean of the data; this is $\bar{x}$ in the formula.
Step 2: Subtract the mean from each data point. ...
Step 3: Square each deviation to make it positive.
Step 4: Add the squared deviations together.
What is the standard deviation of the sampling distribution of its means?
The standard deviation of the sampling distribution of the mean is called the standard error of the mean. It is designated by the symbol $\sigma_M$. Note that the spread of the sampling distribution of the mean decreases as the sample size increases.