Under what circumstances does the binomial distribution approximate a normal distribution?

  • Mean and variance of the binomial distribution
  • Normal approximation to the binomial distribution
  • One can easily verify that the mean for a single binomial trial, where S(uccess) is scored as 1 and F(ailure) is scored as 0, is p, where p is the probability of S. Hence the mean for the binomial distribution with n trials is np.
  • One can easily verify that the variance for a single binomial trial, where S is scored as 1 and F is scored as 0, is p(1-p). Hence the variance for the binomial distribution with n trials is np(1-p). It follows that the standard deviation is (np(1-p))^.5.
If the number of trials, n, is large, the binomial distribution is approximately normal with mean np and standard deviation (np(1-p))^.5. (This is convenient, since we really do not want to calculate binomial probabilities by hand when n > 100.)

Example: If 10% of men are bald, what is the probability that fewer than 100 in a random sample of 818 men are bald?
Form the z-score, which requires the mean (*mu*) and standard deviation (*sigma*).
*mu* = np = 818 × .1 = 81.8.
*sigma* = (np(1-p))^.5 = (818 × .1 × .9)^.5 = 8.5802
z = (x-*mu*)/*sigma* = (100-81.8)/8.58 = 2.12, where x = 100 is the count of interest.
Since we are interested in fewer than 100 (draw a picture), we want the area to the left of z = 2.12. From the normal table, about 98.3% of the time there will be fewer than 100 bald men.
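
The bald-men example can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `normal_cdf` and `binomial_cdf` are made up for illustration, the normal table is replaced by the error function, and the exact binomial sum is computed on the log scale to avoid overflow. No continuity correction is applied, to match the hand calculation above.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ binomial(n, p), summing terms on the log scale."""
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_term)
    return total

n, p = 818, 0.1
mu = n * p                          # 81.8
sigma = math.sqrt(n * p * (1 - p))  # about 8.58
z = (100 - mu) / sigma              # same z-score as the hand calculation
approx = normal_cdf(z)              # normal approximation, no continuity correction
exact = binomial_cdf(99, n, p)      # "fewer than 100" means at most 99

print(f"mu = {mu:.1f}, sigma = {sigma:.4f}, z = {z:.2f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The two printed probabilities should be close but not identical, since the normal curve only approximates the binomial.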

The validity of the normal approximation is illustrated if you click here.

Simulation with a binomial experiment is one way to generate an approximately normal distribution.

N.B.: Either do all the calculations with count data as we have done here, or convert everything (including the standard deviation) to proportions.

Applets: The normal approximation to the binomial is illustrated by David Lane (this employs the continuity correction factor). A cruder version is also available. The classic falling ball model for the binomial convergence to the normal distribution can be seen at Davidson University or a .com (The classical model has each yellow ball going to the adjacent slot to the right or left with probability .5 when it hits a green ball, but these simulations look like more horizontal travel is possible).

Competencies: If n=25 and p=.2, calculate the mean, variance, and standard deviation of the binomial distribution.
If n=200 and p = .67, estimate the probability that the number of successes is greater than 140.


Let \(X_i\) denote whether or not a randomly selected individual approves of the job the President is doing. More specifically:

  • Let \(X_i=1\), if the person approves of the job the President is doing, with probability \(p\)
  • Let \(X_i=0\), if the person does not approve of the job the President is doing with probability \(1-p\)

Then, recall that \(X_i\) is a Bernoulli random variable with mean:

\(\mu=E(X)=(0)(1-p)+(1)(p)=p\)

and variance:

\(\sigma^2=Var(X)=E[(X-p)^2]=(0-p)^2(1-p)+(1-p)^2(p)=p(1-p)[p+1-p]=p(1-p)\)

Now, take a random sample of \(n\) people, and let:

\(Y=X_1+X_2+\ldots+X_n\)

Then \(Y\) is a binomial(\(n, p\)) random variable, \(y=0, 1, 2, \ldots, n\), with mean:

\(\mu=np\)

and variance:

\(\sigma^2=np(1-p)\)

Now, let \(n=10\) and \(p=\frac{1}{2}\), so that \(Y\) is binomial(\(10, \frac{1}{2}\)). What is the probability that exactly five people approve of the job the President is doing?

Solution

There is really nothing new here. We can calculate the exact probability using the binomial table in the back of the book with \(n=10\) and \(p=\frac{1}{2}\). Doing so, we get:

\begin{align} P(Y=5)&= P(Y \leq 5)-P(Y \leq 4)\\ &= 0.6230-0.3770\\ &= 0.2460\\ \end{align}

That is, there is a 24.6% chance that exactly five of the ten people selected approve of the job the President is doing.
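
If a binomial table is not at hand, the same exact probability can be computed directly from the binomial formula. This is a small sketch using only the Python standard library; the function names `binomial_pmf` and `binomial_cdf` are chosen here for illustration.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ binomial(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def binomial_cdf(y, n, p):
    """P(Y <= y), the cumulative value a binomial table would list."""
    return sum(binomial_pmf(k, n, p) for k in range(y + 1))

n, p = 10, 0.5
cdf_5 = binomial_cdf(5, n, p)
cdf_4 = binomial_cdf(4, n, p)
p_exact = cdf_5 - cdf_4  # same subtraction as the table lookup

print(f"P(Y <= 5) = {cdf_5:.4f}")
print(f"P(Y <= 4) = {cdf_4:.4f}")
print(f"P(Y = 5)  = {p_exact:.4f}")
```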

Note, however, that \(Y\) in the above example is defined as a sum of independent, identically distributed random variables. Therefore, as long as \(n\) is sufficiently large, we can use the Central Limit Theorem to calculate probabilities for \(Y\). Specifically, the Central Limit Theorem tells us that:

\(Z=\dfrac{Y-np}{\sqrt{np(1-p)}}\stackrel {d}{\longrightarrow} N(0,1)\).
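
A quick simulation can make this convergence concrete. The sketch below is only an illustration under assumed settings (\(n=100\), \(p=0.5\), 5000 replications, a fixed seed); the function name `simulate_z` is hypothetical. It builds each \(Y\) as a sum of Bernoulli draws, standardizes it, and checks that the standardized values have mean near 0 and standard deviation near 1.

```python
import math
import random

def simulate_z(n, p, reps, seed=0):
    """Draw `reps` binomial(n, p) counts as sums of Bernoulli trials and standardize them."""
    rng = random.Random(seed)
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    zs = []
    for _ in range(reps):
        # Each trial is a success (1) with probability p, otherwise a failure (0).
        y = sum(1 for _ in range(n) if rng.random() < p)
        zs.append((y - mu) / sigma)
    return zs

zs = simulate_z(n=100, p=0.5, reps=5000)
mean_z = sum(zs) / len(zs)
sd_z = math.sqrt(sum((z - mean_z) ** 2 for z in zs) / len(zs))

print(f"mean of Z: {mean_z:.3f}")
print(f"sd of Z:   {sd_z:.3f}")
```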

Let's use the normal distribution then to approximate some probabilities for \(Y\). Again, what is the probability that exactly five people approve of the job the President is doing?

Solution

First, recognize in our case that the mean is:

\(\mu=np=10\left(\dfrac{1}{2}\right)=5\)

and the variance is:

\(\sigma^2=np(1-p)=10\left(\dfrac{1}{2}\right)\left(\dfrac{1}{2}\right)=2.5\)

Now, if we look at a graph of the binomial distribution with the rectangle corresponding to \(Y=5\) shaded in red:

[Figure: Histogram of Y with a normal curve overlaid (Mean 5, StDev 1.581, N 1000); horizontal axis Y from 0 to 10, vertical axis Density; the bar at Y = 5 is shaded.]

we should see that we would benefit from making some kind of correction for the fact that we are using a continuous distribution to approximate a discrete distribution. Specifically, it seems that the rectangle \(Y=5\) really includes any \(Y\) greater than 4.5 but less than 5.5. That is:

\(P(Y=5)=P(4.5< Y < 5.5)\)

Such an adjustment is called a "continuity correction." Once we've made the continuity correction, the calculation reduces to a normal probability calculation:
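
Following the same steps used in the later examples, and assuming standard normal table values rounded to four decimal places, the calculation can be written out as:

\begin{align} P(Y=5)=P(4.5< Y < 5.5) &= P\left(\dfrac{4.5-5}{\sqrt{2.5}}<Z<\dfrac{5.5-5}{\sqrt{2.5}}\right)\\ &= P(-0.32<Z<0.32)\\ &= 1-2P(Z>0.32)\\ &= 1-2(0.3745)=0.2510\\ \end{align}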

Now, recall that we previously used the binomial distribution to determine that the probability that \(Y=5\) is exactly 0.246. Here, we used the normal distribution to determine that the probability that \(Y=5\) is approximately 0.251. That's not too shabby of an approximation, in light of the fact that we are dealing with a relatively small sample size of \(n=10\)!

Let's try a few more approximations. What is the probability that more than 7, but at most 9, of the ten people sampled approve of the job the President is doing?

Solution

If we look at a graph of the binomial distribution with the area corresponding to \(7<Y\le 9\) shaded in red:

[Figure: Histogram of Y with a normal curve overlaid (Mean 5, StDev 1.581, N 1000); horizontal axis Y from 0 to 10, vertical axis Density; the bars for \(7<Y\le 9\) are shaded.]

we should see that we'll want to make the following continuity correction:

\(P(7<Y \leq 9)=P(7.5< Y < 9.5)\)

Now again, once we've made the continuity correction, the calculation reduces to a normal probability calculation:
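
Using the same mean and variance as before, and assuming standard normal table values rounded to four decimal places, the calculation can be written out as:

\begin{align} P(7<Y \leq 9)=P(7.5< Y < 9.5) &= P\left(\dfrac{7.5-5}{\sqrt{2.5}}<Z<\dfrac{9.5-5}{\sqrt{2.5}}\right)\\ &= P(1.58<Z<2.85)\\ &= P(Z>1.58)-P(Z>2.85)\\ &= 0.0571-0.0022=0.0549\\ \end{align}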

By the way, you might find it interesting to note that the approximate normal probability is quite close to the exact binomial probability. We showed that the approximate probability is 0.0549, whereas the following calculation shows that the exact probability (using the binomial table with \(n=10\) and \(p=\frac{1}{2}\)) is 0.0537:

\(P(7<Y \leq 9)=P(Y\leq 9)-P(Y\leq 7)=0.9990-0.9453=0.0537\)

Let's try one more approximation. What is the probability that at least 2, but less than 4, of the ten people sampled approve of the job the President is doing?

Solution

If we look at a graph of the binomial distribution with the area corresponding to \(2\le Y<4\) shaded in red:

[Figure: Histogram of Y with a normal curve overlaid (Mean 5, StDev 1.581, N 1000); horizontal axis Y from 0 to 10, vertical axis Density; the bars for \(2\le Y<4\) are shaded.]

we should see that we'll want to make the following continuity correction:

\(P(2 \leq Y <4)=P(1.5< Y < 3.5)\)

Again, once we've made the continuity correction, the calculation reduces to a normal probability calculation:

\begin{align} P(2 \leq Y <4)=P(1.5< Y < 3.5) &= P(\dfrac{1.5-5}{\sqrt{2.5}}<Z<\dfrac{3.5-5}{\sqrt{2.5}})\\ &= P(-2.21<Z<-0.95)\\ &= P(Z>0.95)-P(Z>2.21)\\ &= 0.1711-0.0136=0.1575\\ \end{align}

By the way, the exact binomial probability is 0.1612, as the following calculation illustrates:

\(P(2 \leq Y <4)=P(Y\leq 3)-P(Y\leq 1)=0.1719-0.0107=0.1612\)
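
The three comparisons can also be run side by side. The sketch below is an illustration rather than the notes' own method: the helper names are made up, and it evaluates the normal CDF with the error function instead of a table, so the approximate values may differ from the hand calculations in the third decimal place because \(z\) is not rounded to two decimals. Each event is rewritten as an inclusive integer range \(lo \le Y \le hi\), and the continuity correction widens it to \((lo-0.5,\ hi+0.5)\).

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def exact_prob(lo, hi, n, p):
    """Exact binomial P(lo <= Y <= hi)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(lo, hi + 1))

def approx_prob(lo, hi, n, p):
    """Normal approximation to P(lo <= Y <= hi) with the continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return normal_cdf(hi + 0.5, mu, sigma) - normal_cdf(lo - 0.5, mu, sigma)

n, p = 10, 0.5
# Each event expressed as an inclusive range of integer values of Y.
cases = {"Y = 5": (5, 5), "7 < Y <= 9": (8, 9), "2 <= Y < 4": (2, 3)}
results = {
    label: (exact_prob(lo, hi, n, p), approx_prob(lo, hi, n, p))
    for label, (lo, hi) in cases.items()
}

for label, (exact, approx) in results.items():
    print(f"{label:12s} exact = {exact:.4f}  normal approx = {approx:.4f}")
```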

Just a couple of comments before we close our discussion of the normal approximation to the binomial.

(1) First, we have not yet discussed what "sufficiently large" means in terms of when it is appropriate to use the normal approximation to the binomial. The general rule of thumb is that the sample size \(n\) is "sufficiently large" if:

\(np\ge 5\) and \(n(1-p)\ge 5\)

For example, in the above example, in which \(p=0.5\), the two conditions are met if:

\(np=n(0.5)\ge 5\) and \(n(1-p)=n(0.5)\ge 5\)

Now, both conditions are true if:

\(n\ge 5\left(\frac{10}{5}\right)=10\)

Because our sample size was at least 10 (well, barely!), we now see why our approximations were quite close to the exact probabilities. In general, the farther \(p\) is away from 0.5, the larger the sample size \(n\) needs to be. For example, suppose \(p=0.1\). Then, the two conditions are met if:

\(np=n(0.1)\ge 5\) and \(n(1-p)=n(0.9)\ge 5\)

Now, the first condition is met if:

\(n\ge 5(10)=50\)

And, the second condition is met if:

\(n\ge 5\left(\frac{10}{9}\right)\approx 5.6\)

That is, the only way both conditions are met is if \(n\ge 50\). So, in summary, when \(p=0.5\), a sample size of \(n=10\) is sufficient. But, if \(p=0.1\), then we need a much larger sample size, namely \(n=50\).
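
The rule of thumb amounts to \(n \ge 5/\min(p, 1-p)\). A minimal sketch of that calculation follows; the function name `min_sample_size` and the `threshold` parameter are hypothetical, and exact fractions are used so that values such as 0.1 do not pick up floating-point rounding error. It assumes \(0<p<1\).

```python
from fractions import Fraction
from math import ceil

def min_sample_size(p, threshold=5):
    """Smallest n with n*p >= threshold and n*(1-p) >= threshold, for 0 < p < 1."""
    p = Fraction(p)  # accepts strings such as "0.1" and converts them exactly
    return ceil(threshold / min(p, 1 - p))

for p in ("0.5", "0.1", "0.9"):
    print(f"p = {p}: n >= {min_sample_size(p)}")
```

Passing `p` as a string keeps the arithmetic exact; passing a float such as `0.1` would also run but would use the float's binary value.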

(2) In truth, if you have the available tools, such as a binomial table or a statistical package, you'll probably want to calculate exact probabilities instead of approximate probabilities. Does that mean all of our discussion here is for naught? No, not at all! In reality, we'll most often use the Central Limit Theorem as applied to the sum of independent Bernoulli random variables to help us draw conclusions about a true population proportion \(p\). If we take the \(Z\) random variable that we've been dealing with above, and divide the numerator by \(n\) and the denominator by \(n\) (and thereby not changing the overall quantity), we get the following result:

\(Z=\dfrac{\sum X_i-np}{\sqrt{np(1-p)}}=\dfrac{\hat{p}-p}{\sqrt{\dfrac{p(1-p)}{n}}}\stackrel {d}{\longrightarrow} N(0,1)\)

The quantity:

\(\hat{p}=\dfrac{\sum\limits_{i=1}^n X_i}{n}\)

that appears in the numerator is the "sample proportion," that is, the proportion in the sample meeting the condition of interest (approving of the President's job, for example). In Stat 415, we'll use the sample proportion in conjunction with the above result to draw conclusions about the unknown population proportion \(p\). You'll definitely be seeing much more of this in Stat 415!
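
To see that the count form and the proportion form of \(Z\) are the same quantity, here is a short sketch. The function names are made up for illustration, and the inputs (100 successes in 818 trials with \(p=0.1\)) are simply example numbers.

```python
from math import sqrt

def z_from_count(y, n, p):
    """Z computed from the count: (Y - np) / sqrt(np(1-p))."""
    return (y - n * p) / sqrt(n * p * (1 - p))

def z_from_proportion(y, n, p):
    """Z computed from the sample proportion: (p_hat - p) / sqrt(p(1-p)/n)."""
    p_hat = y / n
    return (p_hat - p) / sqrt(p * (1 - p) / n)

z_count = z_from_count(100, 818, 0.1)
z_prop = z_from_proportion(100, 818, 0.1)

print(f"z from counts:      {z_count:.4f}")
print(f"z from proportions: {z_prop:.4f}")
```

The two values agree up to floating-point rounding, provided the standard deviation is converted to the proportion scale along with everything else.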
