Notes.nb

The Central Limit Theorem

There are two theorems which have proven to be very helpful in understanding the properties of the sample mean and variance: the Central Limit Theorem and (in special, but not uncommon, cases) the Reproductive Theorem.

The Reproductive Theorem (not so named by Devore) states that a linear combination of normally distributed random variables has a normal distribution, and in particular if a collection of n independent and identically distributed random variables each have a normal distribution with mean μ and variance σ², then
    1.  the sample mean has a normal distribution with mean μ and variance σ²/n
    2.  the sample total has a normal distribution with mean nμ and variance nσ²
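
The means and variances quoted in this list follow from linearity of expectation and independence alone; what the theorem adds is the normality. As a short worked check, with notation assumed here rather than taken from the notes (X_1, ..., X_n for the variables, \bar{X} for the sample mean, T for the sample total, and σ² for the common variance):

```latex
\begin{align*}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu, &
\operatorname{Var}(\bar{X}) &= \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^{2}}{n}, \\
E[T] &= \sum_{i=1}^{n} E[X_i] = n\mu, &
\operatorname{Var}(T) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^{2}.
\end{align*}
```

The same computation applies to any distribution with finite variance, which is why the same means and variances reappear in the Central Limit Theorem.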

The Central Limit Theorem states that if a collection of n (where n is large) independent and identically distributed random variables each have a distribution with mean μ and variance σ², then
    1.  the sample mean has an approximately normal distribution with mean μ and variance σ²/n
    2.  the sample total has an approximately normal distribution with mean nμ and variance nσ²
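
A small simulation can make the two statements concrete. The following is only a sketch under stated assumptions: the Exponential population (mean 1, variance 1) is an example chosen here because it is clearly non-normal, the variable names are hypothetical, and RandomVariate requires a later Mathematica version than the one these notes were written in.

```mathematica
(* Sketch only: the Exponential[1] population is an assumed example (mean 1, variance 1). *)
SeedRandom[2006];            (* fixed seed so the run is repeatable *)
sampleSize = 50;             (* n, the size of each sample *)
reps = 10000;                (* number of independent samples drawn *)

(* Each row of data is one sample of size sampleSize. *)
data = RandomVariate[ExponentialDistribution[1], {reps, sampleSize}];

means  = Mean /@ data;       (* one sample mean per row *)
totals = Total /@ data;      (* one sample total per row *)

(* Observed mean and variance, followed by the values the theorem predicts. *)
{Mean[means], Variance[means], 1., 1./sampleSize}
{Mean[totals], Variance[totals], 1. sampleSize, 1. sampleSize}
```

In each output list the first two entries should be close to the last two. Rerunning with a smaller sampleSize gives a feel for how large "large" needs to be for a skewed population.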

These two theorems are significantly different from a theoretical standpoint. But from a practical standpoint, they allow us to operate effectively on estimates of the mean and standard deviation in many practically interesting cases.

It should be noted that Devore bases his discussion of these theorems on definitions (e.g. linear combination, random sample, statistic, and so on) that we do not cover in this course because of how we have chosen to approach the subject matter. While some exactitude in terminology is lost, a very brief explanation of the concepts is adequate to proceed with the relevant discussions.

Application of the Central Limit Theorem to Pseudorandom Numbers

In the discussion of pseudorandom number generators above, we commented that one of the simplest methods of generating approximately normally distributed random numbers was through the use of the Central Limit Theorem. Some thought about the Central Limit Theorem (specifically the properties of the sample total) suggests that

$\frac{\sum_{i=1}^{n} U_i - 0.5n}{\sqrt{n/12}} = \sqrt{\frac{12}{n}} \left( \sum_{i=1}^{n} U_i - 0.5n \right)$

will have the desired properties for "large" n (assuming each U_i has a Uniform[0,1] distribution).
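
The constants 0.5 and 12 come from the mean and variance of a single Uniform[0,1] variable. A short worked derivation, with T and Z introduced here as labels for the sample total and the standardized result:

```latex
\begin{align*}
E[U_i] &= \int_0^1 u \, du = \frac{1}{2}, \qquad
\operatorname{Var}(U_i) = \int_0^1 u^{2} \, du - \left(\frac{1}{2}\right)^{2} = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}, \\
T &= \sum_{i=1}^{n} U_i \quad\Longrightarrow\quad E[T] = \frac{n}{2}, \qquad \operatorname{Var}(T) = \frac{n}{12}, \\
Z &= \frac{T - n/2}{\sqrt{n/12}} = \sqrt{\frac{12}{n}} \left( T - \frac{n}{2} \right).
\end{align*}
```

By the Central Limit Theorem applied to the sample total, Z is approximately normal with mean 0 and variance 1 when n is large.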


First, we construct a function that will return a random number by using the rule discussed above:

In[25]:=

$RandomNormal[n_] := Sqrt[12/n] (Sum[Random[], {i, 1, n}] - 0.5 n)$

In practice, the number of uniform random variables summed would be chosen by the programmer, not left as an input parameter. But defining the function this way allows us to perform interesting experiments about how good the Normal approximation is.
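
As an illustration of fixing the count in advance, a common choice is n = 12: the variance of the sum is then 12/12 = 1, so the scaling factor disappears and only the mean of 6 has to be subtracted. The sketch below uses a hypothetical name, normal12, and assumes a Mathematica version that provides RandomReal.

```mathematica
(* normal12 is a hypothetical name used only for this sketch. *)
(* The sum of 12 Uniform[0,1] draws has mean 6 and variance 12/12 = 1, *)
(* so subtracting 6 gives an approximately standard normal value. *)
normal12[] := Total[RandomReal[1, 12]] - 6

SeedRandom[1];               (* fixed seed so the run is repeatable *)
draws = Table[normal12[], {5}]
```

Every value produced this way lies between -6 and 6, because the sum of 12 uniforms lies between 0 and 12.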

Now let us generate a suitably large sample and view its properties.
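
One way such an experiment might look is sketched below. The function name cltNormal, the sample size of 10000, the choice n = 12, and the particular statistics reported are all assumptions made for illustration; RandomReal, Skewness and Kurtosis are assumed to be available in the reader's version of Mathematica.

```mathematica
(* cltNormal is a hypothetical stand-in for the generator described in the text: *)
(* sum n Uniform[0,1] draws, subtract the mean n/2, divide by Sqrt[n/12]. *)
cltNormal[n_Integer?Positive] := Sqrt[12./n] (Total[RandomReal[1, n]] - 0.5 n)

SeedRandom[51];              (* fixed seed so the run is repeatable *)
sample = Table[cltNormal[12], {10000}];

Mean[sample]                 (* should be near 0 *)
Variance[sample]             (* should be near 1 *)
Skewness[sample]             (* should be near 0 *)
Kurtosis[sample]             (* a normal distribution has kurtosis 3 *)
{Min[sample], Max[sample]}   (* always within -Sqrt[3 n] and Sqrt[3 n] *)
```

For n = 12 the kurtosis tends to come out slightly below 3, reflecting tails that are a little lighter than those of a true normal distribution.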


While it is possible to construct a histogram of this data in Mathematica, the details are rather messy; Microsoft Excel is somewhat easier to use in this context.
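
In versions of Mathematica released after these notes were written, a built-in Histogram function makes the plot considerably less messy. The following is a sketch that assumes such a version; cltNormal is a hypothetical stand-in for the generator described in the text, and the bin count of 40 is arbitrary.

```mathematica
(* cltNormal is a hypothetical stand-in for the generator described in the text. *)
cltNormal[n_Integer?Positive] := Sqrt[12./n] (Total[RandomReal[1, n]] - 0.5 n)

SeedRandom[51];              (* fixed seed so the run is repeatable *)
sample = Table[cltNormal[12], {10000}];

(* Histogram scaled as a probability density, overlaid with the standard normal density. *)
fig = Show[
  Histogram[sample, 40, "PDF"],
  Plot[PDF[NormalDistribution[0, 1], x], {x, -4, 4}]
]
```

The "PDF" scaling puts the bars on the same vertical scale as the density curve, so the quality of the approximation can be judged by eye.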

As a pseudorandom number generator, this function leaves something to be desired in two respects: it is inefficient (each call draws and sums n uniform random numbers), and its tails do not extend far enough (the result can never exceed √(3n) in absolute value, a difference that matters only very occasionally). But it does illustrate an application of the Central Limit Theorem.
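
The size of the tail problem can be estimated directly. The sum of n uniforms lies between 0 and n, so the standardized value lies between -Sqrt[3 n] and Sqrt[3 n]. The sketch below, with hypothetical variable names and n = 12 chosen only as an example, computes that bound and the probability that a true standard normal variable falls outside it.

```mathematica
nUniforms = 12;              (* example value; any positive integer works *)

(* Largest possible absolute value: Sqrt[12/n] (n - n/2) = Sqrt[3 n]. *)
bound = Sqrt[3. nUniforms]

(* P(|Z| > bound) for a true standard normal Z, using the identity *)
(* P(|Z| > b) = Erfc[b/Sqrt[2]]. *)
missedProbability = Erfc[bound/Sqrt[2.]]
```

For n = 12 the bound is 6 and the missed probability is on the order of 10^-9, which is why the difference shows up only in very large samples.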


Created by Mathematica (July 20, 2006)