Random Variables¶

Charles Boncelet, boncelet@udel.edu, 24 Feb 2014. Basics of PMF, CDF, and density.

Some of these examples are taken from the Scipy ipython notebook by

"J.R. Johansson (robert@riken.jp) http://dml.riken.jp/~rob/ The latest version of this IPython notebook lecture is available at http://github.com/jrjohansson/scientific-python-lectures. The other notebooks in this lecture series are indexed at http://jrjohansson.github.com."

%pylab inline

Populating the interactive namespace from numpy and matplotlib

from scipy import stats

Discrete Random Variables¶

# create a (discrete) random variable with the Poisson distribution

X = stats.poisson(3.5)

X is a Poisson random variable. Let's plot it's PMF and CDF. We use a matplotlib property to plot three different plots using the same x-axis. Notice how the histogram approximates the PMF.

In the plots below, we take care to center the bars on the PMF and the histogram on the integers, i.e, the bars go from k-0.5 to k+0.5, but the jumps in the CDF are on the integers, i.e., the jump is at k. Also, we specify the bins for the histogram. The default number, 10 bins, can easily result in aliasing errors.

X.pmf(arange(0,15))

array([  3.01973834e-02,   1.05690842e-01,   1.84958973e-01,
         2.15785469e-01,   1.88812285e-01,   1.32168600e-01,
         7.70983499e-02,   3.85491749e-02,   1.68652640e-02,
         6.55871379e-03,   2.29554983e-03,   7.30402218e-04,
         2.13033980e-04,   5.73553024e-05,   1.43388256e-05])

n = arange(0,15)

fig, axes = subplots(3,1, sharex=True)

# plot the probability mass function (PMF)
# the 'where' keyword is optional, but 'mid' centers the plateaus on the integers
axes[0].step(n, X.pmf(n), where='mid')

# plot the commulative distribution function (CDF)
# don't shift the CDF.  The jumps should occur at the integers
axes[1].step(n, X.cdf(n))

# plot histogram of 1000 random realizations of the stochastic variable X
# shifting the bins by 0.5 centers each bar on the integer values
axes[2].hist(X.rvs(size=1000), bins=arange(-0.5,15.5));

An alternative plot is a "stem" plot. The two plots look ugly together. Best to pick one. (I'm not sure how to fix the "step" plot so the first bar is full sized, not half sized as it is now.)

stem(n, X.pmf(n),'r')
step(n, X.pmf(n), where='mid')

[<matplotlib.lines.Line2D at 0x10bd3b7d0>]

X.mean(), X.std(), X.var() # poission distribution

(3.5, 1.8708286933869707, 3.5)

Let's try a binomial random variable.

B = stats.binom(n=10,p=0.4)

n = arange(0,11)

fig, axes = subplots(3,1, sharex=True)

# plot the probability mass function (PMF)
#axes[0].step(n, B.pmf(n), where='mid')
axes[0].stem(n, B.pmf(n))

# plot the commulative distribution function (CDF)
axes[1].step(n, B.cdf(n))

# plot histogram of 1000 random realizations of the stochastic variable X
axes[2].hist(B.rvs(size=1000), bins=arange(-0.5,11.5));

bs = B.rvs(size=100)
bs

array([3, 3, 5, 5, 7, 5, 5, 4, 5, 5, 4, 2, 4, 7, 4, 5, 5, 3, 6, 7, 4, 2, 5,
       3, 6, 5, 5, 5, 5, 6, 1, 4, 3, 5, 2, 5, 2, 4, 3, 3, 5, 6, 4, 4, 5, 4,
       5, 4, 2, 6, 5, 7, 3, 4, 5, 6, 4, 8, 4, 3, 6, 5, 3, 3, 4, 3, 4, 4, 5,
       5, 2, 7, 4, 7, 2, 5, 5, 6, 5, 3, 3, 5, 3, 6, 5, 4, 4, 2, 1, 5, 2, 3,
       4, 1, 3, 5, 3, 7, 3, 3])

hist(bs,bins=arange(-0.5,11.5))

(array([  0.,   3.,   9.,  20.,  21.,  30.,   9.,   7.,   1.,   0.,   0.]),
 array([ -0.5,   0.5,   1.5,   2.5,   3.5,   4.5,   5.5,   6.5,   7.5,
         8.5,   9.5,  10.5]),
 <a list of 11 Patch objects>)

Notice what happens when we don't specify the bins and use the default, 10 bins. There's an aliasing problem. The two histogram plots below are misleading.

hist(bs)

(array([  3.,   9.,  20.,   0.,  21.,  30.,   0.,   9.,   7.,   1.]),
 array([ 1. ,  1.7,  2.4,  3.1,  3.8,  4.5,  5.2,  5.9,  6.6,  7.3,  8. ]),
 <a list of 10 Patch objects>)

Here's another misleading histogram. This time there are too few bins.

hist(bs,bins=5)

(array([ 12.,  20.,  51.,   9.,   8.]),
 array([ 1. ,  2.4,  3.8,  5.2,  6.6,  8. ]),
 <a list of 5 Patch objects>)

Continuous Random Variables

# create a (continous) random variable with normal distribution
Z = stats.norm()

x = linspace(-4,4,100)

fig, axes = subplots(3,1, sharex=True)

# plot the probability distribution function (PDF)
axes[0].plot(x, Z.pdf(x))

# plot the commulative distributin function (CDF)
axes[1].plot(x, Z.cdf(x));

# plot histogram of 1000 random realizations of the stochastic variable Y
axes[2].hist(Z.rvs(size=1000), bins=50);

Z.mean(), Z.std(), Z.var() # normal distribution

(0.0, 1.0, 1.0)

#help(hist)