SSIS Notebook Tutorial

SSIS is a steganography technique developed by Lisa Marvel (my graduate student and a scientist with the US Army Research Laboratory), Charles Retter (also of ARL), and myself in the late 1990s and early 2000s. We published several papers describing the technique and were also awarded a US patent. Those papers and the patent have been cited over a thousand times.

The notebook, [html] and [ipynb], presents the basics of SSIS.

References: [TrIP99] [ICIP98]

Block Arithmetic Coding Tutorial

I've written two Jupyter notebooks describing my Block Arithmetic Coding (BAC). BAC is a variable-to-fixed encoder: it parses the input into variable-length strings and encodes each with a fixed-length output string. I consider it the best work I've done, but hardly anyone read the paper. Hopefully, some people will at least read my tutorial.

The first notebook goes over the basics of BAC and shows how to compute its coding efficiency. [ipynb] [html]

The second notebook provides a simple encoder and decoder for binary inputs. [ipynb] [html]

My Book is Out and Website Update

My book, Probability, Statistics, and Random Signals, came out in 2016. I was busy doing lots of things last spring, summer, fall, and this winter (visiting China, doing a sabbatical in Idaho, and taking a study abroad trip to London) and I'm finally getting around to updating my website.

It is available through my publisher, Oxford University Press, and through Amazon.com.

My book is designed for a junior- or senior-year class, but many reviewers suggested they might use it in their graduate classes. It has lots of carefully drawn illustrations, computational examples using Matlab, Python, and R, and many worked-out examples.

The first nine chapters cover discrete and continuous probability. The remaining five chapters cover statistics, hypothesis testing, random vectors and linear regression, and random processes.

Contact me if you are interested in using the text in your class or your studies. The Solutions Manual has solutions for every homework problem (writing solutions to every problem took me much longer than I anticipated!).

Oxford University Press link to Probability, Statistics, and Random Signals

Amazon link to Probability, Statistics, and Random Signals

Book Cover

Solar Insolation

I read Steve Goddard's blog, Real Science. He is doing a wonderful job exposing the fraudulent manipulation of the temperature record by the climate alarmist crowd (e.g., NASA).

But lately, he has gotten off-track talking about climate physics. I sent the following comment to one of his posts:

There’s way too much imprecision in this discussion, from all sides.

I used the solar calculator at http://www.esrl.noaa.gov/gmd/grad/solcalc/ to record the sun’s elevation throughout the day. I chose June 21, 2014, the summer solstice, and recorded the sun’s elevation hourly for three locations: the North Pole, Anchorage, Alaska, and a point on the equator.

At the north pole, the sun’s elevation is about 23.5 degrees all day. The sun at Anchorage reaches an elevation of about 52 degrees. The sun at the equator reaches an elevation of 66.6 degrees.

Solar insolation is proportional to cos(90 - elevation angle), i.e., the sine of the elevation angle, for angles above the horizon. Integrating over the 24 hours, I get the following: the north pole sees 9.5 hours of direct-sun equivalent, Anchorage sees 8.7 hours, and the equator 7.0 hours.

So Steve/Tony is right: at the summer solstice, the Arctic gets more solar insolation than the tropics.

However, this statement is misleading in many ways. I’ve neglected atmospheric absorption: the sun’s rays arrive at a lower angle at the pole and are absorbed more by the atmosphere than at the equator, where the peak solar elevation is much higher.

Furthermore, the summer solstice is the most favorable day of the year for the north pole. That’s the day it gets the most solar insolation and the day the equator gets the least (though the equator doesn’t vary much). (BTW, the south pole gets no insolation that day.)

Finally, a day’s solar insolation affects that location’s change in temperature far more than it affects its absolute temperature. The tropics are warm all year round, but the pole is very cold coming out of winter.

Blaming the cold weather at the poles on the lack of greenhouse gases is wrong. (BTW, the standard assumption is that CO2 is distributed uniformly.) The poles are cold because they get less sunlight over the year.

Water is immensely important to the planet’s temperature distribution. The oceans transfer heat from the tropics poleward, clouds and thunderstorms cool the tropics, and clouds help retain heat at night. The atmosphere also contributes with convection and winds. It’s complicated and oversimplifying is misleading at best.
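The daily totals quoted in the comment above can be roughly reproduced without the NOAA calculator. The following is only a sketch under stated assumptions: it replaces the hourly readings with the standard spherical-astronomy formula for solar elevation, uses an approximate solstice declination of 23.44 degrees, takes Anchorage's latitude as 61.2 degrees north, and ignores the atmosphere and refraction entirely. The function name and parameters are made up for this illustration.

```python
import numpy as np

DECLINATION_DEG = 23.44  # assumed solar declination at the June solstice


def direct_sun_hours(latitude_deg, declination_deg=DECLINATION_DEG, steps=24 * 60):
    """Equivalent hours of overhead sun in one day, ignoring the atmosphere."""
    lat = np.radians(latitude_deg)
    dec = np.radians(declination_deg)
    # The hour angle sweeps one full rotation in 24 hours; sample interval midpoints.
    hour_angle = np.radians((np.arange(steps) + 0.5) * 360.0 / steps - 180.0)
    # Standard formula for the sine of the solar elevation angle.
    sin_elev = (np.sin(lat) * np.sin(dec)
                + np.cos(lat) * np.cos(dec) * np.cos(hour_angle))
    # cos(90 - elevation) = sin(elevation); keep only times the sun is above the horizon.
    return np.clip(sin_elev, 0.0, None).sum() * 24.0 / steps


for name, latitude in [("North Pole", 90.0), ("Anchorage", 61.2), ("Equator", 0.0)]:
    print(f"{name}: {direct_sun_hours(latitude):.1f} hours")
```

Under these assumptions the three totals come out close to the 9.5, 8.7, and 7.0 hours quoted above. Changing `declination_deg` to other dates' values is one way to see how quickly the pole's advantage disappears away from the solstice.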

Bloom Energy Subsidy

I sent the following letter to the News Journal:

A recent report in the News Journal indicates that Delmarva Power customers are subsidizing Bloom Energy in excess of $3 million per month. That is about $40 million per year.

I have a couple of questions for the governor and legislature:

1) If Bloom actually creates 1000 jobs (likely an optimistic scenario), the subsidy works out to $40,000 per job per year. Are these jobs actually worth $40,000 each to the state of Delaware? If not, why are we paying this subsidy?

2) If these jobs are important to the State of Delaware, why is it fair that only Delmarva Power customers have to pay the subsidy? Why not all the citizens of Delaware?

New IPython Notebooks: Getting Started and Linear Regression

I added two more IPython Notebooks.

The first is a Getting Started notebook with lots of advice on where to get Python (Anaconda or Canopy), how to run IPython remotely (wakari.io and sagemathcloud.com), and where to find tutorials.

The second is an example showing four different ways to compute linear regression estimates with Python. Bottom line: for most problems, use statsmodels with the QR option.
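To give a flavor of why a QR-based solver is a sensible default, here is a minimal sketch using only NumPy. It is not the notebook's code: the synthetic data, the variable names, and the choice of these two particular methods are assumptions made for illustration. It fits a line two ways, once with the normal equations and once with a QR decomposition, which avoids forming X'X and is generally better conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 2 + 0.5 x + noise
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])

# Method 1: normal equations, solve (X'X) beta = X'y
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Method 2: QR decomposition, X = QR, then solve R beta = Q'y
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

print("normal equations:", beta_normal)
print("QR:              ", beta_qr)
```

On a well-behaved problem like this one the two estimates agree to many digits; the difference shows up when the columns of X are nearly collinear. In statsmodels, the corresponding choice is made with `OLS(y, X).fit(method="qr")`.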

See my IPython page for more information and to get the files.

Sampling Theorem -- Aliasing

The sampling theorem consists of two parts: the first is sampling and aliasing and the second is reconstruction. Taken together, the sampling theorem is the most important concept in digital signal processing: it is why we can use digital computers to analyze continuous-time signals.

I wrote up a quick IPython script demonstrating aliasing. It produces a series of plots, each showing two continuous-time sinusoids and the resulting samples. The two sinusoids have different continuous-time frequencies, but their samples coincide: the digital frequencies are the same.

In later work, we will demonstrate reconstruction.

Here are the files: IPynb, HTML
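For readers who want the gist without opening the notebook, here is a minimal sketch of the aliasing idea. It is not the notebook's script: the sampling rate and the two frequencies are values chosen for illustration. Two cosines whose frequencies differ by exactly the sampling rate produce identical samples.

```python
import numpy as np

fs = 8.0        # sampling rate in Hz (assumed for illustration)
f1 = 1.0        # first sinusoid, below fs/2
f2 = f1 + fs    # second sinusoid, one sampling rate higher: an alias of f1

n = np.arange(16)   # sample indices
t = n / fs          # sample times in seconds

samples1 = np.cos(2 * np.pi * f1 * t)
samples2 = np.cos(2 * np.pi * f2 * t)

# The continuous-time signals differ, but at the sample times they coincide.
max_diff = np.max(np.abs(samples1 - samples2))
print(f"largest difference between the two sample sets: {max_diff:.2e}")
```

The difference is zero up to floating-point rounding, because the extra term 2*pi*fs*n/fs = 2*pi*n is a whole number of cycles at every sample.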

Ryan Howard and clutch hitting

It's been a frustrating couple of years for this Phillies fan. The Phillies management (mostly the GM Amaro) has made many mistakes, but none bigger than signing Ryan Howard to a huge contract.

He can't hit for average, can't run, can't throw, and can't field (in fairness, he's okay at catching throws in the dirt, but has no range and doesn't field enough batted balls).

But, people will say, he hits for power and gets RBI's. Let's look at the numbers:

As of today (17 August 2014), Howard is in a three-way tie for 14th in the NL with 18 home runs through 117 games. He's on pace to hit 25 = (162/117)*18 home runs on the season.

Is that enough to compensate for all his other failings? Not in my book.

How about RBI's? Howard is currently 3rd in the NL with 77 RBI's. That's pretty good, right? Doesn't it point to his "clutch hitting"?

No, it doesn't. Clutch hitting is mostly a fiction. Players get RBI's because they bat with runners on base, not because of some mystical ability to hit better with runners on base. (If they could hit better with runners on base, why don't they hit better without runners on base? Are they just lazy?)

Howard leads the NL with 373 baserunners on base during his at bats. That's about 0.74 baserunners per at bat. It's an amazing testament to the first three hitters in the Phillies lineup (generally Revere, Rollins, and Utley).

Those runners score at a 16% rate during or because of Howard's at bat. That's currently 81st in the NL. In other words, 80 players in the NL knock in runs at a greater rate than Howard.

Howard's OPS is .676, good enough for 59th best in the NL. (MLB.com only lists 71 batters, so 59th is not very good.)

But he gets RBI's, they say.

(All data is from MLB.com and baseball-reference.com before today's game.)
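The arithmetic in this post can be checked in a few lines. This is only a sketch: in particular, the scoring rate is approximated here as (RBI minus home runs) divided by baserunners, on the assumption that each home run credits one RBI for the batter himself. The official statistic may be computed slightly differently.

```python
games_played = 117
season_games = 162
home_runs = 18
rbi = 77
baserunners = 373

# Home run pace over a full season
projected_home_runs = season_games / games_played * home_runs

# Assumption: runners driven in = RBI minus the batter's own runs on home runs
others_driven_in = rbi - home_runs
scoring_rate = others_driven_in / baserunners

print(f"projected home runs: {projected_home_runs:.1f}")
print(f"baserunner scoring rate: {scoring_rate:.1%}")
```

Under that assumption the rate works out to roughly 16%, consistent with the figure quoted above.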