Confidence Intervals & Hypothesis Testing: A Quick & Dirty Approach

Confidence intervals

Confidence intervals and hypothesis testing are the key elements of "inferential statistics." A confidence interval answers the question "Based on sample data, what numbers would and would not be reasonable guesses for the mean of the population from which the sample came?"

The concept of a reasonable guess requires some definition. Often in business we only need to guess an upper or a lower bound on a number: "the mean operating cost is under €500 per day" or "the mean pages printed per ink cartridge is over 5000." Sometimes we need to guess both; this is what statisticians call a two-sided interval: "the mean productivity is between 7 and 9 widgets per hour."

These types of guesses, called confidence intervals, qualify as "reasonable" if they are calculated from sample data using a procedure which, despite the inevitable uncertainty of generalizing from a sample to its parent population, is nevertheless guaranteed to be correct in a specified proportion of applications, usually 90% or 95%.

We use sample data to determine what guesses about the population mean are reasonable by calculating a "confidence interval" based on a "margin of error" above and/or below the sample mean. A guess about the population mean is reasonable in the light of the data if it is "close" to the sample mean; specifically, if the difference between the guess and the sample mean is less than the margin of error.
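The "close to the sample mean" test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the productivity figures are made up, the function names are hypothetical, and the margin of error uses the common large-sample approximation of 1.96 standard errors, which is rough for a sample this small.

```python
import math
import statistics

# Hypothetical sample: widgets produced per hour on eight occasions.
sample = [7.2, 8.1, 8.5, 7.9, 8.8, 7.4, 8.3, 8.0]

def margin_of_error(data, z=1.96):
    """Approximate margin of error: z times the standard error of the mean."""
    return z * statistics.stdev(data) / math.sqrt(len(data))

def is_reasonable(guess, data, z=1.96):
    """A guess is reasonable if it lies within the margin of error of the sample mean."""
    return abs(guess - statistics.mean(data)) < margin_of_error(data, z)

print(is_reasonable(8.2, sample))  # True: 8.2 is close to the sample mean
print(is_reasonable(9.0, sample))  # False: 9.0 is too far from the sample mean
```

The two-sided confidence interval is simply the set of all guesses for which `is_reasonable` returns True: the sample mean minus the margin of error up to the sample mean plus the margin of error.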

The margin of error depends on two characteristics of the sample and one judgment call. The two characteristics of the sample are the number of observations and how much those observations vary from one another. Small samples and heterogeneous observations require a big margin of error, which means not much power to exclude unreasonable guesses.
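To see how sample size and variability drive the margin of error, here is a minimal sketch. It assumes the standard large-sample formula (margin = z × standard deviation ÷ √n), which the text above does not spell out; the function name and the numbers are hypothetical.

```python
import math

def margin_of_error(sample_sd, n, z=1.96):
    """Large-sample approximation: z * s / sqrt(n)."""
    return z * sample_sd / math.sqrt(n)

# Same variability, larger sample: quadrupling n halves the margin.
print(f"{margin_of_error(10, 25):.2f}")
print(f"{margin_of_error(10, 100):.2f}")

# Same sample size, more heterogeneous observations: doubling s doubles the margin.
print(f"{margin_of_error(20, 100):.2f}")
```

Under this formula the margin shrinks only with the square root of the sample size, so halving the margin of error takes four times as many observations.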

The judgment call is known as the "confidence level." In practice, the confidence level chosen is almost always 95%. Choosing a confidence level of 90% means a stricter standard of what is a "reasonable guess": the margin of error for 90% confidence is smaller, so more guesses can be ruled out, but the procedure is wrong more often. Conversely, choosing a confidence level of 99% means a larger margin of error and a more lenient standard for a "reasonable guess," so fewer guesses can be ruled out.
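How the confidence level changes the margin of error can be checked numerically. The sketch below assumes the usual normal approximation, in which the margin of error is a multiplier times the standard error and only the multiplier depends on the confidence level; `z_multiplier` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided normal multiplier for a given confidence level (e.g. 0.95)."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: multiplier {z_multiplier(level):.3f}")
# 90% confidence: multiplier 1.645
# 95% confidence: multiplier 1.960
# 99% confidence: multiplier 2.576
```

A higher confidence level gives a larger multiplier and therefore a wider interval for the same data.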

For technical reasons statisticians refuse to call the specified proportion the "probability" that the unknown population mean is in the calculated interval, preferring the phrase "confidence level," but for most practical purposes the difference is inconsequential.
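What the "specified proportion" means can be illustrated with a small simulation. It is a sketch under assumptions that are not from the text: a made-up normal population with a known mean, samples of 50, and the approximate 1.96-standard-error margin. It counts how often the calculated interval actually contains the true mean.

```python
import math
import random
import statistics

def coverage(true_mean=100.0, true_sd=15.0, n=50, trials=2000, z=1.96, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        margin = z * statistics.stdev(sample) / math.sqrt(n)
        if abs(true_mean - statistics.mean(sample)) < margin:
            hits += 1
    return hits / trials

print(f"Intervals containing the true mean: {coverage():.1%}")
```

The printed proportion should land close to 95%. The confidence level describes how often the procedure succeeds over many samples; any single interval either contains the population mean or does not.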

Calculating Quick & Dirty Confidence Intervals with Excel

Quick & Dirty Hypothesis Testing