# Appendix 2D Statistical inference

## A2D.1 Introduction

A population is the set of all data points in the study of interest: for instance, all the people in a country or all the firms operating in an economy. It is often expensive and impractical to collect data for the entire population, so statisticians usually collect a sample of data that forms a subset of the overall population (Figure A2D.1).

Figure A2D.1 Populations and samples

statistical inference
The technique of forming judgements about the underlying parameters of a population from a sample.

The idea of statistical inference is that given a sample of data, we want to use this information to infer something about the population as a whole. An example could be calculating an estimate of average incomes based on a survey collected from a sample of households.

Because sample statistics are based on a subset of the total information, they are imprecise estimates of the true population parameters. The mean calculated for a sample of data is unlikely to coincide exactly with the mean for the entire population. Therefore, when we use sample statistics to make inferences about the entire population, we reflect this imprecision by attaching probabilities to our statements.

After outlining some useful probability theory, this appendix describes two ways in which it can be used to make statistical inferences about a population average:

• A confidence interval. This provides a range of values around the sample mean within which, with a stated probability, the true population mean lies.
• Hypothesis testing. This uses sample averages to test specific propositions about the population average, assessing whether the sample result could plausibly have occurred by chance.

## A2D.2 Sample statistics and standard errors

In Appendix 2A we described how summary statistics can be calculated to estimate measures of central tendency and dispersion in a sample of data.

For a sample of data consisting of n observations $$\left(X_{1}, X_{2}, \ldots, X_{n}\right)$$, the simple arithmetic mean $$\left(\overline{X}\right)$$ can be calculated by summing all the values and dividing by n.

$\overline{X} = \frac{\sum\limits_{i = 1}^{n} \; X_{i}}{n}$

The variance of the sample data (s²) can be calculated as the sum of squared deviations from the arithmetic mean divided by (n − 1).

$s^{2} = \frac{\sum\limits_{i = 1}^{n} \; \left(X_{i} − \overline{X}\right)^{2}}{n − 1}$

The arithmetic mean and variance can also be calculated for the entire population of N observations. The n observations in the sample are a subset of the N observations in the population.

The arithmetic mean for the entire population is defined by the parameter µ and is calculated as:

$\mu = \frac{\sum\limits_{i = 1}^{N} \; X_{i}}{N}$

The population variance, defined as σ², is calculated as:

$\sigma^{2} = \frac{\sum\limits_{i = 1}^{N} \; \left(X_{i} − \mu\right)^{2}}{N}$

Dividing by n − 1 instead of n to calculate the sample variance s² is known as Bessel’s correction, named after the German astronomer, mathematician, and physicist Friedrich Bessel. Observations within a sample are typically closer to the sample mean than to the population mean, so dividing the sum of squared deviations by n would tend to underestimate the true population variance. Bessel’s correction offsets this downward bias.
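As a minimal sketch of these formulas, the following Python snippet computes the sample mean, the sample variance with Bessel’s correction and, for comparison, the variance obtained by dividing by n. The five data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample of n = 5 earnings observations (in £1,000s).
sample = [25.0, 31.0, 28.0, 40.0, 26.0]
n = len(sample)

# Sample mean: sum of the values divided by n.
x_bar = sum(sample) / n

# Sum of squared deviations from the sample mean.
ss = sum((x - x_bar) ** 2 for x in sample)

s2 = ss / (n - 1)    # sample variance with Bessel's correction
var_div_n = ss / n   # dividing by n instead gives a smaller value

# The standard library implements both conventions.
assert abs(s2 - statistics.variance(sample)) < 1e-9
assert abs(var_div_n - statistics.pvariance(sample)) < 1e-9

print(x_bar, s2, var_div_n)
```

For this sample the divisor n − 1 gives 36.5, while dividing by n gives 29.2, which illustrates the direction of the bias that the correction addresses.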

sample statistic
A property of a statistical sample, such as mean and variance, that can be used to estimate the corresponding parameter of the population from which the sample was drawn.

The characteristics of the population, such as the arithmetic mean (µ) or variance (σ²), are called parameters. The purpose of statistical inference is to estimate features of the population using information contained in the sample. To estimate the value of a population parameter, we calculate the corresponding characteristic of the sample, referred to as a sample statistic. The sample mean $$\overline{X}$$ is a point estimator of the population mean μ. The sample variance s² is a point estimator of the population variance σ².

Suppose we take a sample of size n from a population of size N and estimate the mean, $$\overline{X}$$, and standard deviation (the square root of the variance), s. These are our sample estimates of the true but unknown population mean (μ) and standard deviation (σ). However, this is only one of many samples of size n that could have been drawn from the same population. Each different sample would have its own sample statistics $$\overline{X}$$ and s. With repeated samples we can obtain a distribution for all the calculated sample means.

For example, suppose a group of 14 researchers each take a sample from a population of earnings (in £1,000s). Figure A2D.2 shows the values of the mean earnings calculated for each sample.

| Researcher name | Mean earnings (£1,000s) |
| --- | --- |
| Mary | 29.1 |
| Filipa | 33.6 |
| Augustin | 37.7 |
| Tom | 28.5 |
| Lisa | 29.8 |
| Marcel | 31.5 |
| Fiona | 30.4 |
| Lorenzo | 34.7 |
| Carol | 32.8 |
| Cecilia | 27.4 |
| Bart | 25.4 |
| Eric | 28.9 |
| John | 28.9 |
| Antonia | 30.6 |
| Mean of the means | 30.7 |
| Standard deviation of the means | 3.2 |

Figure A2D.2 Sample means from the same population reported by different researchers, £1,000s

Suppose in this case that we know that the true population mean earnings equals 29.4 (£29,400) and the population standard deviation is 12.7 (£12,700). Looking at Figure A2D.2, some researchers have calculated mean earnings close to the population average, but there are a few that are much higher, such as Augustin (37.7), or lower, such as Bart (25.4).

Every time a different researcher takes a new sample, a new estimate of mean earnings is calculated. We can then estimate the mean and standard deviations of these new estimates, shown in Figure A2D.2 as the mean of the means, and the standard deviation of the means. In this example the mean of the means (30.7) is close to the population mean (29.4), as would be expected given enough samples.

But the standard deviation of the means (3.2) is much lower than the population standard deviation (12.7). If you think about this for a second, it makes sense, as each of the sample means will tend to bunch up around the true population mean. That is because each sample mean is a measure of the central tendency of the observations in the sample. So, there is likely to be less variation between these sample means than between all the individual observations making up the entire population.
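The two summary rows of Figure A2D.2 can be reproduced with a short Python sketch. It assumes that the standard deviation of the means uses the (n − 1) divisor, which matches the reported value of 3.2 after rounding.

```python
import statistics

# Sample means reported by the 14 researchers in Figure A2D.2 (£1,000s).
sample_means = [29.1, 33.6, 37.7, 28.5, 29.8, 31.5, 30.4,
                34.7, 32.8, 27.4, 25.4, 28.9, 28.9, 30.6]

mean_of_means = statistics.mean(sample_means)
# statistics.stdev divides by (n - 1), i.e. it applies Bessel's correction.
sd_of_means = statistics.stdev(sample_means)

print(round(mean_of_means, 1), round(sd_of_means, 1))
```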

standard error
The standard deviation of the sampling distribution of the sample mean.

The standard deviation of the distribution of the sample means is called the standard error of the sample mean, denoted by $$\sigma_{\overline{X}}$$. Figure A2D.3 shows how the distribution of sample means relates to the distribution of the original population data. The distribution of sample means is likely to be more tightly centred, or less dispersed, around the population mean µ. This is because each sample mean is a measure of the central tendency of the observations in the sample. Put simply, the standard error of the sample mean (an estimate of how far the sample mean is likely to be from the population mean) will typically be smaller than the standard deviation of the population (the degree to which individuals within the population differ from the population mean).

Figure A2D.3 Distributions of the original population data and sample means

In practice, we would only take one sample from the population. However, a useful result from statistical theory says that we can estimate the standard error of the sample mean ($$\sigma_{\overline{X}}$$) as the sample standard deviation (s) divided by the square root of the sample size ($$\sqrt{n}$$).

$\sigma_{\overline{X}} = \frac{s}{\sqrt{n}}$

Larger sample sizes lead to lower standard errors, but at a decreasing rate. Suppose the sample size n doubles from 500 to 1,000; the square root of n increases from 22.4 to 31.6, or by about 41%, so the standard error falls by only about 29%. If the sample increases from 500 to 64,000, n increases by a factor of 128 while the square root increases only by a factor of about 11. This means that the gains in precision of the sample mean, as measured by its standard error, diminish as the sample size grows.
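The arithmetic can be checked with a small Python sketch. The value s = 12.7 is an assumption made purely for illustration (it borrows the population standard deviation from the earnings example), and `standard_error` is a hypothetical helper name.

```python
import math

s = 12.7  # assumed sample standard deviation (£1,000s), for illustration only


def standard_error(s, n):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


for n in (500, 1_000, 64_000):
    print(n, round(math.sqrt(n), 1), round(standard_error(s, n), 3))

# Doubling n shrinks the standard error by a factor of sqrt(2), about 1.41.
ratio = standard_error(s, 500) / standard_error(s, 1_000)
print(round(ratio, 3))
```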

This result implies that we can estimate a mean from our sample and then use its value and that of its standard error to say something about the likely values of the population mean.

## A2D.3 Probability distributions: the normal distribution and the t-distribution

The previous section introduced the idea of taking a sample from a population and using the results to infer something about the population. However, as we saw from the example in Figure A2D.2, any sample is likely to generate a mean value or standard deviation that will differ from the population values. It is only by chance that the mean and standard deviation of a sample of n observations will be the same as the parameters of the population of size N from which the sample was drawn.

Therefore, we can only say something about the likelihood of the sample means being close to the population values. We cannot make a definitive judgement about the population from the sample, but we can say how likely it is that our inferences from the sample capture the true nature of the population.

probability density function
Continuous random variables have an associated probability density function, denoted f(x). Its height at x indicates how likely values close to x are relative to other values; the probability that the variable falls within an interval is given by the area under f(x) over that interval. For example, the area under f(x) between 19,000 and 21,000 would give the probability that a randomly selected individual had earnings between £19,000 and £21,000.

random variable
A numerical description of the outcome of an experiment. A discrete random variable can only take a countable set of distinct values (typically integers), whereas a continuous random variable can potentially take any value in an interval, limited only by the ability to measure accurately. For example, if we randomly choose any individual in the UK, their earnings could lie anywhere between £0 and several million pounds.

If f(x) is the probability density function of a continuous random variable, then the total area under the curve defined by that function equals 1. This is because the probabilities attached to all possible outcomes must add up to complete certainty. Likewise, the area under the curve between two values gives the probability that the continuous random variable lies within that interval. For example, we might be interested in the probability that a randomly selected person has earnings of between £10,000 and £20,000.
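To make the link between areas and probabilities concrete, here is a minimal Python sketch. It assumes, purely for illustration, that earnings (in £1,000s) follow a normal distribution with mean 29.4 and standard deviation 12.7; real earnings data are typically skewed, so the number it prints should not be read as an estimate for any actual population.

```python
from statistics import NormalDist

# Assumption for illustration only: earnings (in £1,000s) are normally
# distributed with mean 29.4 and standard deviation 12.7.
earnings = NormalDist(mu=29.4, sigma=12.7)

# Probability of earnings between £10,000 and £20,000:
# the area under the density between the two values, F(20) - F(10).
p_interval = earnings.cdf(20) - earnings.cdf(10)

# Crude numerical check that the total area under the density is 1
# (a Riemann sum over the mean plus or minus 8 standard deviations).
step = 0.01
start = earnings.mean - 8 * earnings.stdev
n_steps = int(16 * earnings.stdev / step)
total_area = sum(earnings.pdf(start + i * step) * step for i in range(n_steps))

print(round(p_interval, 3), round(total_area, 4))
```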

Two probability distributions that are very commonly used to make statistical inferences are the normal distribution and the t‑distribution.

### A2D.3.1 The normal distribution

normal distribution
A symmetric, bell-shaped distribution described entirely by its mean and standard deviation.

A frequently used probability distribution is the normal distribution. The normal distribution is symmetric, bell shaped, and described entirely by its mean and standard deviation.

The normal distribution is symmetric around the mean value of the distribution, meaning that 50% of the observations lie above the mean and 50% below. Because the distribution is symmetric with a single peak, the three different measures of central tendency – the arithmetic mean, median and mode – are equal. The standard deviation describes how the data are dispersed around the mean. As the standard deviation increases, the distribution becomes less centred on the mean value and more spread out.

There is a whole family of normal distributions, depending on the values of the means and standard deviations, as illustrated in Figure A2D.4.

Figure A2D.4 Examples of normal distributions

In Figure A2D.4, the mean is the same in curves A and B, but there is a greater spread of values for A than for B, indicating that the standard deviation is greater for curve A than for curve B. Curve C has the same dispersion as A, but the mean value is twice that of curve A, so the distribution of C lies to the right of A.

standard normal distribution
A distribution with the same characteristics as a normal distribution but with a mean of 0 and standard deviation of 1.

One specific form of normal distribution that is of special interest is the standard normal distribution, which has mean 0 and standard deviation equal to 1. Any normal distribution of random variables can be written as a standard normal distribution by the transformation:

$z_{i} = \frac{X_{i} − \overline{X}}{s}$

That is, for all the n observations in the sample, $$\left(X_{1}, X_{2}, \ldots, X_{n}\right)$$, the deviation from the arithmetic mean is divided by the standard deviation. The transformed observations $$\left(z_{1}, z_{2}, \ldots, z_{n}\right)$$ will then have a mean of 0 and a standard deviation of 1.
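The transformation can be sketched in a few lines of Python. The sample values are hypothetical; the point is only that the transformed observations end up with mean 0 and standard deviation 1.

```python
import statistics

# Hypothetical sample (values chosen only for illustration).
sample = [25.0, 31.0, 28.0, 40.0, 26.0]

x_bar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# z_i = (X_i - X_bar) / s for every observation.
z_scores = [(x - x_bar) / s for x in sample]

z_mean = statistics.mean(z_scores)
z_sd = statistics.stdev(z_scores)

print([round(z, 2) for z in z_scores])
print(abs(round(z_mean, 10)), round(z_sd, 10))
```

Standardising in this way rescales the data but does not change the shape of their distribution, so the result is only a standard normal variable if the original data were normally distributed.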

Figure A2D.5 shows the probability density function for the standard normal distribution. It has the same form as those plotted in Figure A2D.4, except that the mean of the distribution has been shifted to 0 and the standard deviation rescaled to 1. The standard normal distribution is important because the entire family of normal distributions (based on different means and standard deviations) can easily be transformed into this common form. It is far easier and more convenient to carry out statistical analyses on one distribution than to contend with a multitude of different normal distributions.

Figure A2D.5 The standard normal distribution

z-score
The number of standard deviations an observation is from the mean.

#### How it’s done: The z-score

The transformed observations $$\left(z_{1}, z_{2}, \ldots, z_{n}\right)$$ making up a standard normal distribution are known as z‑scores. The z‑score can be interpreted as the number of standard deviations (s) the original observation $$X_{i}$$ is from the mean $$\overline{X}$$. If the z‑score is negative, the observation is below the mean; if it is positive, it is above the mean. A z‑score of 2, say, means that the observation is 2 standard deviations greater than the mean. A z‑score of −0.75 means the observation is ¾ of a standard deviation less than the mean.

Z‑scores are useful because they are a form of standardisation that allows us to compare observations from different normal distributions. For example, if comparing the price of two houses in two different parts of the country, the respective z‑scores would indicate where each house price lies in terms of standard deviations from its regional average price.
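The house price comparison might look like the following sketch. All figures are hypothetical, as is the helper name `z_score`; they are not taken from any real housing data.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Hypothetical regional figures (£1,000s), for illustration only.
z_north = z_score(x=260, mean=200, sd=40)  # house priced at £260,000
z_south = z_score(x=480, mean=450, sd=60)  # house priced at £480,000

# The northern house is cheaper in absolute terms but lies further above
# its regional average when measured in standard deviations.
print(z_north, z_south)
```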

All normal distributions satisfy the following properties:

• 68.3% of the observations fall within plus or minus 1 standard deviation of the mean
• 95.4% of the observations fall within plus or minus 2 standard deviations of the mean
• 99.7% of the observations fall within plus or minus 3 standard deviations of the mean.

Thus, for a normal distribution, almost all values lie within 3 standard deviations of the mean. This is shown in Figure A2D.6 for the standard normal distribution.
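These three percentages can be recovered from the standard normal cumulative probabilities. The sketch below uses `NormalDist` from Python’s standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Pr(-k < z < k) = F(k) - F(-k) for k = 1, 2 and 3 standard deviations.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, f"{share:.1%}")
```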

Figure A2D.6 The normal distribution, closeness of values to the mean

The normal distribution has three useful properties which make it applicable in a large range of statistical procedures and analyses:

• It approximates many other distributions. As the number of observations in a sample gets large, many other distributions converge in shape towards it. Therefore, the normal distribution can be used to approximate many different types of probability distributions.
• If the population has a normal distribution, the distribution of its sample means is also normal.
• Central Limit Theorem. Even if the population is not normally distributed, a useful result known as the Central Limit Theorem says that the sampling distribution of the sample mean can be approximated by a normal distribution as the sample size gets large. Therefore, we can still use the normal distribution to make inferences about sample means drawn from non-normal distributions, providing the sample size is sufficiently large.

Because the normal distribution closely approximates a wide range of continuous random variables, and the three properties described add to its general applicability, the normal distribution has a huge range of uses across economics, business, finance, and operational research.
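The Central Limit Theorem can be illustrated with a small simulation. The sketch below is one possible setup, not a result from this appendix: it assumes an exponential population with mean 1 and standard deviation 1 (strongly right-skewed), a sample size of 50, and 2,000 repeated samples.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 50                  # size of each sample
replications = 2_000    # number of repeated samples

# Each sample is drawn from an exponential distribution with mean 1,
# which is clearly not normal; we keep only the sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(replications)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Theory: the sample means centre on 1 with standard error 1 / sqrt(n).
se = 1 / n ** 0.5

# Share of sample means within 1.96 standard errors of the population mean;
# a value near 95% suggests the normal approximation is working.
within = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / replications

print(round(mean_of_means, 3), round(sd_of_means, 3), round(se, 3), within)
```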

Tables of values for the area under the standard normal distribution curve are routinely published in statistical textbooks and online. Figure A2D.7 reproduces one of these tables.

| z | −0.00 | −0.01 | −0.02 | −0.03 | −0.04 | −0.05 | −0.06 | −0.07 | −0.08 | −0.09 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| −3.9 | 0.00005 | 0.00005 | 0.00004 | 0.00004 | 0.00004 | 0.00004 | 0.00004 | 0.00004 | 0.00003 | 0.00003 |
| −3.8 | 0.00007 | 0.00007 | 0.00007 | 0.00006 | 0.00006 | 0.00006 | 0.00006 | 0.00005 | 0.00005 | 0.00005 |
| −3.7 | 0.00011 | 0.00010 | 0.00010 | 0.00010 | 0.00009 | 0.00009 | 0.00008 | 0.00008 | 0.00008 | 0.00008 |
| −3.6 | 0.00016 | 0.00015 | 0.00015 | 0.00014 | 0.00014 | 0.00013 | 0.00013 | 0.00012 | 0.00012 | 0.00011 |
| −3.5 | 0.00023 | 0.00022 | 0.00022 | 0.00021 | 0.00020 | 0.00019 | 0.00019 | 0.00018 | 0.00017 | 0.00017 |
| −3.4 | 0.00034 | 0.00032 | 0.00031 | 0.00030 | 0.00029 | 0.00028 | 0.00027 | 0.00026 | 0.00025 | 0.00024 |
| −3.3 | 0.00048 | 0.00047 | 0.00045 | 0.00043 | 0.00042 | 0.00040 | 0.00039 | 0.00038 | 0.00036 | 0.00035 |
| −3.2 | 0.00069 | 0.00066 | 0.00064 | 0.00062 | 0.00060 | 0.00058 | 0.00056 | 0.00054 | 0.00052 | 0.00050 |
| −3.1 | 0.00097 | 0.00094 | 0.00090 | 0.00087 | 0.00084 | 0.00082 | 0.00079 | 0.00076 | 0.00074 | 0.00071 |
| −3.0 | 0.00135 | 0.00131 | 0.00126 | 0.00122 | 0.00118 | 0.00114 | 0.00111 | 0.00107 | 0.00104 | 0.00100 |
| −2.9 | 0.00187 | 0.00181 | 0.00175 | 0.00169 | 0.00164 | 0.00159 | 0.00154 | 0.00149 | 0.00144 | 0.00139 |
| −2.8 | 0.00256 | 0.00248 | 0.00240 | 0.00233 | 0.00226 | 0.00219 | 0.00212 | 0.00205 | 0.00199 | 0.00193 |
| −2.7 | 0.00347 | 0.00336 | 0.00326 | 0.00317 | 0.00307 | 0.00298 | 0.00289 | 0.00280 | 0.00272 | 0.00264 |
| −2.6 | 0.00466 | 0.00453 | 0.00440 | 0.00427 | 0.00415 | 0.00402 | 0.00391 | 0.00379 | 0.00368 | 0.00357 |
| −2.5 | 0.00621 | 0.00604 | 0.00587 | 0.00570 | 0.00554 | 0.00539 | 0.00523 | 0.00508 | 0.00494 | 0.00480 |
| −2.4 | 0.00820 | 0.00798 | 0.00776 | 0.00755 | 0.00734 | 0.00714 | 0.00695 | 0.00676 | 0.00657 | 0.00639 |
| −2.3 | 0.01072 | 0.01044 | 0.01017 | 0.00990 | 0.00964 | 0.00939 | 0.00914 | 0.00889 | 0.00866 | 0.00842 |
| −2.2 | 0.01390 | 0.01355 | 0.01321 | 0.01287 | 0.01255 | 0.01222 | 0.01191 | 0.01160 | 0.01130 | 0.01101 |
| −2.1 | 0.01786 | 0.01743 | 0.01700 | 0.01659 | 0.01618 | 0.01578 | 0.01539 | 0.01500 | 0.01463 | 0.01426 |
| −2.0 | 0.02275 | 0.02222 | 0.02169 | 0.02118 | 0.02068 | 0.02018 | 0.01970 | 0.01923 | 0.01876 | 0.01831 |
| −1.9 | 0.02872 | 0.02807 | 0.02743 | 0.02680 | 0.02619 | 0.02559 | 0.02500 | 0.02442 | 0.02385 | 0.02330 |
| −1.8 | 0.03593 | 0.03515 | 0.03438 | 0.03362 | 0.03288 | 0.03216 | 0.03144 | 0.03074 | 0.03005 | 0.02938 |
| −1.7 | 0.04457 | 0.04363 | 0.04272 | 0.04182 | 0.04093 | 0.04006 | 0.03920 | 0.03836 | 0.03754 | 0.03673 |
| −1.6 | 0.05480 | 0.05370 | 0.05262 | 0.05155 | 0.05050 | 0.04947 | 0.04846 | 0.04746 | 0.04648 | 0.04551 |
| −1.5 | 0.06681 | 0.06552 | 0.06426 | 0.06301 | 0.06178 | 0.06057 | 0.05938 | 0.05821 | 0.05705 | 0.05592 |
| −1.4 | 0.08076 | 0.07927 | 0.07780 | 0.07636 | 0.07493 | 0.07353 | 0.07215 | 0.07078 | 0.06944 | 0.06811 |
| −1.3 | 0.09680 | 0.09510 | 0.09342 | 0.09176 | 0.09012 | 0.08851 | 0.08692 | 0.08534 | 0.08379 | 0.08226 |
| −1.2 | 0.11507 | 0.11314 | 0.11123 | 0.10935 | 0.10749 | 0.10565 | 0.10383 | 0.10204 | 0.10027 | 0.09853 |
| −1.1 | 0.13567 | 0.13350 | 0.13136 | 0.12924 | 0.12714 | 0.12507 | 0.12302 | 0.12100 | 0.11900 | 0.11702 |
| −1.0 | 0.15866 | 0.15625 | 0.15386 | 0.15151 | 0.14917 | 0.14686 | 0.14457 | 0.14231 | 0.14007 | 0.13786 |
| −0.9 | 0.18406 | 0.18141 | 0.17879 | 0.17619 | 0.17361 | 0.17106 | 0.16853 | 0.16602 | 0.16354 | 0.16109 |
| −0.8 | 0.21186 | 0.20897 | 0.20611 | 0.20327 | 0.20045 | 0.19766 | 0.19489 | 0.19215 | 0.18943 | 0.18673 |
| −0.7 | 0.24196 | 0.23885 | 0.23576 | 0.23270 | 0.22965 | 0.22663 | 0.22363 | 0.22065 | 0.21770 | 0.21476 |
| −0.6 | 0.27425 | 0.27093 | 0.26763 | 0.26435 | 0.26109 | 0.25785 | 0.25463 | 0.25143 | 0.24825 | 0.24510 |
| −0.5 | 0.30854 | 0.30503 | 0.30153 | 0.29806 | 0.29460 | 0.29116 | 0.28774 | 0.28434 | 0.28096 | 0.27760 |
| −0.4 | 0.34458 | 0.34090 | 0.33724 | 0.33360 | 0.32997 | 0.32636 | 0.32276 | 0.31918 | 0.31561 | 0.31207 |
| −0.3 | 0.38209 | 0.37828 | 0.37448 | 0.37070 | 0.36693 | 0.36317 | 0.35942 | 0.35569 | 0.35197 | 0.34827 |
| −0.2 | 0.42074 | 0.41683 | 0.41294 | 0.40905 | 0.40517 | 0.40129 | 0.39743 | 0.39358 | 0.38974 | 0.38591 |
| −0.1 | 0.46017 | 0.45620 | 0.45224 | 0.44828 | 0.44433 | 0.44038 | 0.43644 | 0.43251 | 0.42858 | 0.42465 |
| −0.0 | 0.50000 | 0.49601 | 0.49202 | 0.48803 | 0.48405 | 0.48006 | 0.47608 | 0.47210 | 0.46812 | 0.46414 |

| z | +0.00 | +0.01 | +0.02 | +0.03 | +0.04 | +0.05 | +0.06 | +0.07 | +0.08 | +0.09 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.0 | 0.50000 | 0.50399 | 0.50798 | 0.51197 | 0.51595 | 0.51994 | 0.52392 | 0.52790 | 0.53188 | 0.53586 |
| 0.1 | 0.53983 | 0.54380 | 0.54776 | 0.55172 | 0.55567 | 0.55962 | 0.56356 | 0.56749 | 0.57142 | 0.57535 |
| 0.2 | 0.57926 | 0.58317 | 0.58706 | 0.59095 | 0.59483 | 0.59871 | 0.60257 | 0.60642 | 0.61026 | 0.61409 |
| 0.3 | 0.61791 | 0.62172 | 0.62552 | 0.62930 | 0.63307 | 0.63683 | 0.64058 | 0.64431 | 0.64803 | 0.65173 |
| 0.4 | 0.65542 | 0.65910 | 0.66276 | 0.66640 | 0.67003 | 0.67364 | 0.67724 | 0.68082 | 0.68439 | 0.68793 |
| 0.5 | 0.69146 | 0.69497 | 0.69847 | 0.70194 | 0.70540 | 0.70884 | 0.71226 | 0.71566 | 0.71904 | 0.72240 |
| 0.6 | 0.72575 | 0.72907 | 0.73237 | 0.73565 | 0.73891 | 0.74215 | 0.74537 | 0.74857 | 0.75175 | 0.75490 |
| 0.7 | 0.75804 | 0.76115 | 0.76424 | 0.76730 | 0.77035 | 0.77337 | 0.77637 | 0.77935 | 0.78230 | 0.78524 |
| 0.8 | 0.78814 | 0.79103 | 0.79389 | 0.79673 | 0.79955 | 0.80234 | 0.80511 | 0.80785 | 0.81057 | 0.81327 |
| 0.9 | 0.81594 | 0.81859 | 0.82121 | 0.82381 | 0.82639 | 0.82894 | 0.83147 | 0.83398 | 0.83646 | 0.83891 |
| 1.0 | 0.84134 | 0.84375 | 0.84614 | 0.84849 | 0.85083 | 0.85314 | 0.85543 | 0.85769 | 0.85993 | 0.86214 |
| 1.1 | 0.86433 | 0.86650 | 0.86864 | 0.87076 | 0.87286 | 0.87493 | 0.87698 | 0.87900 | 0.88100 | 0.88298 |
| 1.2 | 0.88493 | 0.88686 | 0.88877 | 0.89065 | 0.89251 | 0.89435 | 0.89617 | 0.89796 | 0.89973 | 0.90147 |
| 1.3 | 0.90320 | 0.90490 | 0.90658 | 0.90824 | 0.90988 | 0.91149 | 0.91308 | 0.91466 | 0.91621 | 0.91774 |
| 1.4 | 0.91924 | 0.92073 | 0.92220 | 0.92364 | 0.92507 | 0.92647 | 0.92785 | 0.92922 | 0.93056 | 0.93189 |
| 1.5 | 0.93319 | 0.93448 | 0.93574 | 0.93699 | 0.93822 | 0.93943 | 0.94062 | 0.94179 | 0.94295 | 0.94408 |
| 1.6 | 0.94520 | 0.94630 | 0.94738 | 0.94845 | 0.94950 | 0.95053 | 0.95154 | 0.95254 | 0.95352 | 0.95449 |
| 1.7 | 0.95543 | 0.95637 | 0.95728 | 0.95818 | 0.95907 | 0.95994 | 0.96080 | 0.96164 | 0.96246 | 0.96327 |
| 1.8 | 0.96407 | 0.96485 | 0.96562 | 0.96638 | 0.96712 | 0.96784 | 0.96856 | 0.96926 | 0.96995 | 0.97062 |
| 1.9 | 0.97128 | 0.97193 | 0.97257 | 0.97320 | 0.97381 | 0.97441 | 0.97500 | 0.97558 | 0.97615 | 0.97670 |
| 2.0 | 0.97725 | 0.97778 | 0.97831 | 0.97882 | 0.97932 | 0.97982 | 0.98030 | 0.98077 | 0.98124 | 0.98169 |
| 2.1 | 0.98214 | 0.98257 | 0.98300 | 0.98341 | 0.98382 | 0.98422 | 0.98461 | 0.98500 | 0.98537 | 0.98574 |
| 2.2 | 0.98610 | 0.98645 | 0.98679 | 0.98713 | 0.98745 | 0.98778 | 0.98809 | 0.98840 | 0.98870 | 0.98899 |
| 2.3 | 0.98928 | 0.98956 | 0.98983 | 0.99010 | 0.99036 | 0.99061 | 0.99086 | 0.99111 | 0.99134 | 0.99158 |
| 2.4 | 0.99180 | 0.99202 | 0.99224 | 0.99245 | 0.99266 | 0.99286 | 0.99305 | 0.99324 | 0.99343 | 0.99361 |
| 2.5 | 0.99379 | 0.99396 | 0.99413 | 0.99430 | 0.99446 | 0.99461 | 0.99477 | 0.99492 | 0.99506 | 0.99520 |
| 2.6 | 0.99534 | 0.99547 | 0.99560 | 0.99573 | 0.99585 | 0.99598 | 0.99609 | 0.99621 | 0.99632 | 0.99643 |
| 2.7 | 0.99653 | 0.99664 | 0.99674 | 0.99683 | 0.99693 | 0.99702 | 0.99711 | 0.99720 | 0.99728 | 0.99736 |
| 2.8 | 0.99744 | 0.99752 | 0.99760 | 0.99767 | 0.99774 | 0.99781 | 0.99788 | 0.99795 | 0.99801 | 0.99807 |
| 2.9 | 0.99813 | 0.99819 | 0.99825 | 0.99831 | 0.99836 | 0.99841 | 0.99846 | 0.99851 | 0.99856 | 0.99861 |
| 3.0 | 0.99865 | 0.99869 | 0.99874 | 0.99878 | 0.99882 | 0.99886 | 0.99889 | 0.99893 | 0.99896 | 0.99900 |
| 3.1 | 0.99903 | 0.99906 | 0.99910 | 0.99913 | 0.99916 | 0.99918 | 0.99921 | 0.99924 | 0.99926 | 0.99929 |
| 3.2 | 0.99931 | 0.99934 | 0.99936 | 0.99938 | 0.99940 | 0.99942 | 0.99944 | 0.99946 | 0.99948 | 0.99950 |
| 3.3 | 0.99952 | 0.99953 | 0.99955 | 0.99957 | 0.99958 | 0.99960 | 0.99961 | 0.99962 | 0.99964 | 0.99965 |
| 3.4 | 0.99966 | 0.99968 | 0.99969 | 0.99970 | 0.99971 | 0.99972 | 0.99973 | 0.99974 | 0.99975 | 0.99976 |
| 3.5 | 0.99977 | 0.99978 | 0.99978 | 0.99979 | 0.99980 | 0.99981 | 0.99981 | 0.99982 | 0.99983 | 0.99983 |
| 3.6 | 0.99984 | 0.99985 | 0.99985 | 0.99986 | 0.99986 | 0.99987 | 0.99987 | 0.99988 | 0.99988 | 0.99989 |
| 3.7 | 0.99989 | 0.99990 | 0.99990 | 0.99990 | 0.99991 | 0.99991 | 0.99992 | 0.99992 | 0.99992 | 0.99992 |
| 3.8 | 0.99993 | 0.99993 | 0.99993 | 0.99994 | 0.99994 | 0.99994 | 0.99994 | 0.99995 | 0.99995 | 0.99995 |
| 3.9 | 0.99995 | 0.99995 | 0.99996 | 0.99996 | 0.99996 | 0.99996 | 0.99996 | 0.99996 | 0.99997 | 0.99997 |

Figure A2D.7 The standard normal distribution table

cumulative probability density function
This denotes the probability that a random variable takes on a value less than or equal to x, denoted by F(x). This relates to the area under the probability density function f(x) to the left of x. For example, F(20,000) would be the probability that a randomly selected individual had earnings of £20,000 or less.

How do we read this table? The table shows the cumulative probability density function for the standard normal distribution, that is, the probability that a standard normal variable lies below some value.

Consider Figure A2D.8. The shaded area shows the probability that a standard normal variable lies to the left of the value a. Suppose a = −1.32. To find the probability of the shaded area, we turn to the first part of Figure A2D.7, which corresponds to negative z‑scores. We go down the first column to −1.3 and then across to the −0.02 column. This reports the value of 0.09342 as the area to the left of z = −1.32. We can interpret this as Pr(z < −1.32) = 0.09342. As the total area under the standard normal distribution curve equals 1, we can easily find the area at or to the right of a as Pr(z ≥ a) = 1 − 0.09342 = 0.90658.

Figure A2D.8 The standard normal distribution, probability that z < a

Now consider Figure A2D.9 showing the shaded area to the left of b. Suppose we are interested in the area to the left of b = 1.45, which is the probability that a standard normal variable is less than 1.45, Pr(z < 1.45). To find this we turn to the second part of Figure A2D.7, which shows the positive z‑scores from the standard normal distribution. We go down the first column to 1.4 and then across to the +0.05 column, which gives the value of 0.92647. Therefore, the probability that z is less than 1.45 is 0.92647. Again, the area at or to the right of 1.45 corresponds to the probability that z ≥ 1.45, which equals 1 − 0.92647 = 0.07353.

Figure A2D.9 The standard normal distribution, probability that z < b

Finally, the area between a and b in the standard normal distribution corresponds to the probability Pr(a < z < b). In the case of a = −1.32 and b = 1.45, we find Pr(−1.32 < z < 1.45) = 0.92647 − 0.09342 = 0.83305.
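The same lookups can be reproduced in software rather than from the printed table. The sketch below uses `NormalDist` from Python’s standard library; the variable names are illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

a, b = -1.32, 1.45

p_below_a = z.cdf(a)                # area to the left of a
p_above_a = 1 - p_below_a           # area to the right of a
p_below_b = z.cdf(b)                # area to the left of b
p_between = p_below_b - p_below_a   # area between a and b

print(f"{p_below_a:.5f} {p_above_a:.5f} {p_below_b:.5f} {p_between:.5f}")
```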

There are some frequently used and important numbers taken from the standard normal distribution tables. The values of z that enclose the central 95% of the distribution leave 2.5% of the distribution in each of the top and bottom tails. These are the z-values where the area to the left is 97.5% and 2.5% respectively. As the standard normal distribution is symmetric, we find these z-values to be positive 1.96 and negative 1.96.

In the case where the central 99% of the distribution is enclosed, the respective z-values correspond to where 99.5% of the distribution lies to the left of z, and where 0.5% of the distribution lies to the left of z. Again, looking at Figure A2D.7, the respective z-values are positive 2.58 and negative 2.58.
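Reading the table in reverse, from a probability to a z-value, corresponds to the inverse of the cumulative function. A minimal sketch using the standard library’s `NormalDist.inv_cdf`:

```python
from statistics import NormalDist

z = NormalDist()

# 95% in the middle leaves 2.5% in each tail: look up the 97.5th percentile.
z_95 = z.inv_cdf(0.975)
# 99% in the middle leaves 0.5% in each tail: look up the 99.5th percentile.
z_99 = z.inv_cdf(0.995)

print(round(z_95, 2), round(z_99, 2))
```

By symmetry, the lower cut-offs are the negatives of these two values.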

### A2D.3.2 The t-distribution

t-distribution
The t-distribution is symmetric and bell-shaped, like the normal distribution. However, the t-distribution has heavier tails, meaning that it is more prone to producing values that fall further from its mean.

Another distribution that is frequently used in economic and statistical analysis is the t‑distribution. This is like the normal distribution in that it is symmetric and bell-shaped, but it is flatter and more dispersed (see Figure A2D.10). Consequently, the t‑distribution may be preferred to the normal distribution in situations when we face greater uncertainty in making statistical inferences.

Figure A2D.10 The probability density functions of the t-distribution (blue) compared to normal distribution (green)

The t-distribution is often known as the Student’s t-distribution. This is because it was developed by the statistician William Sealy Gosset under the pseudonym “Student”.

The degree of dispersion inherent in the t‑distribution depends on the sample size. As the sample size becomes large, the t‑distribution converges on the normal distribution. However, as the sample size becomes small, the tails of the t‑distribution become increasingly fat. The t‑distribution is most commonly used when sample sizes are small and statisticians face more uncertainty in making inferences. As a general rule of thumb, the t‑distribution is used when sample sizes are smaller than 25 observations (n < 25).

The t‑distribution tables commonly reported in statistics textbooks and online, as illustrated by Figure A2D.11, are a different format from the standard normal distribution table. Firstly, the t‑distribution tables show the area or probability to the right of the t‑value. This is defined by the value α. For example, if α = 0.05 then these are the t‑values for which 5% of the distribution lies to the right, and correspondingly 95% to the left.

Secondly, while the standard normal distribution table is the same regardless of the sample size, the t‑distribution table reports separate values for different numbers of observations. The parameter v is the number of degrees of freedom, which is the sample size minus one (v = n − 1).

α 0.4 0.25 0.15 0.1 0.05 0.025 0.01 0.005 0.001 0.0005
1 0.3249 1.0000 1.9626 3.0777 6.3133 12.7062 31.8205 63.6567 318.3087 636.6189
2 0.2887 0.8165 1.3862 1.8856 2.9200 4.3027 6.9646 9.9248 22.3271 31.5991
3 0.2767 0.7649 1.2498 1.6377 2.3534 3.1824 4.5407 5.8409 10.2145 12.9240
4 0.2707 0.7407 1.1896 1.5332 2.1318 2.7764 3.7469 4.6041 7.1732 8.6103
5 0.2672 0.7267 1.1558 1.4759 2.0150 2.5706 3.3649 4.0321 5.8934 6.8688
6 0.2648 0.7176 1.1342 1.4398 1.9432 2.4469 3.1427 3.7074 5.2076 5.9588
7 0.2632 0.7111 1.1192 1.4149 1.8946 2.3646 2.9980 3.4995 4.7853 5.4079
8 0.2619 0.7064 1.1081 1.3968 1.8595 2.3060 2.8965 3.3554 4.5008 5.0413
9 0.2610 0.7027 1.0997 1.3830 1.8331 2.2622 2.8214 3.2498 4.2968 4.7809
10 0.2602 0.6998 1.0931 1.3722 1.8125 2.2281 2.7638 3.1693 4.1437 4.5869
11 0.2596 0.6974 1.0877 1.3634 1.7959 2.2010 2.7181 3.1058 4.0247 4.4370
12 0.2590 0.6955 1.0832 1.3562 1.7823 2.1788 2.6310 3.0545 3.9296 4.3178
13 0.2586 0.6938 1.0795 1.3502 1.7709 2.1604 2.6503 3.0123 3.8520 4.2208
14 0.2582 0.6924 1.0763 1.3450 1.7613 2.1448 2.6245 2.9768 3.7874 4.1405
15 0.2579 0.6912 1.0735 1.3406 1.7531 2.1314 2.6025 2.9467 3.7328 4.0728
16 0.2576 0.6901 1.0711 1.3368 1.7459 2.1199 2.5835 2.9208 3.6862 4.0150
17 0.2573 0.6892 1.0690 1.3334 1.7396 2.1098 2.5669 2.8982 3.6458 3.9651
18 0.2571 0.6884 1.0672 1.3304 1.7341 2.1009 2.5524 2.8784 3.6105 3.9216
19 0.2569 0.6876 1.0655 1.3277 1.7291 2.0930 2.5395 2.8609 3.5794 3.8834
20 0.2567 0.6870 1.0640 1.3253 1.7247 2.0860 2.5280 2.8453 3.5518 3.8495
21 0.2566 0.6864 1.0627 1.3232 1.7207 2.0796 2.5176 2.8314 3.5272 3.8193
22 0.2564 0.6858 1.0614 1.3212 1.7171 2.0739 2.5083 2.8188 3.5050 3.7921
23 0.2563 0.6853 1.0603 1.3195 1.7139 2.0687 2.4999 2.8073 3.4850 3.7676
24 0.2562 0.6848 1.0593 1.3178 1.7109 2.0639 2.4922 2.7969 3.4668 3.7454
25 0.2561 0.6844 1.0584 1.3163 1.7081 2.0595 2.4851 2.7874 3.4502 3.7251
26 0.2560 0.6840 1.0575 1.3150 1.7056 2.0555 2.4786 2.7787 3.4350 3.7066
27 0.2559 0.6837 1.0567 1.3137 1.7033 2.0518 2.4727 2.7707 3.4210 3.6896
28 0.2558 0.6834 1.0560 1.3125 1.7011 2.0484 2.4671 2.7533 3.4082 3.6739
29 0.2557 0.6830 1.0553 1.3114 1.6991 2.0452 2.4620 2.7564 3.3962 3.6594
30 0.2555 0.6823 1.0547 1.3104 1.6973 2.0423 2.4573 2.7500 3.3852 3.6460
35 0.2553 0.6816 1.0520 1.3062 1.6896 2.0301 2.4377 2.7238 3.3400 3.5911
40 0.2530 0.6807 1.0500 1.3031 1.6839 2.0211 2.4233 2.7045 3.3069 3.5510
45 0.2549 0.6800 1.0485 1.3006 1.6794 2.0141 2.4121 2.6896 3.2815 3.5203
50 0.2547 0.6794 1.0473 1.2987 1.6759 2.0086 2.4033 2.6778 3.2614 3.4960
60 0.2545 0.6786 1.0455 1.2958 1.6706 2.0003 2.3901 2.6603 3.2317 3.4602
70 0.2543 0.6780 1.0442 1.2938 1.6669 1.9944 2.3808 2.6479 3.2108 3.4350
80 0.2542 0.6776 1.0432 1.2922 1.6641 1.9901 2.3739 2.5387 3.1953 3.4163
90 0.2541 0.6772 1.0424 1.2910 1.6620 1.9867 2.3685 2.6316 3.1833 3.4019
100 0.2540 0.6770 1.0418 1.2901 1.6602 1.9840 2.3642 2.5259 3.1737 3.3905
120 0.2539 0.6765 1.0409 1.2886 1.6577 1.9799 2.3578 2.6174 3.1595 3.3735
150 0.2538 0.6761 1.0400 1.2872 1.6551 1.9759 2.3515 2.6090 3.1455 3.3566
200 0.2537 0.6757 1.0391 1.2858 1.6525 1.9719 2.3451 2.6006 3.1315 3.3398
300 0.2536 0.6753 1.0382 1.2844 1.6499 1.9679 2.3388 2.5923 3.1176 3.3233
∞ 0.2533 0.6745 1.0364 1.2816 1.6449 1.9600 2.3263 2.5758 3.0902 3.2905

Figure A2D.11 Values for the t-distribution probability density function

To understand how to use the t‑distribution tables, consider the following examples:

• With 30 degrees of freedom (a sample size of 31), the probability that the t‑value is above 2.0423 is 0.025 or 2.5%.
• As the t‑distribution is symmetric, the probability that the t‑value is lower than negative 2.0423 for 30 degrees of freedom is also 0.025 (2.5%).
• Therefore, the probability that a t‑value with 30 degrees of freedom lies between ±2.0423 is 0.95 or 95%: Pr(−2.0423 ≤ t ≤ 2.0423) = 0.95.

This range for a 95% interval is larger in absolute value for the t‑distribution than for the standard normal distribution, where it was ±1.96. The t‑distribution adjusts for the fact that with small sample sizes we can be less confident about our inferences on population values:

• With very small sample sizes, such as v = 15 (n = 16), the value that corresponds to a 95% probability interval is larger (±2.1314) than for v = 30 (n = 31).
• The interval values for a 99% range (the 0.005 entry in the table) in the t‑distribution for 30 degrees of freedom are ±2.75, wider than the ±2.58 for the normal distribution.
• But with 300 degrees of freedom, the ranges for 95% and 99% probabilities in the t‑distribution are ±1.9679 and ±2.5923, respectively, much closer to the values for the normal distribution.
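The table entries quoted above can be checked numerically. The following is a minimal sketch, not part of the original appendix: it integrates the t‑distribution density with Simpson's rule using only the Python standard library. The function names `t_pdf` and `central_probability` are illustrative choices, and the critical values are copied from Figure A2D.11.

```python
import math


def t_pdf(t, v):
    """Density of the t-distribution with v degrees of freedom."""
    # Work with logs of the gamma functions to stay stable for large v.
    log_norm = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                - 0.5 * math.log(v * math.pi))
    return math.exp(log_norm - (v + 1) / 2 * math.log1p(t * t / v))


def central_probability(c, v, steps=2000):
    """Approximate Pr(-c <= t <= c) by Simpson's rule (steps must be even)."""
    h = 2 * c / steps
    total = t_pdf(-c, v) + t_pdf(c, v)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-c + i * h, v)
    return total * h / 3


# Critical values taken from the 0.025 column of Figure A2D.11.
for v, c in [(15, 2.1314), (30, 2.0423), (300, 1.9679)]:
    prob = central_probability(c, v)
    print(f"v = {v:>3}: Pr(-{c} <= t <= {c}) is approximately {prob:.4f}")
```

Each probability should come out very close to 0.95, which is what the 0.025 column of the table implies once both tails are taken into account.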

## A2D.4 Confidence intervals

confidence interval
Probability bands around values of the sample mean, which give an estimated range for the value of the underlying parameter with a stated probability (usually 95% or 99%).

The normal and t-distributions in the previous section are commonly used in statistical inference. The simplest application is to use the distributions to put probability bands around values of the sample mean, known as confidence intervals. Frequently we use 95% or 99% probability bands, meaning that the interval is expected to contain the population mean with that probability.

Which distribution we use to calculate a confidence interval depends on whether we know the population standard deviation (σ) and whether the sample size (n) is small. In practice, statistical and econometric software, such as SPSS or Stata, report the values from the t-distribution, given that it approaches the normal for large samples.

Suppose we take a large sample of test scores for 400 pupils and find a sample mean of 56 and a standard deviation of 15 on a test score. What range can we suggest for the population mean for a given probability level?

If we are using the normal distribution, the general formula for a confidence interval is: $$\overline{X} \pm z_{\alpha/2} × \sigma/\sqrt{n}$$

• $$\overline{X}$$ is the estimated sample mean.
• $$z_{\alpha/2}$$ is the respective z-value from the standard normal distribution tables. If we are looking for a 95% confidence interval, then α = 0.05. In this case the respective z-values are those for which α/2 = 0.05/2 = 0.025, or 2.5%, of the distribution lies to the left in the case of the lower bound and to the right in the case of the upper bound. These can be found from Figure A2D.7 as $$z_{0.05/2} = \pm1.96$$.
• $$\sigma/\sqrt{n}$$ is the standard error of the sample mean. If the population standard deviation is unknown, then this can be approximated by using the sample standard deviation s.

Using this formula, a 95% confidence interval for the population average of test results is:

$56 \pm 1.96 × 15/\sqrt{400} = 56 \pm 1.47$

Therefore, the 95% confidence interval for the population mean is between 54.53 and 57.47.

Suppose instead we want to be 99% confident that the population mean is within a range of values. In this case, the appropriate z-values are given by α = 0.01 so α/2 = 0.005, that is, where 0.5% of the distribution lies to the left of the z-value and 0.5% to the right. From Figure A2D.7, we see that $$z_{0.01/2} = \pm 2.58$$.

Therefore, the 99% confidence interval is $$56 \pm 2.58 × 15/\sqrt{400} = 56 \pm 1.935$$, or between 54.065 and 57.935.

This interval is wider than the 95% interval: to be more confident that the range contains the population mean, we need to increase the range of probable values.
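The two intervals for the test-score example can be reproduced with a short calculation. This is a minimal sketch using Python's standard library; the function name `normal_confidence_interval` is an illustrative choice, and `NormalDist().inv_cdf` supplies the z-value that the text reads from Figure A2D.7.

```python
from math import sqrt
from statistics import NormalDist


def normal_confidence_interval(mean, sd, n, level=0.95):
    """Return (lower, upper) bounds of a normal-based confidence interval."""
    alpha = 1 - level
    # z-value that leaves alpha/2 of the distribution in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sd / sqrt(n)  # z times the standard error of the mean
    return mean - margin, mean + margin


# Test-score example: 400 pupils, sample mean 56, standard deviation 15.
for level in (0.95, 0.99):
    low, high = normal_confidence_interval(56, 15, 400, level)
    print(f"{level:.0%} interval: {low:.2f} to {high:.2f}")
```

Because the code uses an unrounded z-value (about 2.576 rather than 2.58), the 99% bounds differ slightly in the third decimal place from the hand calculation above.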

Suppose now we still wish to estimate a 95% confidence interval for the population mean but have a smaller sample of 20 pupils, from which we estimate the sample mean as 56 and the sample standard deviation as 15. In this case we decide to use the t‑distribution, as the smaller sample size leads to greater uncertainty about the location of the population mean.

The general formula for calculating a confidence interval with the t‑distribution is:

$\overline{X} \pm t_{v,\alpha/2} × s/\sqrt{n}$

The main difference is that the confidence interval is defined using values from the t‑distribution with v degrees of freedom, rather than the standard normal distribution. From Figure A2D.11, with 20 observations there are 19 degrees of freedom, and for a 95% interval α/2 = 0.025, meaning there is 2.5% of the distribution in each of the upper and lower tails, so the respective t‑value is 2.093.

The 95% confidence interval for the population mean is:

$56 \pm 2.093 × 15/\sqrt{20} = 56 \pm 7.02$

Therefore the 95% confidence interval for the population mean is between 48.98 and 63.02. This is a much wider range than for the previous case, reflecting both the higher t‑value and the larger standard error. In general, with larger samples we can be more confident about the location of the population mean than when using smaller samples.
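The small-sample calculation follows the same pattern, with the critical value taken from the t‑table rather than the normal distribution. The sketch below is illustrative only: the function name `t_confidence_interval` is an assumption, and the critical value 2.0930 is passed in by hand from Figure A2D.11 (19 degrees of freedom, 0.025 column) because Python's standard library has no t‑distribution quantile function.

```python
from math import sqrt


def t_confidence_interval(mean, s, n, t_crit):
    """Return (lower, upper) bounds given a t critical value from the table."""
    standard_error = s / sqrt(n)
    margin = t_crit * standard_error
    return mean - margin, mean + margin


# 20 pupils, sample mean 56, sample standard deviation 15.
# t_crit is the table entry for v = 19 and an upper-tail probability of 0.025.
low, high = t_confidence_interval(56, 15, 20, t_crit=2.0930)
print(f"standard error: {15 / sqrt(20):.3f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
```

The standard error here is roughly 3.35, compared with 0.75 for the sample of 400 pupils, which accounts for most of the extra width.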

Strictly speaking, the t‑distribution should be used over the standard normal distribution when either the sample size is relatively small (<25 observations), or when the actual population standard deviation is unknown and approximated by the sample standard deviation to calculate the standard error of the sample mean.

As this second condition is almost always the case, the general convention is to use the t‑distribution to calculate confidence intervals. This is the approach embedded in most textbooks and statistical software. It is further justified by the fact that in large samples, the t‑distribution converges on the properties of the normal distribution.

## A2D.5 Hypothesis testing

hypothesis testing
A method of statistical inference for testing the acceptance or rejection of a stated hypothesis at a given level of statistical significance.

Hypothesis testing is essentially about deciding whether a hypothesis is true or false. For example:

• Are women paid less, on average, than men?
• Is a new drug better than the existing treatment?
• Do countries with an independent central bank have lower inflation?

null hypothesis
A conjecture about the characteristics of a population that is testable using statistical inference on a sample of observations.

The approach to hypothesis testing involves a binary decision, that is, to accept or reject the hypothesis. A null hypothesis is a statement or proposition initially presumed to be true, for example, that men and women earn the same. Evidence is then gathered, usually by collecting a sample of data, to see if it is consistent with the hypothesis. If it is, the null hypothesis continues to be considered ‘true’, but if not, the null is rejected in favour of the alternative hypothesis (for example, that men earn more than women, women earn more than men, or men and women earn different amounts). The decision is based on the sample data and generalised to the population (for example, in the sample, men are found to earn more than women, so we conclude it is true for the population as a whole).

Hypothesis testing has a consistent structure. Suppose we collect a sample of incomes from 31,036 people and find that the average income is £33,368 with a standard deviation equal to £33,348, and we want to test the hypothesis that the average income for the entire population is equal to £34,000. A test of this hypothesis can be carried out in five steps.

### Step 1: Form the hypothesis

The null hypothesis, which we denote as H0, is the proposition we assume to be true and wish to test, in this case, that the average income for the population is µ = £34,000.

The alternative hypothesis, denoted as H1, is simply that the average income is not equal to £34,000. So, the statement of the null and alternative hypotheses is:

H0: µ = £34,000
H1: µ ≠ £34,000

This is a two-tail test, since the rejection region for the null hypothesis occupies both sides of the distribution, that is, we can reject because µ < £34,000 or because µ > £34,000.

If, however, we only want to reject the null hypothesis in a given direction, for example when testing a claim that the average income is not below £34,000, we would not want to reject the null if the average income was above £34,000. In this case, we can set up the alternative as a one-tail test, H1: μ < £34,000, which tests the rejection of the null hypothesis in one direction only.

### Step 2: Choose a significance level for the test

significance level
The significance level is the probability of rejecting the null hypothesis when it is in fact true, chosen before the test is carried out. For example, a significance level of 0.05 indicates there is a one-in-twenty chance of a false positive, that is, of rejecting the null hypothesis when it is true.

Hypothesis tests set cut-off points, known as significance levels, to set the probability of making a Type I error. A Type I error is the rejection of the null hypothesis when it is in fact true. A common choice of significance level for a hypothesis test is 5%, which implies that, when the null hypothesis is true, there is a one-in-twenty chance of a false positive.

You might ask, why not set a very low significance level to minimise the probability of a Type I error? The conundrum is that a lower significance level raises the chance of a Type II error, that is, incorrectly accepting the null hypothesis when it is false. As we lower the significance level, we make it harder to reject the null hypothesis, which reduces the chance of rejecting it when it is true, but with the trade-off of making it harder to reject if it is false. This trade-off, inherent in the significance level of the test, is shown in Figure A2D.12.
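The trade-off between the two types of error can be illustrated by simulation. The sketch below is a hypothetical example and is not drawn from the appendix data: it assumes a normally distributed population with a known standard deviation of 10, samples of 25 observations, a null hypothesis that the mean is 50, and (for the Type II case) a true mean of 54. The function name `rejection_rate` and all of these numbers are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_mean, null_mean, alpha,
                   sigma=10.0, n=25, trials=4000, seed=1):
    """Share of simulated samples in which a two-tail z-test rejects the null."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        z = (sample_mean - null_mean) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


for alpha in (0.05, 0.01):
    # Null is true: every rejection is a Type I error.
    type_1 = rejection_rate(true_mean=50, null_mean=50, alpha=alpha)
    # Null is false: every failure to reject is a Type II error.
    type_2 = 1 - rejection_rate(true_mean=54, null_mean=50, alpha=alpha)
    print(f"alpha = {alpha}: Type I rate {type_1:.3f}, Type II rate {type_2:.3f}")
```

Under these assumptions the Type I rate should sit close to the chosen significance level, while the Type II rate is higher at the 1% level than at the 5% level.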

Figure A2D.12 Hypothesis test significance level decision process

### Step 3: Look up the critical value for the test

Having chosen the significance level of the test, the critical value that determines whether the null hypothesis is accepted or rejected is taken from the standard normal or t‑distribution tables. In this example, there is a relatively large sample size, so we choose to use the normal distribution. A 5% significance level, and a two-sided test, means we need to acquire the z-value that cuts off 2.5% from each tail of the distribution.

Using Figure A2D.7, the critical value for the test-statistic is z* = ±1.96.

### Step 4: Calculate the test statistic

The test statistic for the hypothesis test is given by:

$z = \frac{\overline{X} − \mu_{H0}}{\frac{s}{\sqrt{n}}}$

where the sample mean $$\overline{X} = 33,368$$; the sample standard deviation s = 33,348; the sample size n = 31,036; and the value of the population mean under the null hypothesis is $$\mu_{H0} = 34,000$$.

Using these values, we can calculate the test statistic as:

$z = \frac{33,368 − 34,000}{\frac{33,348}{\sqrt{31,036}}} = \frac{−632}{189.29} = −3.34$

### Step 5: The decision

Finally, we compare the test statistic z = −3.34 with the critical value z* = ±1.96.

Because the test statistic lies outside the acceptance region (±1.96), we reject the null hypothesis.

We reject H0 at the 5% significance level, because the sample average of £33,368 is more than 1.96 standard errors below £34,000.
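The five steps of the income example can be collected into a short calculation. This is a minimal sketch using the Python standard library; the function name `two_tail_z_test` is an illustrative choice. The p-value it reports is not discussed in the text and is included only as an additional summary of the same test.

```python
from math import sqrt
from statistics import NormalDist


def two_tail_z_test(sample_mean, s, n, null_mean, alpha=0.05):
    """Return (z, critical value, p-value, reject?) for a two-tail z-test."""
    standard_error = s / sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # cuts off alpha/2 per tail
    p_value = 2 * NormalDist().cdf(-abs(z))       # probability in both tails
    return z, z_crit, p_value, abs(z) > z_crit


# Income example: n = 31,036, sample mean 33,368, standard deviation 33,348.
z, z_crit, p_value, reject = two_tail_z_test(33368, 33348, 31036, 34000)
print(f"z = {z:.2f}, critical value = ±{z_crit:.2f}")
print(f"p-value = {p_value:.4f}, reject H0: {reject}")
```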

We can use the same five-step process to test a null hypothesis in a small sample using the t‑distribution.

Suppose a car manufacturer makes the claim that its cars have fuel efficiency of at least 40 miles per gallon (mpg). However, from a sample of 12 cars, the average fuel efficiency is found to be 35mpg with a standard deviation of 15mpg. How can we test the manufacturer’s claim of at least 40mpg as the true average with a 5% significance level? Again, we follow the same five-step approach.

### Step 1: Form the hypothesis

The null hypothesis is that the average mpg for all of the manufacturer’s cars is at least 40. Therefore, the alternative hypothesis is that fuel efficiency is lower than 40mpg. This is a one-tail test as we are only interested in the acceptance or rejection of the null hypothesis in one direction.

H0: µ ≥ 40
H1: µ < 40

### Step 2: Choose a significance level for the test

A 5% significance level is chosen, α = 0.05.

### Step 3: Look up the critical value for the test

From Figure A2D.11, we need to find the t‑value that cuts off the bottom 5% of the distribution. Given a sample size of 12, we have v = 12 − 1 = 11 degrees of freedom. So, the critical value is t* = −1.7959.

### Step 4: Calculate the test statistic

The test statistic is given by:

$t = \frac{\overline{X} − \mu_{H0}}{\frac{s}{\sqrt{n}}} = \frac{35 − 40}{\frac{15}{\sqrt{12}}} = −1.15$

### Step 5: The decision

The test statistic t = −1.15 does not lie below the critical value of t* = −1.7959, so based on the sample evidence, we cannot reject the null hypothesis that the manufacturer’s cars have fuel efficiency of at least 40mpg.
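The fuel-efficiency test can be sketched in the same way. As before, this is an illustrative example rather than a definitive implementation: the function name `lower_tail_t_test` is an assumption, and the critical value −1.7959 is entered by hand from Figure A2D.11 (11 degrees of freedom, 0.05 column) because the standard library does not provide t‑distribution quantiles.

```python
from math import sqrt


def lower_tail_t_test(sample_mean, s, n, null_mean, t_crit):
    """Return (t, reject?) for a one-tail test with H1: mean < null_mean.

    t_crit is the (negative) critical value read from the t-table.
    """
    standard_error = s / sqrt(n)
    t = (sample_mean - null_mean) / standard_error
    return t, t < t_crit


# 12 cars, sample mean 35mpg, standard deviation 15, H0: mean >= 40.
t, reject = lower_tail_t_test(35, 15, 12, 40, t_crit=-1.7959)
print(f"t = {t:.2f}, reject H0: {reject}")
```

Only values of t below the critical value lead to rejection, which reflects the one-tail form of the alternative hypothesis.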

## A2D.6 Summary

Probability theory can be used to make statistical inferences, that is, to use sample statistics to infer something about the population as a whole. Central to this is the application of probability distributions, and we introduced two of the most commonly used, the normal distribution and the t‑distribution.

A confidence interval describes a range in which the true population average is estimated to be located with a given probability. This range will be smaller if the sample size is larger, if the true population variance is known, and if the chosen probability of the population mean being in the interval is lower. The conventional approach is to use the t‑distribution to form confidence intervals, as it is rare that the population variance is known, so using the sample variance in its place adds to the uncertainty of the estimate. However, when the sample size gets large, the t‑distribution converges on the normal distribution.

Hypothesis testing is the process of using probability theory to test whether a proposition can be accepted or rejected at a given level of statistical significance. These are binary choices, focusing on the acceptance or rejection of a null hypothesis about the population mean, which is a statement presumed to be true but which can be tested using sample information.

The key decisions in setting up a hypothesis test are whether the null hypothesis is tested in a one-tail or two-tail test, and which level of significance to use. A one-tail test is used when the rejection of the null hypothesis is tested in one direction only, whereas a two-tail test is where the null can be rejected in either direction. If in doubt, it is recommended that you use a two-tail test.

The significance level of the hypothesis test gives the probability of a false positive, known as a Type I error, where the null hypothesis is true but is rejected by chance. A lower significance level reduces this chance, but correspondingly raises the chance of a Type II error, where the null is accepted when false. The two most common significance levels used in hypothesis tests are the 5% and 1% significance levels.