Appendix 1 Sources for economic statistics

Gareth James and Craig McLaren

A1.1 Introduction

Throughout this book we describe and discuss statistics about the economy. Where does the information used to compile these statistics come from? Collecting this data is one of the major activities of the Office for National Statistics (ONS). This appendix outlines how the ONS collects this information, and discusses some of the issues that arise.

ONS Resource

A comprehensive list of surveys produced by the ONS is available.

As an example, the Annual Business Survey (ABS) is a flagship, comprehensive dataset compiled each year.

The sample comprises approximately 62,000 businesses and covers most business sectors. It collects financial data from businesses’ end-year accounts, including turnover, wages and salaries, purchases of goods and services, stocks and capital expenditure. Greater detail on the methods underpinning the ABS is also available.

A1.2 Data collection

A1.2.1 Sources

Official statisticians use two broad means to collect the information needed to compile economic statistics: surveys, and non-survey sources such as administrative data and big data.

A1.2.2 Evolving user needs

Statistics, however carefully compiled and however high the quality, are of no use if they do not fulfil some need on the part of users. One way to think about this is whether statistics improve decision-making, in the sense that the outcomes of decisions for those affected would be better than if the information was not available.

In economic statistics, many decision-makers are within the public sector: the Bank of England determines the interest rate, the Treasury sets taxes and public spending. But there will be others, too. Companies make decisions about their operations and investment plans. Households make decisions on how much to spend and save. In deciding what economic statistics should be provided and how frequently, we need to keep in mind the needs of all of these agents.

Not every need for information can be met within available resources. Statistical agencies such as the ONS therefore assess and reassess these needs. The ONS talks to the Treasury and the Bank of England all the time, and finds that their most pressing needs often change over time. It consults the wider community too, and the amount of active engagement with users has grown in recent years.

ONS Resource

The ONS runs a series of events, including its ONS Forum, for analysts, business economists, media, and academic users, to discuss its work.

User needs evolve due to policy changes or external events, so the ONS must measure new or updated concepts. One example is the measurement of trade statistics at a regional level. There has been a growing realisation that trade between different parts of the UK, as well as with overseas partners, is an important factor in understanding and determining regional growth, especially after Brexit.

Having assessed – or reassessed – user needs, the next step is to consider what sources of information might be available to compile the statistics to fulfil those needs. We need to consider:

Traditionally, monthly and quarterly surveys have been used for recent and short-term movements in the economy; more detailed and complex annual surveys provide greater detail, but are available only later.

Non-survey sources are increasingly being evaluated and used where they meet the data requirements. In the UK, the availability of non-survey sources has increased in recent years due to the opportunities provided by data legislation (for example, Chapter 7 of the Digital Economy Act, 2017). The Independent Review of UK Economic Statistics (Bean 2016: 11) noted, “[ONS should] make the most of existing and new data sources and the technologies for dealing with them”.

A1.2.3 Trading off costs and benefits of collecting information

Just because an information source is available does not mean that it should be used. Surveys are often expensive to maintain. There are also the “compliance costs” in time and effort for the people who must respond to the surveys.

Non-survey sources are often cheaper than traditional surveys, since the information may already exist. But the costs may not be negligible either. Administrative data, for example, may need processing to yield the information that is wanted, or may not relate exactly to the statistics, so adjustments may need to be made.

Citizens often have legitimate concerns about the privacy of their personal or commercial information. So administrative or big data can be used only if there are adequate means to safeguard individual and commercial confidentiality.

We must make a judgement balancing the interests of users, available resources, keeping down respondent burden and other costs of collection. In economic terms, the important question is:

Does the marginal benefit of providing additional statistical information to users exceed the marginal cost of producing that information?

This underlines the importance of prioritising user needs. Providing low-benefit statistics that crowd out high-benefit ones would obviously not be a good outcome. As for data sources, we must find the least costly ways to collect the information. This principle is enshrined in the Government Statistical Service (GSS) Guidance on Monitoring and Reducing Respondent Burden and in the Code of Practice for Statistics:

ONS Resource

The ONS provides Guidance on Monitoring and Reducing Respondent Burden as well as a general Code of Practice for Statistics.

“The suitability of existing data, including administrative, open and privately-held data, should be assessed before undertaking a new data collection.” (Practice v5.3)

A1.3 Surveys

A survey based on random sampling (hence sometimes called a sample survey) is a well-established and widely used approach. It rests on a theoretical framework, developed over many years, that allows the production of unbiased estimates together with indicators of their uncertainty. Surveys allow specific data requirements to be met and controlled. But they come at a cost both to the body carrying out the survey and to the respondents who are surveyed.

Surveys come in different forms:

Some are surveys of businesses in the UK, which ask about various dimensions of economic activity, for example, sales, purchases, investments, capital expenditure, number of employees, number of vacancies, wages and salaries, pensions, hours worked, and the number and location of sites (factories, offices, warehouses).

Other surveys collect information from individuals about themselves and their households. These include items such as information about labour-market activity (employment and self-employment), jobs, earnings, income, spending, wealth and assets, and travel and tourism.

The population census (every 10 years in the UK) provides detailed estimates of the population by demographic characteristics and location. Census outputs are widely used and can be used as benchmark indicators within economic statistics.

If we want to generate information of value to users from sample surveys, there are three main issues:

A1.3.1 Survey design

The starting point for the design of most surveys is likely to be reference to a register. A register provides as complete a record as possible of information relating to the area of interest. It is kept up to date on a regular basis to ensure it continues to be representative of the population. The register is often used as the basis for the sampling frame.

For many surveys intended to produce economic statistics, the relevant register will be one cataloguing information about businesses: who they are, what activities and products they are concerned with, size, location, and so on. In the UK, the ONS compiles and maintains the Inter-Departmental Business Register (IDBR) for this purpose.

The Inter-Departmental Business Register is comprehensive and is used to help the sample design of most business surveys. The primary sources of information used to compile it are Value Added Tax (VAT) and Pay As You Earn (PAYE) registrations notified to Her Majesty’s Revenue and Customs (HMRC). These feed through to the ONS and, after matching, become Enterprises on the IDBR. A list of local sites for each Enterprise is also compiled after contact with the business identified as the Enterprise. Most Enterprises respond for themselves in their entirety as a single Reporting Unit, but some are split into more than one Reporting Unit, and it is Reporting Units that are sampled.

The IDBR holds additional information about each of the businesses, including structures, economic activity, employment and turnover, which is used to assist sample design.

To keep the IDBR up to date, an updating survey is sent to all large businesses annually, to medium-sized businesses every third or fourth year, and to small businesses from time to time, but less frequently. Coverage of the IDBR, in terms of all economic activity in the UK, is good, with about 99% of activity being captured. However, many of the very smallest businesses do not appear on the IDBR, as they are too small to register to pay VAT, and/or have no employees.

A1.3.2 Sampling and sample design

A sample is a subset of the population of interest, chosen or selected to be surveyed to collect information. For probability sampling to be possible, a sampling frame is required. The sample design needs to be specified, and a sample-selection mechanism used to choose the sample at random. The design chosen will need to reflect the purpose for which the information is being sought, for example, how frequently it is needed, and how precise the information collected needs to be.
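As a minimal sketch of a sample-selection mechanism, the Python below draws a stratified simple random sample without replacement from a sampling frame. Business surveys commonly stratify by industry and size band and take every unit in strata of very large businesses. The stratum labels, allocations and unit identifiers here are hypothetical, and a real design would set allocations from the precision required.

```python
import random
from collections import defaultdict


def stratified_sample(frame, allocation, seed=None):
    """Draw a stratified simple random sample without replacement.

    frame: iterable of (unit_id, stratum) pairs, e.g. taken from a register.
    allocation: dict mapping stratum -> number of units to select.
    If a stratum's allocation is at least its size, every unit is taken
    (a "take-all" stratum); strata missing from `allocation` get no units.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    units_by_stratum = defaultdict(list)
    for unit_id, stratum in frame:
        units_by_stratum[stratum].append(unit_id)

    sample = {}
    for stratum, units in units_by_stratum.items():
        n = min(allocation.get(stratum, 0), len(units))
        sample[stratum] = rng.sample(units, n)  # random draw within the stratum
    return sample


# Hypothetical frame: 100 small retailers and 3 very large ones.
frame = [(f"small-{i}", "retail/small") for i in range(100)]
frame += [("large-1", "retail/large"), ("large-2", "retail/large"), ("large-3", "retail/large")]
sample = stratified_sample(frame, {"retail/small": 10, "retail/large": 5}, seed=42)
print({stratum: len(units) for stratum, units in sample.items()})
# → {'retail/small': 10, 'retail/large': 3}
```

Because the large-business stratum has only three units, all of them are selected even though five were allocated. This is how very large businesses typically end up in every sample.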

A1.3.3 Data collection methods

The next question is how the data is actually going to be collected from sample respondents. There are two categories:

Face-to-face interviews
  • Positive attributes: best response rates; greatest accuracy; can run long interviews; interviewer available for support and advice; captures data digitally; can lead to other activities (diaries, meeting other respondents).
  • Challenges: most expensive; less suitable for sensitive topics; needs interviewer during work hours; needs respondent to be available while interviewer is working.

Telephone interviews
  • Positive attributes: less expensive than face-to-face; response rates still good; captures data digitally.
  • Challenges: interviews shorter than face-to-face.

Self-completion, paper questionnaires
  • Positive attributes: least expensive to set up.
  • Challenges: lowest response rates; needs simple and short questionnaire (less so for business); data must be captured in a second step; depends on ability of respondent to complete correctly; question ordering is fixed and may bias responses.

Self-completion, internet
  • Positive attributes: least expensive per sample; options for help; can randomise order of questions or route respondents automatically; captures data digitally; checks data when entered.
  • Challenges: high set-up cost; biased sample (internet users); low response rates (if not mandatory); needs simple and short questionnaire (less so for business).

Computer-assisted in field
  • Positive attributes: captures data digitally; checks data when entered; can randomise order of questions or route respondents automatically.
  • Challenges: high set-up cost; needs careful design.

Figure A1.1 Data collection methods


A1.3.4 Data checking

We now have a well-designed sample and we have collected data from those in the sample. But, in most cases, the data derived from surveys can represent a large number of data points. It would be dangerous to accept all of this information as valid without further scrutiny, so the first part of processing is checking the responses. Problems include:

An example of a check might be comparing a response this month with the one from last month or last year to see whether the change looks plausible. Checks could also look at internal consistency: for example, does the total number of employees reported in a return equal the sum of the part-time and full-time employees reported? We need to find ways to deal with the problems:
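The two kinds of check described above, period-on-period plausibility and internal consistency, can be sketched in Python as follows. The field names (`full_time`, `part_time`, `total_employees`, `turnover`) and the threefold change threshold are hypothetical. Real validation rules are more numerous and are tuned to each survey.

```python
def check_return(current, previous=None, max_ratio=3.0):
    """Return a list of descriptions of the checks a survey return fails.

    current, previous: dicts of reported values for this period and an
    earlier period (previous may be None for a new respondent).
    max_ratio: largest period-on-period change (as a multiple) treated as
    plausible; the default of 3.0 is a hypothetical threshold.
    """
    failures = []

    # Internal consistency: total employees should equal full-time + part-time.
    ft = current.get("full_time")
    pt = current.get("part_time")
    total = current.get("total_employees")
    if None not in (ft, pt, total) and ft + pt != total:
        failures.append("total_employees != full_time + part_time")

    # Plausibility: compare turnover with the earlier return.
    if previous is not None:
        now, before = current.get("turnover"), previous.get("turnover")
        if now is not None and before:  # skip if missing or zero last time
            ratio = now / before
            if ratio > max_ratio or ratio < 1 / max_ratio:
                failures.append("turnover change outside plausible range")

    return failures


print(check_return({"full_time": 10, "part_time": 5, "total_employees": 14,
                    "turnover": 400.0}, previous={"turnover": 100.0}))
# → ['total_employees != full_time + part_time', 'turnover change outside plausible range']
```

In this sketch a failed check only flags the return for further scrutiny. Deciding how to treat a flagged return is a separate step.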

A1.3.5 Classification

Often, users will have an interest not just in aggregate figures such as GDP or the overall level of employment, but in the breakdown of those statistics across one or more dimensions. They may want to know, for example, what is happening to the growth of output in particular sectors such as manufacturing or construction. Or they may want to know how employment is changing in particular sectors, or how it is changing broken down between males and females, or by age group. Particular returns from a survey can be given a code to indicate to which sector, age group or gender it is relevant. By selecting only those survey returns with a particular code, it is thus possible to construct final statistics relating to the various subaggregates as well as the statistics for the aggregate.

For this coding process to be possible, there needs to be a list of categories that divide the aggregate exhaustively into subaggregates. Codes can then be assigned, corresponding to where the return in question falls within this list. These lists are usually called classifications. To help international comparability, ONS generally uses internationally agreed classification systems. For classifying economic activity to particular business areas, ONS uses the Standard Industrial Classification (SIC). In classifying employment to particular jobs and skills, it uses the Standard Occupational Classification (SOC).
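A minimal Python sketch of coding returns and building subaggregates follows. It uses a hypothetical three-category classification and hypothetical returns. Because every return must carry a code from a list that divides the aggregate exhaustively, the subaggregates add up to the aggregate. In practice each return would first be weighted, as described under Estimation.

```python
from collections import defaultdict


def subaggregates(returns, classification, code_field, value_field):
    """Sum `value_field` over returns, broken down by their classification code.

    classification: the list of allowed codes. Every return must carry one
    of them, so the subaggregates add up exactly to the aggregate.
    """
    allowed = set(classification)
    totals = defaultdict(float)
    for r in returns:
        code = r[code_field]
        if code not in allowed:
            raise ValueError(f"return {r['id']} has unclassified code {code!r}")
        totals[code] += r[value_field]
    return dict(totals)


# Hypothetical returns and a toy three-category classification.
sectors = ["Manufacturing", "Construction", "Services"]
returns = [
    {"id": 1, "sector": "Manufacturing", "output": 120.0},
    {"id": 2, "sector": "Construction", "output": 80.0},
    {"id": 3, "sector": "Manufacturing", "output": 60.0},
    {"id": 4, "sector": "Services", "output": 240.0},
]
by_sector = subaggregates(returns, sectors, "sector", "output")
aggregate = sum(r["output"] for r in returns)
print(by_sector, aggregate)
# → {'Manufacturing': 180.0, 'Construction': 80.0, 'Services': 240.0} 500.0
```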

Standard Industrial Classification

The UK uses the Standard Industrial Classification (SIC) to categorise economic activity of businesses. The current version is called SIC 2007 (denoting its year of publication), and is a five-digit, hierarchical classification. The highest level of aggregation is the Section, which indicates broad sectors of economic activity that are denoted with letters, and each comprises one or more two-digit Divisions. The three-, four- and five-digit codes then denote Groups, Classes and Sub-classes respectively, which are progressively finer divisions of the type of economic activity.
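Because the hierarchy is positional, the Division, Group and Class can be read off a five-digit code as its first two, three and four digits. The Section letter cannot be read from the digits, so it needs a lookup from Division ranges. The Python sketch below uses the Section boundaries as we understand them for SIC 2007; check them against the official ONS documentation before relying on them. Codes are written here as plain five-digit strings, whereas published listings may punctuate them differently.

```python
# Division ranges for each SIC 2007 section (range end is exclusive).
SECTION_DIVISIONS = {
    "A": range(1, 4), "B": range(5, 10), "C": range(10, 34), "D": range(35, 36),
    "E": range(36, 40), "F": range(41, 44), "G": range(45, 48), "H": range(49, 54),
    "I": range(55, 57), "J": range(58, 64), "K": range(64, 67), "L": range(68, 69),
    "M": range(69, 76), "N": range(77, 83), "O": range(84, 85), "P": range(85, 86),
    "Q": range(86, 89), "R": range(90, 94), "S": range(94, 97), "T": range(97, 99),
    "U": range(99, 100),
}


def sic_levels(code):
    """Split a five-digit SIC 2007 code (e.g. '10710') into its hierarchy levels."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected five digits, got {code!r}")
    division = int(code[:2])
    # Find the lettered Section whose Division range contains this Division.
    section = next((s for s, divs in SECTION_DIVISIONS.items() if division in divs), None)
    if section is None:
        raise ValueError(f"division {division:02d} is not used in SIC 2007")
    return {
        "section": section,
        "division": code[:2],   # two digits
        "group": code[:3],      # three digits
        "class": code[:4],      # four digits
        "subclass": code,       # five digits
    }


print(sic_levels("10710"))
# → {'section': 'C', 'division': '10', 'group': '107', 'class': '1071', 'subclass': '10710'}
```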

The UK SIC is comparable internationally. The United Nations owns the worldwide International Standard Industrial Classification (ISIC) and co-ordinates updates to it; the UK SIC 2007 and the ISIC are comparable at the two-digit level. Classifications are updated regularly to reflect the changing composition of the economy.

The classification of each business on the Business Register is usually derived from information it supplies about the principal economic activity carried out at each of its sites. A business may carry out activities that fall under a number of different SIC codes: for example, it may run a factory and a warehouse, supply some goods wholesale and have its own retail outlets. Rules are used to determine its dominant economic activity, which governs its classification.
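One naive rule would assign a business to the single four-digit class with the largest share of its activity. The result then depends on how finely activities happen to be split. The Python sketch below instead works down the hierarchy, choosing the largest Division first, then the largest Group within it, then the largest Class. It is a simplified illustration in the spirit of the "top-down" approach used with international activity classifications: it skips the Section level, the shares are hypothetical, and the ONS's actual rules may differ.

```python
from collections import defaultdict


def top_down_activity(shares, levels=(2, 3, 4)):
    """Choose a dominant activity by working down the code hierarchy.

    shares: dict mapping a four-digit activity code to the business's share
    of value added (or another size measure) in that activity.
    At each level (two-digit division, then three-digit group, then
    four-digit class) the prefix with the largest total share is chosen,
    considering only codes under the prefix chosen at the level above.
    Ties go to whichever prefix appears first in `shares`.
    """
    chosen = ""
    for length in levels:
        totals = defaultdict(float)
        for code, share in shares.items():
            if code.startswith(chosen):
                totals[code[:length]] += share
        chosen = max(totals, key=totals.get)
    return chosen


# Hypothetical business: two retail classes and one manufacturing class.
shares = {"4711": 0.30, "4719": 0.25, "1071": 0.45}
print(top_down_activity(shares))    # → 4711 (division 47 totals 0.55, beating 10 at 0.45)
print(max(shares, key=shares.get))  # → 1071 (largest single class, for comparison)
```

The two rules disagree here. That is why the choice of classification rule matters for which sector a multi-activity business is counted in.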

Standard Occupational Classification

The UK develops and uses the Standard Occupational Classification (SOC) to classify jobs and skills into occupational groups, with expert input from the research community. It is based closely on a schema drawn up and owned by the International Labour Office. The classification is updated every ten years, in line with the UK population census. SOC 2010 is a four-digit hierarchical code, with each digit in the hierarchy representing a more finely defined occupational area.

A1.3.6 Estimation

Having worked through the steps set out above, we should now have a reliable and unbiased sample, and should have collected information of satisfactory quality from the respondents in it. But we still need to use the information provided by the sample to compile the final statistics that users are looking for and will find of value. This is called estimation.

For surveys, estimation is usually carried out via weighting: each sample member (a particular business surveyed, for example) is assigned a weight, which can be thought of as the number of cases in the population (the number of similar businesses on the sampling frame, for example) that the sample member represents. The calculation of the final statistics must then take account of these weights.
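A minimal Python sketch of the simplest version of this idea follows. In each stratum, the weight is the number of units on the frame divided by the number sampled. The population total is estimated by summing weight times reported value. It assumes every sampled unit responds and uses hypothetical counts and values. Real estimation typically refines these weights, for example to adjust for non-response or to use auxiliary information from the register.

```python
def design_weights(frame_counts, sample_counts):
    """Weight for each stratum = units on the frame / units sampled.

    Each responding business then "stands for" that many frame businesses.
    Assumes every sampled unit responds (no non-response adjustment).
    """
    return {h: frame_counts[h] / sample_counts[h] for h in sample_counts}


def estimate_total(responses, weights):
    """Gross up sample values: sum of weight x reported value.

    responses: list of (stratum, value) pairs from the sample.
    """
    return sum(weights[stratum] * value for stratum, value in responses)


# Hypothetical design: 100 small firms on the frame, 2 sampled;
# 2 large firms on the frame, both sampled (weight 1).
weights = design_weights({"small": 100, "large": 2}, {"small": 2, "large": 2})
responses = [("small", 10.0), ("small", 14.0), ("large", 500.0), ("large", 700.0)]
print(weights)                             # → {'small': 50.0, 'large': 1.0}
print(estimate_total(responses, weights))  # → 2400.0, i.e. 50*(10+14) + (500+700)
```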

data confrontation
Comparison of multiple sources of data about the same entities or phenomena, usually as an attempt to assess coherence.

Establishing the appropriate weights to gross up the data from the sample to obtain aggregate statistics is a crucial task. In many cases, however, just as important is making an evidence-based judgement about which sources of information should be used to compile the final statistics of interest. There may, for example, be more than one survey that provides information on a particular item. Or there may be administrative or other non-survey information available. What this means is that carrying out data confrontation is an important part of the process. This involves establishing:

Problems like this are subjective by definition. Resolving them calls for a degree of judgement, albeit a judgement made as transparently as possible. Having done so, we arrive at the statistics users are looking for.

How it’s done Changing data sources in the compilation of estimates of UK GDP

The compilation approach for GDP involves the combination of a large number of data sources, both from sample surveys and administrative data, and at varying frequencies. The GDP(O) Data Sources Catalogue describes the data sources used for the current price, price deflators and constant price estimates for the different product and industry classifications.

In compiling early estimates of GDP, the first data available for a particular quarter is likely to be about the output of the economy. Data about spending in the economy then becomes available. This data will probably give a more reliable guide, so relatively more weight is placed on these expenditure estimates and relatively less on output measures, perhaps leading to revised and hopefully more accurate estimates of GDP.

Later, information becomes available about the incomes generated in the economy. This is probably the best available guide to what is happening to GDP. So weight will increasingly be placed on this data and relatively less on the output and expenditure data.

Data sources are regularly reviewed and updated as part of the annual Blue Book process. Currently, the increasing availability of relevant administrative data, such as information about incomes and spending obtainable from PAYE and VAT data, opens up opportunities for better and/or more timely data to help compile reliable GDP statistics.

A1.4 Administrative data

Surveys have been used as the main sources of economic data for many decades, stretching back to the Second World War or even earlier. They are tried and tested, and deep experience has been built up in their use. But as we have seen, there are many tricky issues involved in their use. They are also labour-intensive and time-consuming for both the ONS and respondents. Not surprisingly, therefore, increasing attention is being paid to alternative sources of data.

Administrative data are data collected for an administrative, official or government purpose rather than primarily for producing statistics, but from which statistics can subsequently be produced. Examples of administrative data include records of tax receipts and benefit payments.

The theoretical framework for the production and usage of administrative data is currently less developed than that for traditional surveys. Nevertheless, failure to make use of the potential of administrative information would be a major opportunity missed.

A1.4.1 Uses of administrative data

There are a number of ways in which such administrative data can be used. One is as a direct source of information. Indeed, there is a long history of the use of administrative data in the compilation of economic statistics. Information about the government sector, for example, which accounts for around a quarter of the total economy, relies on such data. So does information about the public finances, in other words, public spending, taxes and borrowing. In many instances, administrative sources will provide the only information available.

In other cases, administrative information will be available for areas also covered by sample surveys. Data confrontation can then be carried out to assess the extent to which the survey and administrative sources corroborate each other or, if they do not, what weight should be placed on each.
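The mechanical first step of such a comparison can be sketched in Python as follows. It lines up two sources' estimates category by category, flags categories whose relative gap exceeds a tolerance, and lists categories that appear in only one source. The industries, figures and 5% tolerance are hypothetical. Deciding which source to believe for a flagged category remains a matter of judgement.

```python
def confront(survey, admin, tolerance=0.05):
    """Compare two sources' estimates of the same quantity, by category.

    survey, admin: dicts mapping a category (e.g. an industry) to an estimate.
    tolerance: relative gap above which a category is flagged; the 5%
    default is a hypothetical choice.
    Returns ({category: relative_gap} for flagged categories,
             sorted list of categories present in only one source).
    """
    flagged = {}
    for category in survey.keys() & admin.keys():
        s, a = survey[category], admin[category]
        if s == a:
            continue  # identical estimates (also avoids dividing by zero)
        gap = abs(s - a) / max(abs(s), abs(a))  # symmetric relative gap
        if gap > tolerance:
            flagged[category] = round(gap, 3)
    unmatched = sorted(survey.keys() ^ admin.keys())
    return flagged, unmatched


# Hypothetical turnover estimates (in £ million) from a survey and from admin data.
survey = {"Manufacturing": 100.0, "Construction": 50.0, "Retail": 80.0}
admin = {"Manufacturing": 102.0, "Construction": 60.0, "Mining": 5.0}
print(confront(survey, admin))
# → ({'Construction': 0.167}, ['Mining', 'Retail'])
```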

Administrative data sources can also be used to help the compilation process, even when surveys provide the main source of information, for example in helping to inform the design of the sampling frame.

Use of administrative data, where possible, has great advantages. The data already exists, so there is no need to collect it by means of an expensive survey, and there is no additional burden on respondents. In many cases, it will be available sooner after the period to which it relates than would be possible from a survey. It is likely to cover something close to the whole population of interest, so that the issues associated with compiling reliable information from samples are avoided.

That said, there are offsetting costs associated with the use of administrative data. The data need to be acquired, transmitted, stored and processed, probably in much larger quantities than survey data. The data themselves may not measure precisely what is required for the statistical output, so some adjustment may be necessary. And because the data collection process is not owned by the statistical organisation, it is subject to changes in scope and definition, and even to discontinuation, as the business, government or other body providing the information, and its processes, change.

Perhaps most important of all, people have legitimate concerns that use of administrative information should not prejudice commercial or personal confidentiality.

How it’s done Value Added Tax (VAT) data

One area where good progress has been made in using administrative data to supplement, and perhaps eventually replace, survey-sourced data is the use of VAT data to help measure turnover and expenditure across the economy.

  • Why the data is important. All firms with a turnover of £85,000 a year or more are required to register with HMRC for VAT purposes. Those with turnover below this threshold can register voluntarily. Approximately eight million VAT returns are made to HMRC each year. Each of these returns gives information about the corresponding firm’s turnover which can be aggregated at whatever level is desired. Accordingly, this represents data from an enormous number of sources. By way of comparison, the ONS’s main business survey, the Monthly Business Survey, is sent to around 30,000 respondents.
  • How the ONS acquires the data. HMRC makes available to ONS the data represented by the VAT returns each month, under a Memorandum of Understanding designed to maintain the confidentiality of the material. This data set is stored in a secure and confidential environment and only accessed by individuals with a business need to analyse the data.
  • The uses of the dataset. It is used to supplement the Monthly Business Survey and Monthly Survey of Construction Output data, covering a wide range of industries. These data are also used as a source of turnover in a range of economic statistics, including the calculation of the Index of Production, Index of Services, Index of Construction and the output measure of gross domestic product (GDP(O)).
  • The key advantages of this administrative data. These are size and detailed coverage. The information accompanying the VAT returns allows turnover not only to be aggregated, but also broken down by type and sector of activity, size of firm and location. This means that it can be used to supplement and corroborate traditional survey information. But it can also be used as the basis for new and more detailed economic indicators. For example, it opens up the prospect of developing new measures of activity at regional and more local levels, which survey-based information alone could not support because it is too sparse at subnational level.
  • Challenges in the use of VAT return data. One issue is the large computational burden of handling and manipulating such large data sets. However, with modern computing capacity, this is not a major barrier to progress. More significant is dealing with timing. The Monthly Business Survey has the advantage of clearly defined monthly returns. VAT returns, by contrast, cover different periods: large firms file monthly and a few smaller ones annually, but the vast majority (around 90%) file on a rolling quarterly basis, with different firms' quarters ending in different months. Converting this pattern into a representative monthly indicator is a significant technical challenge. In addition, of course, not all activity in the economy is subject to VAT.
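The simplest way to turn returns covering different periods into monthly figures is to spread each return evenly over the months it covers and then add up by calendar month. The Python sketch below does this. It is a naive illustration of the timing problem described above, not the ONS method: real turnover is rarely flat within a quarter, and the returns shown are hypothetical. An annual return would be represented with `n_months=12`.

```python
from collections import defaultdict


def add_months(year, month, k):
    """Return the (year, month) that is k months after (year, month)."""
    index = year * 12 + (month - 1) + k
    return index // 12, index % 12 + 1


def monthly_turnover(returns):
    """Apportion VAT-style returns to calendar months.

    returns: iterable of (start_year, start_month, n_months, turnover),
    where n_months is 1 for a monthly return, 3 for a quarterly return, etc.
    Each return's turnover is spread evenly over the months it covers,
    which is a simplifying assumption.
    """
    totals = defaultdict(float)
    for year, month, n_months, turnover in returns:
        for k in range(n_months):
            totals[add_months(year, month, k)] += turnover / n_months
    return dict(sorted(totals.items()))


# Hypothetical returns: one monthly filer, and two quarterly filers whose
# quarters are staggered (January to March, and February to April).
returns = [
    (2019, 1, 1, 100.0), (2019, 2, 1, 110.0), (2019, 3, 1, 120.0),
    (2019, 1, 3, 90.0),
    (2019, 2, 3, 300.0),
]
for (year, month), total in monthly_turnover(returns).items():
    print(f"{year}-{month:02d}: {total}")
# → 2019-01: 130.0, 2019-02: 240.0, 2019-03: 250.0, 2019-04: 100.0 (one per line)
```

In this example the April figure includes only the February to April filer. The monthly filer's April return has not been included, which illustrates how the most recent months stay incomplete until later returns arrive.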

On balance, however, the potential advantages of this VAT data source make the agenda one well worth pursuing. In making use of any administrative data, the cardinal rule is to be aware of both the advantages and potential difficulties, and to ensure that ways of mitigating the latter are in place.

ONS Resource

The Quality assurance of administrative data (QAAD) report for Value Added Tax Turnover Data describes properties of the VAT dataset. It identifies potential risks to data quality and accuracy, and details how those risks can be mitigated.

A1.4.2 The UK Data Science Campus faster indicators

ONS Resource

More detail on this work, along with published datasets, is given in Research Output: Economic activity, faster indicators, UK: April 2019.

A further example of the use of alternative data sources is provided by the work of the UK Data Science Campus to create a set of fast indicators of UK economic activity. The objective is to compile indicators of movements in economic activity that are available before statistics compiled from traditional sources can be published.

For this purpose, the Campus uses the VAT dataset, along with ship-tracking data from the Automatic Identification System (AIS) and road traffic sensor data. These fast indicators therefore rely on a range of administrative and big data sources; the latter are discussed in more detail in the next section.

A1.5 Big data

big data
Large and often unstructured data sets that cannot usually be handled by traditional statistical processing and techniques.

Big data is a relatively new phenomenon, reflecting the advances in processing and dissemination capabilities that have had implications for society as a whole. The techniques of using big data to underpin economic statistics are also new.

Big data, as a concept, is often characterised by ‘the four Vs’: volume, velocity, variety and veracity. Such data sets are (very) large, are often unstructured or contain near-real-time data, and may have been obtained by any of a variety of means.

Examples include mobile phone tracking data, scanner data from supermarkets, social media posts, and information scraped from websites.

ONS Resource

The ONS big data team summarise recent developments in this area.

The ability to handle, make sense of and extract useful information from such sources has only recently become possible, with advances in data storage, access and computing power, combined with new data science techniques such as machine learning. There is clearly much potential to exploit these sources, and research and development of experimental statistics is underway.

A1.5.1 Official statistics based on big data

web scraping
Collection of data from websites, typically using automated tools or software.

There are few official statistics based on big data so far, although this is changing.

This 2018 press release provides information on how the work on compiling market statistics will progress.

ONS Resource

National statistical offices have probably only begun to scratch the surface of possibilities for exploiting big data sources. A useful survey of future possibilities for use of mobile telephone data is an example.

The possibilities include not just the compilation of traditional economic statistics, but also production of a wider range of socio-economic indicators, relating economic indicators to indicators concerning health status, location, and other social dimensions. The Covid-19 pandemic has underlined the importance of such information.

A1.6 Further reading