Appendix 1 Sources for economic statistics

Gareth James and Craig McLaren

A1.1 Introduction

Throughout this book we describe and discuss statistics about the economy. Where does the information come from to compile these statistics? Collecting this data is one of the major activities of the Office for National Statistics. This appendix outlines how the ONS collects this information, and discusses some of the issues that arise.

ONS Resource

A comprehensive list of surveys produced by the ONS is available.

As an example, the Annual Business Survey (ABS) is a flagship comprehensive dataset which is compiled on an annual basis.

The sample comprises approximately 62,000 businesses and covers most business sectors, collecting financial data from businesses’ end-year accounts, including turnover, wages and salaries, purchases of goods and services, stocks and capital expenditure. Greater detail on the methods underpinning the ABS is also available.

A1.2 Data collection

A1.2.1 Sources

Official statisticians use two means to collect the information that is needed to compile economic statistics:

  • Surveys. Essentially, these are questionnaires sent to relevant economic agents – for example, companies or households – to ask them about the nature and extent of activities in which they are engaged. Traditionally, surveys have been the main source of information used to compile economic statistics.
  • Non-survey sources. These are increasingly used, and include administrative data: information collected by government or other bodies during their regular operations. “Big data” – from the internet, satellite observation, or traffic sensors – is also being used increasingly. Some administrative sources have always been used: for example, government records measuring the impact of its operations on the behaviour of the economy, or returns made to government departments by oil and gas companies to measure North Sea activity. Recent advances in technology open up the prospect of greater use of these sources, meaning we can collect information more quickly and cheaply, without burdening people with surveys (which, often, they are legally required to complete).

A1.2.2 Evolving user needs

Statistics, however carefully compiled and however high the quality, are of no use if they do not fulfil some need on the part of users. One way to think about this is whether statistics improve decision-making, in the sense that the outcomes of decisions for those affected would be better than if the information was not available.

In economic statistics, many decision-makers are within the public sector: the Bank of England determines the interest rate, the Treasury sets taxes and public spending. But there will be others, too. Companies make decisions about their operations and investment plans. Households make decisions on how much to spend and save. In deciding what economic statistics should be provided and how frequently, we need to keep in mind the needs of all of these agents.

Not every need for information can be met within available resources. Statistics agencies such as the ONS therefore assess and reassess these needs. The ONS talks regularly to the Treasury and the Bank of England, and finds that their most pressing needs often change over time. It consults the wider community too. The amount of active engagement with users has grown in recent years.

ONS Resource

The ONS runs a series of events, including its ONS Forum, for analysts, business economists, media, and academic users, to discuss its work.

User needs evolve due to policy changes or external events, so the ONS must measure new or updated concepts. One example is the measurement of trade statistics at a regional level. There has been a growing realisation that trade between different parts of the UK, as well as with overseas partners, is an important factor in understanding and determining regional growth, especially after Brexit.

Having assessed – or reassessed – user needs, the next step is to consider what sources of information might be available to compile the statistics to fulfil those needs. We need to consider:

  • the concept to be measured
  • the timeliness of the indicator: does it have to be available quickly to be useful, or would publication months or even a year after the event do?
  • the frequency of the published measure: monthly, quarterly or annually, for example?

Traditionally, monthly and quarterly surveys have been used for recent and short-term movements in the economy; more detailed and complex annual surveys provide greater detail, but are available only later.

Non-survey sources are increasingly being evaluated and used where they meet the data requirements. In the UK, the availability of non-survey sources has increased in recent years due to the opportunities provided by data legislation (for example, Chapter 7 of the Digital Economy Act, 2017). The Independent Review of UK Economic Statistics (Bean 2016: 11) noted, “[ONS should] make the most of existing and new data sources and the technologies for dealing with them”.

A1.2.3 Trading off costs and benefits of collecting information

Just because an information source is available does not mean that it should be used. Surveys are often expensive to maintain. There are also the “compliance costs” in time and effort for the people who must respond to the surveys.

Non-survey sources are often cheaper than traditional surveys, since the information may already exist. But the costs may not be negligible either. Administrative data, for example, may need processing to yield the information that is wanted, or may not relate exactly to the statistics, so adjustments may need to be made.

Citizens often have legitimate concerns about the privacy of their personal or commercial information. So administrative or big data can be used only if there are adequate means to safeguard individual and commercial confidentiality.

We must make a judgement balancing the interests of users, the available resources, and the need to keep down respondent burden and other costs of collection. In economic terms, the important question is:

Does the marginal benefit of providing additional statistical information to users exceed the marginal cost of producing that information?

This underlines the importance of prioritising user needs. Providing statistics with low benefit that crowd out those with high benefit would obviously not be a good outcome. For data sources, we must find the least costly ways to collect information. This is enshrined in the Government Statistical Service (GSS) Guidance on Monitoring and Reducing Burden and in the Code of Practice for Statistics:

ONS Resource

The ONS provides Guidance on Monitoring and Reducing Respondent Burden as well as a general Code of Practice for Statistics.

“The suitability of existing data, including administrative, open and privately-held data, should be assessed before undertaking a new data collection.” (Practice v5.3)

A1.3 Surveys

Carrying out a survey, performed with random sampling (so sometimes called a sample survey), is a well-established and widely used approach. It has its own theoretical framework, developed over many years, allowing the production of unbiased estimates and indicators of uncertainty. Surveys allow specific data requirements to be met and controlled. But they come at a cost to both the body carrying out the survey and to the respondents who are surveyed.

Surveys come in different forms:

  • Non-probability sampling. Relatively quick and cheap to conduct; airport surveys in which interviewers choose passers-by are one very visible example. Such surveys are not guaranteed to be representative of the wider population.
  • Probability or random sampling. More challenging and time-consuming to carry out, but it offers outputs that are theoretically more representative of the population. This is currently the backbone of official statistics.

Some are surveys of businesses in the UK, which ask about various dimensions of economic activity: for example, sales, purchases, investment, capital expenditure, number of employees, number of vacancies, wages and salaries, pensions, hours worked, and the number and location of sites (factories, offices, warehouses).

Other surveys collect information from individuals about themselves and their households. These include items such as information about labour-market activity (employment and self-employment), jobs, earnings, income, spending, wealth and assets, and travel and tourism.

The population census (every 10 years in the UK) provides detailed estimates of the population by demographic characteristics and location. Census outputs are widely used, including as benchmark indicators within economic statistics.

If we want to generate information of value to users from sample surveys, there are three main issues:

  • the design of the survey and sample
  • how to collect and process the data
  • how to use the data that has been collected to compile the statistics that are of interest, sometimes called “estimation”.

A1.3.1 Survey design

The starting point for the design of most surveys is likely to be reference to a register. A register provides as complete a record as possible of information relating to the area of interest. It is kept up to date on a regular basis to ensure it continues to be representative of the population. The register is often used as the basis for the sampling frame.

For many surveys intended to produce economic statistics, the relevant register will be one cataloguing information about businesses: who they are, what activities and products they are concerned with, their size, location, and so on. In the UK, the ONS compiles and maintains the Inter-Departmental Business Register for this purpose.

The Inter-Departmental Business Register (IDBR) is comprehensive and is used to help the sample design of most business surveys. The primary sources of information used to compile it are Value Added Tax (VAT) registrations and Pay As You Earn (PAYE) registrations notified to Her Majesty’s Revenue and Customs (HMRC). These feed through to the ONS and, after matching, become Enterprises on the IDBR. A list of local sites for each Enterprise is also compiled after contact with the business identified as the Enterprise. Enterprises usually respond for themselves in their entirety as a single Reporting Unit, but some are split into more than one Reporting Unit, and it is Reporting Units that are sampled.

The IDBR holds additional information about each of the businesses, including structures, economic activity, employment and turnover, which is used to assist sample design.

To keep the IDBR up to date, an updating survey is sent to all large businesses annually, to medium-sized businesses every third or fourth year, and to small businesses from time to time, but less frequently. Coverage of the IDBR, in terms of all economic activity in the UK, is good, with about 99% of activity being captured. However, many of the very smallest businesses do not appear on the IDBR, as they are too small to register to pay VAT, and/or have no employees.

A1.3.2 Sampling and sample design

A sample is a subset of the population of interest, chosen or selected to be surveyed to collect information. For probability sampling to be possible, a sampling frame is required. The sample design needs to be specified, and a sample-selection mechanism used to choose the sample at random. The design chosen will need to reflect the purpose for which the information is being sought, for example, how frequently it is needed, and how precise the information collected needs to be.

  • Sampling frame. This is a list of units (businesses or households, for example) in the population of interest from which a sample may be chosen at random. To enable the sample design to be optimal for its purpose, additional information about each unit must be present in the frame, such as measures of a business’s size (number of employees or annual turnover) or indicators that show its principal economic activity (industry). Contact details are required as well. As noted above, a statistical register is often the source of this sampling frame. The IDBR, for example, provides the information needed about businesses and their characteristics.
  • Random sampling. In the simplest random sampling, all units within the sampling frame have the same probability of being selected to be in the sample. This is the most straightforward approach, but there may be reasons to depart from it. For example, in an industry with high concentration and a few large firms significantly determining the aggregate outcome, exclusion of one large firm from the sample under simple random sampling could lead to distorted results. Techniques exist to prevent this, so that large firms have a higher probability of being included in the sample than small ones. However, when the sample design allows unequal chances of selection between firms, samples must be weighted appropriately so they continue to be representative of the population of interest.
  • Sample size and precision. Use of random sampling allows the precision of estimates to be assessed. Such measures are usually reported as estimated standard errors, coefficients of variation or confidence intervals. These measures help form part of the quality diagnostics and measures of uncertainty for the outputs. For example, a 95% confidence interval for the true value being estimated means that the confidence intervals associated with 95 out of 100 possible samples would contain the true value, and five would not. Everything else being constant, a larger sample size will result in a more precise estimate. On the other hand, a larger sample size will inevitably be more expensive. Seeking additional precision may also result in less timeliness. Such issues have to be considered and a decision made about the optimal trade-off between precision, timeliness and cost.
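The precision trade-off described above can be made concrete with a small sketch. Assuming simple random sampling from a frame of N businesses (all figures are invented for illustration), the grossed-up total, its estimated standard error with a finite population correction, and a 95% confidence interval can be computed as follows:

```python
import math

# Illustrative only: invented turnover returns (in £000s) from a simple
# random sample of n businesses, drawn from a frame of N businesses.
N = 5000
sample = [120.0, 85.5, 240.0, 60.2, 150.7, 95.3, 310.1, 47.9]
n = len(sample)

mean = sum(sample) / n
var = sum((y - mean) ** 2 for y in sample) / (n - 1)   # sample variance

est_total = N * mean                          # grossed-up population total
fpc = 1 - n / N                               # finite population correction
se_total = N * math.sqrt(fpc * var / n)       # estimated standard error
cv = se_total / est_total                     # coefficient of variation

ci_low = est_total - 1.96 * se_total          # 95% confidence interval
ci_high = est_total + 1.96 * se_total
print(f"estimate {est_total:.0f}, SE {se_total:.0f}, CV {cv:.1%}")
print(f"95% CI: ({ci_low:.0f}, {ci_high:.0f})")
```

Increasing n shrinks the standard error (roughly in proportion to the square root of the sample size), which is the precision–cost trade-off discussed above.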

A1.3.3 Data collection methods

The next question is how the data is actually going to be collected from sample respondents. There are two categories:

  • Self-completion. Business surveys almost always use a self-completion method. Traditionally these have been paper questionnaires, which respondents return to ONS for scanning using optical character recognition software, though web-collection with electronic questionnaires is becoming increasingly common. Telephone data entry, in which respondents key in data via telephones, has been used extensively for some shorter surveys, too, and some (typically larger) businesses are able to take advantage of special arrangements, sending their returns as spreadsheets via secure portals.
  • Interviews. To obtain information about households, interviewer-administered interviews are more usual. These may take place face-to-face in people’s homes, alongside many that take place over the telephone or, increasingly, online.
Face-to-face interviews
  Positive attributes: best response rates; greatest accuracy; can run long interviews; interviewer available for support and advice; capture data digitally; can lead to other activities (diaries, meeting other respondents).
  Challenges: most expensive; less suitable for sensitive topics; needs interviewer during work hours; needs respondent to be available while interviewer is working.

Telephone interviews
  Positive attributes: less expensive than face-to-face; response rates still good; capture data digitally.
  Challenges: interviews shorter than face-to-face.

Self-completion, paper questionnaires
  Positive attributes: least expensive to set up.
  Challenges: lowest response rates; need simple and short questionnaire (less so for business); need to capture data in a second step; depends on ability of respondent to complete correctly; question ordering is fixed and may bias responses.

Self-completion, internet
  Positive attributes: least expensive per sample; options for help; can randomise order of questions or route respondents automatically; captures data digitally; checks data when entered.
  Challenges: high set-up cost; biased sample (internet users); low response rates (if not mandatory); need simple and short questionnaire (less so for business).

Computer-assisted in field
  Positive attributes: captures data digitally; checks data when entered; can randomise order of questions or route respondents automatically.
  Challenges: high set-up cost; needs careful design.

Figure A1.1 Data collection methods

A1.3.4 Data checking

We now have a well-designed sample and we have collected data from those in the sample. But, in most cases, the data derived from surveys can represent a large number of data points. It would be dangerous to accept all of this information as valid without further scrutiny, so the first part of processing is checking the responses. Problems include:

  • Respondent errors. For example, a respondent may inadvertently report figures in the wrong units: the survey needs a company’s sales in thousands or millions of pounds, but the respondent gives a figure in pounds. Uncorrected, that error could substantially distort the results.
  • Non-response. If, in the example above, the respondent has simply failed to enter a value for the company’s sales, it would be foolhardy to assume this meant the company had no sales.

An example of a check might be comparing a response this month with the one from last month or last year, to see whether the change looks plausible. Checks could also include looking at internal consistency: for example, does the total number of employees quoted in a return equal the sum of the part-time and full-time employees reported? We need to find ways to deal with the problems:

  • Query the data with the respondent. This can be done if the data point is important and inexplicable.
  • Automated editing. If the effect of the potential error seems to be small and the cause clear, data that have failed validation tests are changed automatically. Perhaps after comparison with previous returns from the same company, the respondent has simply mistaken the units, in the way described earlier. Converting the return to the correct units will then do the trick.
  • Imputation. What if the problem is non-response? A survey may not be returned within the required deadline, or it may be returned but with parts of it incomplete. Theoretically the ONS has legal powers to compel respondents to give the required information, though these are used only as a last resort, and this may take too long. Imputation is one method of addressing this problem, by which a likely value is predicted for the missing data and that prediction used as the data for further processing. The prediction would be based on available information. For example, the missing value of sales for a particular firm could be imputed as the firm’s sales for the previous period but uprated by the growth of sales for similar firms. When the actual value of the firm’s sales becomes available, this value can then be substituted into the sample to replace the temporary imputed estimate.
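Two of the treatments above, automated editing of a suspected units error and imputation based on a previous return, can be sketched in a few lines. The thresholds, growth rate and figures here are illustrative assumptions, not the ONS’s actual editing rules:

```python
def check_units(current, previous, factor=1000, tol=0.25):
    """If a return is roughly `factor` times the previous period's value
    (or 1/`factor` of it), treat it as a units mistake and rescale."""
    if previous:
        ratio = current / previous
        if abs(ratio / factor - 1) < tol:   # e.g. reported in £ not £000s
            return current / factor
        if abs(ratio * factor - 1) < tol:   # reported a factor too small
            return current * factor
    return current

def impute_missing(previous, peer_growth):
    """Impute a missing value as last period's value uprated by the
    growth of similar firms, as described in the text."""
    return previous * (1 + peer_growth)

# A return of 1,480,000 against last month's 1,500 fails validation and
# is rescaled; a non-response is imputed using 2% peer growth.
corrected = check_units(1_480_000, previous=1_500)
imputed = impute_missing(1_500, peer_growth=0.02)
print(corrected, imputed)
```

As the text notes, an imputed value is temporary: when the actual return arrives, it replaces the imputation in further processing.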

A1.3.5 Classification

Often, users will have an interest not just in aggregate figures such as GDP or the overall level of employment, but in the breakdown of those statistics across one or more dimensions. They may want to know, for example, what is happening to the growth of output in particular sectors such as manufacturing or construction. Or they may want to know how employment is changing in particular sectors, or how it is changing broken down between males and females, or by age group. Each return from a survey can be given a code to indicate the sector, age group or gender to which it relates. By selecting only those survey returns with a particular code, it is thus possible to construct final statistics relating to the various subaggregates as well as the statistics for the aggregate.

For this coding process to be possible, there needs to be a list of categories that divide the aggregate exhaustively into subaggregates. Codes can then be assigned, corresponding to where the return in question falls within this list. These lists are usually called classifications. To help international comparability, ONS generally uses internationally agreed classification systems. For classifying economic activity to particular business areas, ONS uses the Standard Industrial Classification (SIC). In classifying employment to particular jobs and skills, it uses the Standard Occupational Classification (SOC).

Standard Industrial Classification

The UK uses the Standard Industrial Classification (SIC) to categorise economic activity of businesses. The current version is called SIC 2007 (denoting its year of publication), and is a five-digit, hierarchical classification. The highest level of aggregation is the Section, which indicates broad sectors of economic activity that are denoted with letters, and each comprises one or more two-digit Divisions. The three-, four- and five-digit codes then denote Groups, Classes and Sub-classes respectively, which are progressively finer divisions of the type of economic activity.
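Because each successive digit of a SIC code refines the level above it, the hierarchy can be unpacked mechanically from the code itself. The sketch below is illustrative; the function and the example code value are hypothetical rather than an ONS tool:

```python
def sic_hierarchy(code):
    """Return the Division, Group, Class and Sub-class prefixes of a
    five-digit SIC 2007 code (the Section letter is assigned separately,
    since Sections group whole Divisions)."""
    digits = code.replace(".", "")      # accept dotted forms like "62.020"
    return {
        "division": digits[:2],    # two-digit Division
        "group": digits[:3],       # three-digit Group
        "class": digits[:4],       # four-digit Class
        "subclass": digits[:5],    # five-digit Sub-class
    }

print(sic_hierarchy("62020"))
```
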

The UK SIC is comparable internationally. The United Nations owns, and co-ordinates updates to, the worldwide International Standard Industrial Classification (ISIC), and the UK SIC 2007 and ISIC are comparable at the two-digit level. Classifications are updated regularly to reflect the changing composition of the economy.

The classification of each business on the Business Register is usually derived from information it supplies about the principal economic activity that occurs on each of its sites. A business may carry out activities that fall into a number of different SIC codes, for example if it has a factory, a warehouse, supplies some goods wholesale and has its own retail outlets. Rules are used to determine its dominant economic activity, to govern its classification.

Standard Occupational Classification

The UK develops and uses the Standard Occupational Classification (SOC) to classify jobs and skills into occupational groups, with expert input from the research community. It is based closely on a schema drawn up and owned by the International Labour Office. The classification is updated every ten years, in line with the UK population census. SOC 2010 is a four-digit hierarchical code, with each digit in the hierarchy representing a more finely defined occupational area.

A1.3.6 Estimation

Having worked through the steps set out above, at this stage we should have a reliable and unbiased sample and should have collected information from respondents in that sample of a quality with which we can be satisfied. But it still remains necessary to use the information provided by the sample to compile those final statistics that users are looking for and will find of value. This is called estimation.

For surveys, estimation is usually executed via weighting, whereby each sample member is assigned a weight, which can be thought of as the number of cases in the population (the number of similar businesses on the sampling frame, for example) that that sample member (a particular business surveyed, for example) represents. The calculation of the final statistics must then take account of the weights.
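Weighting can be illustrated with a minimal sketch. With two strata and invented frame counts, each sampled business receives a design weight equal to the number of frame businesses it represents, and the weighted returns are summed to gross up to a population total:

```python
# Illustrative only: invented frame counts and turnover returns (£m).
frame_counts = {"small": 900, "large": 100}   # businesses on the frame
samples = {
    "small": [10.0, 12.5, 9.0],               # sampled small businesses
    "large": [250.0, 310.0],                  # sampled large businesses
}

total = 0.0
for stratum, returns in samples.items():
    weight = frame_counts[stratum] / len(returns)  # each sampled business
    total += weight * sum(returns)                 # represents `weight` others

print(f"grossed-up total turnover: £{total:.0f}m")
```

Note that the few large businesses carry small weights, reflecting the higher sampling probabilities discussed in section A1.3.2.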

data confrontation
Comparison of multiple sources of data about the same entities or phenomena, usually as an attempt to assess coherence.

Establishing the appropriate weights to gross up the data from the sample to obtain aggregate statistics is a crucial task. In many cases, however, just as important is making an evidence-based judgement about which sources of information should be used to compile the final statistics of interest. There may, for example, be more than one survey that provides information on a particular item. Or there may be administrative or other non-survey information available. What this means is that carrying out data confrontation is an important part of the process. This involves establishing:

  • What sources of information are available?
  • Do they tell the same or a different story?
  • Which of the sources are likely to be more or less reliable?
  • On this basis, what weight should be placed on the available sources in compiling the statistics of final interest? On the evidence, should one source be preferred over all others or should some weighted compromise be established?

Problems like this are subjective by definition. Resolving them calls for a degree of judgement, albeit a judgement made as transparently as possible. Having done so, we arrive at the statistics users are looking for.
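One simple illustration of placing weights on competing sources is to combine two estimates in inverse proportion to their estimated variances, so that the more reliable source dominates. This is just one possible approach to data confrontation, shown with invented figures:

```python
# Invented figures: a survey estimate and an administrative-data estimate
# of the same quantity, each with an estimated standard error.
survey_est, survey_se = 104.0, 3.0
admin_est, admin_se = 100.0, 1.5     # smaller SE: judged more reliable

w_survey = 1 / survey_se**2          # inverse-variance weights
w_admin = 1 / admin_se**2
combined = (w_survey * survey_est + w_admin * admin_est) / (w_survey + w_admin)

print(f"combined estimate: {combined:.1f}")   # sits closer to the admin figure
```
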

How it’s done Changing data sources in the compilation of estimates of UK GDP

The compilation approach for GDP involves the combination of a large number of data sources, both from sample surveys and administrative data, and at varying frequencies. The GDP(O) Data Sources Catalogue describes the data sources used for the current price, price deflators and constant price estimates for the different product and industry classifications.

In compiling early estimates of GDP, the first data available for a particular quarter is likely to be about the output of the economy. But increasingly, data will then become available about spending in the economy. This data will probably give a more reliable guide, so relatively more weight is placed on these expenditure estimates and relatively less on output measures, perhaps leading to revised and hopefully more accurate estimates of GDP.

Later, information becomes available about the incomes generated in the economy. This is probably the best available guide to what is happening to GDP. So weight will increasingly be placed on this data and relatively less on the output and expenditure data.

Data sources are regularly reviewed and updated as part of the annual Blue Book process. Currently, the increasing availability of relevant administrative data, such as information about incomes and spending obtainable from PAYE and VAT data, opens up opportunities for better and/or more timely data to help compile reliable GDP statistics.

A1.4 Administrative data

Surveys have been used as the main sources of economic data for many decades, stretching back to the Second World War or even earlier. They are tried and tested and deep experience has been built up in their use. But as we have seen, there are many tricky issues involved in their use. They are also labour-intensive and time-consuming for both ONS and for respondents. Not surprisingly, therefore, increasing attention is being paid to alternative sources of data.

Administrative sources are those collected for some administrative, official or government purpose – not primarily for statistical production – but from which statistics have subsequently been produced. Examples of administrative data include records of tax receipts and benefit payments.

The theoretical framework for the production and usage of administrative data is currently less developed than that for traditional surveys. Nevertheless, failure to make use of the potential of administrative information would be a major opportunity missed.

A1.4.1 Uses of administrative data

There are a number of ways in which such administrative data can be used. One is as a direct source of information. Indeed, there is a long history of the use of administrative data in the compilation of economic statistics. Information about the government sector, for example, around a quarter of the total economy, relies on such information. So does information about the public finances, in other words, public spending, taxes and borrowing. In many instances, administrative sources will provide the only information available.

In other cases, information will be available relevant to areas also covered by sample surveys. In that case, data confrontation can be carried out to assess to what extent the survey and administrative sources are corroborating each other, or, if not, what weight should be placed on each.

Administrative data sources can also be used to help the compilation process, even when surveys provide the main source of information, for example in helping to inform the design of the sampling frame.

Use of administrative data, where possible, has great advantages. The data already exists, so there is no need to collect it by means of an expensive survey, and there is no burden on respondents. In many cases, it will be available more quickly after the time to which it relates than would be possible from a survey. It is likely to cover something close to the whole dimension of interest, so that all of the issues associated with compiling reliable information from samples are avoided.

That said, there are offsetting costs associated with the use of administrative data. The data need to be acquired, transmitted, stored and processed, probably in much larger quantities than survey data. The data themselves may not measure precisely what is required for the economic output, so some adjustment may be necessary. And because the data collection process is not owned by the statistical organisation, it is subject to changes in scope, definition and even continued existence as the business, government or other body providing the information, and their processes, change.

Perhaps most important of all, people have legitimate concerns that use of administrative information should not prejudice commercial or personal confidentiality.

How it’s done Value Added Tax (VAT) data

An area where good progress has been made in using administrative data to supplement, and perhaps eventually replace, survey-sourced data is the use of VAT data to help measure turnover and expenditure across the economy.

  • Why the data is important. All firms with a turnover of £85,000 a year or more are required to register with HMRC for VAT purposes. Those with turnover below this threshold can register voluntarily. Approximately eight million VAT returns are made to HMRC each year. Each of these returns gives information about the corresponding firm’s turnover which can be aggregated at whatever level is desired. Accordingly, this represents data from an enormous number of sources. By way of comparison, the ONS’s main business survey, the Monthly Business Survey, is sent to around 30,000 respondents.
  • How the ONS acquires the data. HMRC makes available to ONS the data represented by the VAT returns each month, under a Memorandum of Understanding designed to maintain the confidentiality of the material. This data set is stored in a secure and confidential environment and only accessed by individuals with a business need to analyse the data.
  • The uses of the dataset. It is used to supplement the Monthly Business Survey and Monthly Survey of Construction Output data, covering a wide range of industries. These data are also used as a source of turnover in a range of economic statistics, including the calculation of the Index of Production, Index of Services, Index of Construction and the output measure of gross domestic product (GDP(O)).
  • The key advantages of this administrative data. These are size and detailed coverage. The information accompanying the VAT returns allows turnover not only to be aggregated, but also broken down by type and sector of activity, size of the firm and location. This means that it can be used to supplement and corroborate traditional survey information. But it can also be used as the basis for new and more detailed economic indicators. For example, it opens up the prospect of developing new measures of regional and lower-level locational activity, which would not otherwise be possible because of the sparsity of survey-based information at subnational level.
  • Challenges in the use of VAT return data. One issue is the large computational burden of dealing with and manipulating such large data sets; with modern computational capacity, however, this is not a major barrier to progress. More significant are the timing issues. The Monthly Business Survey has the advantage of clearly defined monthly returns. VAT returns, by contrast, arrive at mixed frequencies: large firms report monthly and a few smaller ones annually, but the vast majority – around 90% – report on a rolling quarterly basis. Converting this pattern into a representative monthly indicator constitutes a significant technical challenge. In addition, of course, not all activity in the economy is subject to VAT at all.
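The rolling-quarterly problem can be illustrated with a deliberately naive sketch: spread each quarterly turnover figure evenly over the three months it covers, then sum by calendar month. This is not the ONS’s actual calendarisation method, and all figures are invented:

```python
from collections import defaultdict

# Each return: (calendar months covered, turnover in £000s). The three
# returns are staggered, as rolling quarterly VAT returns are.
returns = [
    ((1, 2, 3), 300.0),   # Jan-Mar return
    ((2, 3, 4), 330.0),   # Feb-Apr return (different stagger)
    ((3, 4, 5), 360.0),   # Mar-May return
]

monthly = defaultdict(float)
for months, turnover in returns:
    for m in months:
        monthly[m] += turnover / 3   # equal split across the quarter

for month in sorted(monthly):
    print(month, round(monthly[month], 1))
```

Even this toy version shows why the edges of the series are unreliable: the first and last months are covered by fewer returns than the middle ones, which is one reason real calendarisation is a significant technical challenge.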

On balance, however, the potential advantages of this VAT data source make the agenda one well worth pursuing. In making use of any administrative data, the cardinal rule is to be aware of both the advantages and potential difficulties, and to ensure that ways of mitigating the latter are in place.

ONS Resource

The Quality assurance of administrative data (QAAD) report for Value Added Tax Turnover Data describes the properties of the VAT dataset. It identifies potential risks to data quality and accuracy, as well as details of how those risks can be mitigated.

A1.4.2 The UK Data Science Campus faster indicators

ONS Resource

More detail on this work, along with published datasets, is given in Research Output: Economic activity, faster indicators, UK: April 2019.

A further example of the use of alternative data sources is provided by the work of the UK Data Science Campus to create a set of faster indicators of UK economic activity. The objective is to compile indicators of movements in economic activity that are available before traditionally sourced and compiled statistics can be published.

For this purpose, they use the VAT dataset, along with ship tracking data from automatic identification systems (AIS) and road traffic sensor data. These faster indicators rely on a range of administrative and big data sources; the latter are discussed in more detail in the next section.

A1.5 Big data

big data
Large and often unstructured data sets that cannot usually be handled by traditional statistical processing and techniques.

Big data is a relatively new phenomenon, reflecting the advances in processing and dissemination capabilities that have had implications for society as a whole. The techniques of using big data to underpin economic statistics are also new.

Big data, as a concept, is often characterised by ‘the four Vs’: volume, velocity, variety and veracity. It is characterised by (very) large data sets, which are often unstructured or contain near-real-time data, and might have been obtained via any of a variety of means.

Examples include mobile phone tracking data, scanner data from supermarkets, social media posts, and information scraped from websites.

ONS Resource

The ONS big data team summarise recent developments in this area.

The ability to handle, make sense of and extract useful information from such sources has only recently become possible, with advances in data storage, access and computing processing power, combined with new data science analysis techniques, such as machine learning. There is clearly much potential to exploit these sources, and research and development of experimental statistics is underway.

A1.5.1 Official statistics based on big data

web scraping
Collection of data from websites, typically using automated tools or software.

There are few official statistics based on big data so far, although this is changing.

This 2018 press release provides information on how the work on compiling market statistics will progress.

  • The Australian consumer price index. The majority of the data for its construction still comes from direct collection of individual prices, but around a quarter of the index is underpinned by web scraping: price information is extracted from relevant websites by appropriate recognition software. This reduces the need for traditional physical collection of shelf prices or for surveys to collect price data.
  • Telephone data in the Netherlands and in Estonia. These two countries (and others) use mobile telephone data to help compile and verify tourism statistics.
  • Satellite data. Earth observation data sources give information about the type and nature of land use. This has the promise to be able to replace – at least in part – the extensive and time-consuming physical surveying that has hitherto been needed for the Land Cover Maps (LCM) that give the starting point for compiling natural capital estimates (see Chapter 11).
  • Market data in the UK. Web scraping takes place for measuring the behaviour of the labour market (for example, job vacancies) and in measuring price movements. This approach to collecting price information provides opportunities for large scale datasets to be collected quickly and efficiently.
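Web scraping of prices, as used in the Australian and UK examples above, can be sketched in miniature. The snippet below uses Python's standard-library HTML parser on a hypothetical fragment of product markup (the page structure and class names are invented for illustration); a production scraper would also fetch the pages, handle many site layouts, and classify products:

```python
from html.parser import HTMLParser
import re

class PriceScraper(HTMLParser):
    """Minimal illustration: collect numeric prices from any element
    whose class attribute contains 'price' (a hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if any(k == "class" and "price" in v for k, v in attrs):
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            m = re.search(r"[\d.]+", data)  # strip the currency symbol
            if m:
                self.prices.append(float(m.group()))
            self.in_price = False

# Hypothetical product listing, standing in for a scraped web page.
page = """
<ul>
  <li><span class="name">Milk 2L</span><span class="price">£1.45</span></li>
  <li><span class="name">Bread</span><span class="price">£0.99</span></li>
</ul>
"""
scraper = PriceScraper()
scraper.feed(page)
print(scraper.prices)  # [1.45, 0.99]
```

The appeal for price statistics is clear from even this toy version: once written, the same extraction runs daily across thousands of product pages at negligible marginal cost.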

ONS Resource

National statistical offices have probably only begun to scratch the surface of the possibilities for exploiting big data sources. A useful survey of the future possibilities of mobile telephone data gives one example.

The possibilities include not just the compilation of traditional economic statistics, but also production of a wider range of socio-economic indicators, relating economic indicators to indicators concerning health status, location, and other social dimensions. The Covid-19 pandemic has underlined the importance of such information.

A1.6 Further reading