Value of Public Sector Data Estimate
Published 24 October 2025
Executive summary
This report provides an initial estimate of the total level of investment in public sector data assets and the potential value of allowing external access to UK public sector data for use and reuse. This follows from UK Government commitments within the Industrial Strategy to understand and treat data as a modern asset with economic, financial and societal value.
There is no consensus on the best approach to valuing data. Most methods can be categorised as cost-based, market-based or use-based, where cost-based methods are commonly used for valuing broad categories of data (DCMS 2021,[footnote 1] DSIT 2024[footnote 2]). Pollock (2008) sets out a cost-based approach for estimating the value of increasing access to public sector information for use and reuse.
The public sector invests an estimated £30 billion in data assets per year, which have an expected potential value to the economy of between £15 billion and £200 billion. This reflects the potential value external users, including businesses and researchers, can generate in terms of new products and services by gaining free (or subsidised) access to previously inaccessible or paid-for public sector data. In economic terms, this additional value represents the deadweight loss recovered by providing (subsidised) access to public sector datasets.
This result is an initial, conservative estimate of both investment in public sector data assets and their total potential value to the economy. Public sector investment in data is calculated using ONS public sector time use statistics and public sector salary data to estimate the direct costs of core data creation activities in the public sector. This excludes broader data-related activity, such as data protection and compliance, as well as capital costs associated with data. Cost-based estimates, whilst common for broad valuations, are also likely to underestimate overall value.
This is an estimate of total potential value and does not separate potential additional value from value which is already realised or will never be realisable. Further work is needed to understand how much value has been realised and where it would be socially beneficial to increase data access. Policy decisions around data access would also need to be balanced against a case-by-case understanding of the risks and costs associated with increasing data access.
Introduction
The public sector collects and processes huge amounts of data during ordinary business. Both the Industrial Strategy and the AI Opportunities Action Plan identified ways for government to better leverage these data assets in support of the UK economy. Initiatives such as the National Data Library aim to responsibly increase external access to public sector data to support greater innovation and improved productivity across the economy.[footnote 3] This work aims to estimate the total potential value to the economy of access to public sector data.
There are numerous examples of public sector data being shared externally for the benefit of the UK economy. This includes regional Open Data schemes such as the London Datastore,[footnote 4] industry-specific schemes such as the Rail Data Marketplace,[footnote 5] and government-owned companies which sell public sector data assets, such as Ordnance Survey.[footnote 6]
Understanding and articulating the value of public sector data has been an expanding area of research among economists and policymakers in recent years. This is particularly important for justifying investment to improve external access to public sector data. Funding and measurement have been highlighted as key challenges for the UK’s state of digital government, as it is hard to forecast costs and demonstrate benefits of many digital and data-driven products in monetary terms.[footnote 7]
This is an initial estimate of the value of public sector data to the economy. It is informed by existing literature on data valuation methodology and demonstrates the scale of potential benefits from improving access to public sector data. It applies conservative assumptions, with an expectation that it will be updated as discussions around data valuation methodologies evolve.
Context and literature
Data definitions
‘Data’ is used to refer to a wide range of concepts. The broad definition of data, which is used throughout this paper, encompasses the entire value chain of data, from collection up to data intelligence. Whole-economy estimates of data investment using this definition range between 10% and 12% for the period 2012 to 2020.[footnote 2]
Data is split into more specific groups when measuring the size of the economy. The System of National Accounts (SNA), an internationally agreed framework for national economic statistics, separates the stages of the data value chain. ‘Data’ refers to the recorded and stored information from observable phenomena. If that data is then organised and structured, typically by combining it with software, it becomes a ‘database’. Currently, only databases are included in UK GDP statistics as a combined ‘software and databases’ asset type, though the latest framework includes a combined asset category of ‘data and databases’.[footnote 8]
Data is a significant component of ‘intangible capital’. Previous DSIT[footnote 2] and academic research[footnote 9] highlight the overlap between data and intangible capital. The combined ‘software and databases’ category was the largest area of intangible asset investment in 2022.[footnote 10] Meanwhile, other forms of intangible capital are increasingly reliant on data intelligence, for example, research and development, marketing, and financial products.
Characteristics of data
Data is difficult to value via traditional methods because it differs from traditional assets. For example:
Data is non-rival. This means it can be used by multiple users at once without depleting it. It also means that it can be used for numerous applications, even if these are not known at the time of collection.
Data has externalities. This means data access has side effects which are not accounted for in traditional markets. Combining datasets generates more uses than the individual datasets alone, so value can be greater than the sum of its parts, but risks can also rise disproportionately.
Data often has high fixed costs and low marginal costs. This means markets struggle to set a price that covers the high cost of production while reflecting the low cost of sharing with new users, as illustrated after this list.
Data is an intermediate good. This means it is often not used directly by consumers, but by businesses who in turn provide products and services to consumers. The value of the data is then derived from the value of these final products and services. As businesses’ applications of data to create new products and services can be unknown in advance, so too can the value.
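To make the pricing difficulty concrete, consider a standard average cost formulation (an illustrative sketch, not taken from the report): a dataset with fixed production cost F, marginal cost per user c and Q users has average cost

$$AC(Q) = \frac{F}{Q} + c \approx \frac{F}{Q} \quad \text{when } c \approx 0$$

Any uniform price high enough to recover F therefore exceeds marginal cost, pricing out users who value the data above the near-zero cost of serving them. This forgone value is the deadweight loss measured by the model described later in this report.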
Methods for valuing data
The Value of Data report (2021)[footnote 11] outlines and assesses the methodological landscape. It identifies three main categories of methods for valuing data which have been applied in the public sector:
- Cost-based methods: valuing data according to the costs incurred to collect, store and analyse data
- Market-based methods: using the market prices of data or market valuations of companies which use data intensively
- Use-based methods: a broader group of methods which aim to estimate the value to businesses (in terms of profits or productivity) or to consumers (in terms of willingness to pay) of using data. The literature often refers to these methods, when applied to businesses, as “income-based” or “revenue-based” methods
Cost-based methods are commonly used to value assets in companies’ financial statements and in national accounts. They are best suited to very broad valuations of data (for example, across the whole economy), cases where the value of data assets is less context dependent and relatively stable over time, and cases where the data is not created as a by-product of economic activity. They are likely to provide a lower-bound estimate of the value of data.
Market-based methods are a reliable option for valuing data assets when there is a significant volume of transactions. However, market prices for data are often not available as data assets are often produced and analysed internally. This is particularly true for the public sector, though it is also common elsewhere in the economy.[footnote 2] It is also difficult to disentangle the value of data from other digital and intangible assets. These methods are therefore likely to provide an upper bound for the true value of data to an organisation.
Use-based methods are the most appropriate for assessing the impact of data policy changes because of their flexibility and ability to account for differences in dataset values at a more granular level. However, the current empirical literature remains underdeveloped and will need to grow further before use-based methods can be consistently applied to a range of data policies. Uses can also be difficult to identify, a problem which is exacerbated at scale. This makes them inappropriate for broad valuations.
Academic literature on the topic, such as Coyle and Manley (2022)[footnote 11] and Fleckenstein and others (2023),[footnote 12] uses different groupings for methods but reinforces the conclusion that there is no single method in the literature which can be used to accurately and consistently express the value of data in monetary terms.
Method
Theoretical model
Pollock (2008) sets out a cost-based approach for valuing public sector information, which includes public sector data. The model expresses the difference in ‘surplus’ (value) generated in two different scenarios in terms of the costs of generating the asset and multipliers. The first scenario sets the price of access at ‘marginal cost’[footnote 13] (that is, free access), so that revenues are equal to zero and fixed costs are paid by the data owner (that is, the public sector). The second scenario sets the price to access the dataset at ‘average cost’, so that revenues exactly offset the fixed cost of creating the dataset. This second scenario is chosen so that the value generated can be expressed in terms of costs, which are easier to measure for a broad range of data assets.
The value difference between the marginal cost and average cost scenarios is dependent on value generated by new users. There are more data users when prices are lower (marginal cost), and so the value generated is higher. The impact of the change in price on user numbers is driven by the ‘price elasticity’ of the data asset. The value to new users is then also scaled up by a ‘spillover’ parameter to reflect that data is often used to make new products and services which create value of their own.
In economic terms, the additional benefits represent the ‘deadweight loss’ recovered by providing subsidised (free) access to public sector datasets. Pollock (2008) then simplifies these benefits into the following equation:
$$\Delta W = \frac{2}{5} F \lambda \varepsilon$$
Where:
- ΔW is the change in welfare (benefits) by providing free access to public sector datasets
- F is the fixed cost associated with generating the data asset
- ε is the price elasticity of demand for data which reflects the change in the number of users when moving from cost recovery to zero cost scenarios
- λ represents the positive spillover or multiplier effect, which reflects that data is frequently sold to intermediate firms who then create products and services which generate further benefits to consumers; as well as potential dynamic impacts of making data freely available. Without accounting for spillovers, prices as they are reflected in the demand curve will underestimate the true value generated
More detail on how this equation is derived is available in the technical annex.
Parameter estimates
The cost of generating public sector data assets is estimated based on the time spent by public sector employees on data-related tasks. The ONS time use in the public sector survey[footnote 14] provides estimates of how much time is spent on core data related activities across the Civil Service, Arm’s Length Bodies (ALBs), National Health Service (NHS), Education, Police, and Local Government. We then scale this estimate into cost terms using available data on public sector salaries and supplement it with data on data-related contracts procured by the Civil Service and ALBs.
Table 1 sets out the tasks classified as data-related for the purpose of this estimate. Differences between the Civil Service and ALBs on the one hand and the broader public sector bodies on the other reflect data availability. Civil Service and ALB data include time use broken down by main and secondary activities, so both are included in the estimate; the broader public sector bodies include main activities only.
Table 1: Data task breakdown
| Core data tasks (Civil Service & ALBs) | Core data tasks (NHS, Education, Police, Local Government) |
|---|---|
| Data processing, data analysis, data science | Data processing, data analysis, data science |
| Creating or updating records, databases, case files or similar documents | Records management[footnote 15] |
| Data entry | |
| Completing or processing forms | |
It would be possible to use a narrower definition of data-related tasks. However, restricting both analyses to just ‘data processing, data analysis, data science’ tasks would miss a significant portion of the data value chain for the public sector, namely data collection. For instance, creating or updating records and databases, and data entry, are key activities of many public sector staff which contribute to the quality and subsequent value of public sector data.
Time estimates are then scaled up using salary and employment data. Civil Servant and ALB headcounts, Full-Time Equivalent (FTE) employment, and remuneration rates are taken from Civil Service Statistics. Public sector body FTE figures are taken from the ONS public sector employment statistics, whilst remuneration is taken from the ONS Annual Survey of Hours and Earnings (ASHE).
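The scaling step can be illustrated with a minimal sketch in Python. All figures and the sector breakdown below are placeholder assumptions for illustration only; they are not the report’s inputs and will not reproduce the £30 billion estimate.

```python
# Illustrative sketch of the cost-scaling step described above.
# All numbers are placeholder assumptions, not the report's inputs.

# For each sector: (FTE employment, share of working time spent on
# core data tasks, average annual remuneration in GBP).
sectors = {
    "Civil Service & ALBs": (500_000, 0.12, 45_000),
    "NHS": (1_400_000, 0.08, 42_000),
    "Education": (1_100_000, 0.05, 38_000),
    "Police": (230_000, 0.07, 41_000),
    "Local Government": (1_300_000, 0.06, 35_000),
}

def data_labour_cost(fte: float, data_time_share: float, salary: float) -> float:
    """Labour cost attributable to core data tasks for one sector."""
    return fte * data_time_share * salary

total = sum(data_labour_cost(*v) for v in sectors.values())
print(f"Estimated annual investment in data assets: £{total / 1e9:.1f}bn")
```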
Estimates of the elasticity of demand for public sector data and spillovers from public sector data use are adapted from estimates set out in the original model (Pollock 2008). These are highly uncertain, which is reflected in the range of the final estimates. The low estimate assumes no spillover effects and unit elasticity of demand, whilst the high estimate assumes high spillovers from increased data access, with use of data highly sensitive to price. Further work is needed to robustly identify an updated range of elasticity and spillover parameters which would cover most public sector data assets.
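As a rough illustration of how the low and high scenarios produce a range, the simplified Pollock (2008) equation can be evaluated directly, as in the sketch below. The λ and ε values are placeholders chosen only to show the mechanics; they are not the report’s calibrated parameters and do not reproduce the published £15 billion to £200 billion range exactly.

```python
# Illustrative evaluation of ΔW = (2/5) · F · λ · ε under low and high
# parameter assumptions. Parameter values are placeholders, not the
# report's calibrated inputs.

def welfare_gain(fixed_cost: float, spillover: float, elasticity: float) -> float:
    """Change in welfare (ΔW) from providing free access to data assets."""
    return (2 / 5) * fixed_cost * spillover * elasticity

F = 30e9  # estimated annual public sector investment in data assets (£)

low = welfare_gain(F, spillover=1.0, elasticity=1.0)   # no spillovers, unit elasticity
high = welfare_gain(F, spillover=4.0, elasticity=4.0)  # strong spillovers, price-sensitive use

print(f"Low scenario:  £{low / 1e9:.0f}bn")   # ≈ £12bn
print(f"High scenario: £{high / 1e9:.0f}bn")  # ≈ £192bn
```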
Results and discussion
The public sector invests an estimated £30 billion in data assets per year, which have an expected potential value to the economy of between £15 billion and £200 billion. This represents the additional benefits generated by increasing public sector data access to the private sector via a hypothetical reduction in the cost of accessing data.
This is intended as an estimate of the total potential value of public sector data to the economy. Three categories of value are identified within this total, but their relative sizes cannot be broken down using this method alone. These are:
Realised value of public sector data to the economy. This ranges from how businesses benefit from understanding the economic climate via economic statistics from the ONS[footnote 16] to how travel apps use data from the Department for Transport[footnote 17] and the English wine industry uses Environment Agency LiDAR data.[footnote 18]
Unrealisable value of public sector data to the economy. We can reasonably assume that some datasets will remain inaccessible in perpetuity. For some datasets, costs associated with greater access, such as national security risks and privacy implications, outweigh the potential benefits of data access, even if safeguards were applied.
Additional realisable benefits of public sector data to the economy. The remaining portion of the total estimate is comprised of datasets which could be the potential targets of new data access initiatives.
The methodology provides a conservative estimate of the potential value to the economy derived from public sector data assets. The estimate assumes a starting point where public sector data is available but at a cost, so the additional value comes from users who benefit from lower costs. In practice, many public sector data assets are fully inaccessible. The potential value of moving from closed datasets to an average cost scenario, the hypothetical starting point, is therefore excluded from the estimate. It is possible to adjust the model to estimate the entire value of the consumer surplus in terms of cost. However, given the uncertainty in the elasticity and spillover parameters, the established method is preferred. The result can be interpreted as a lower-bound estimate of the value of public sector data.
Conservative assumptions are also applied to the cost estimate. Costs are confined to the labour costs of core data creation activities, such as collection, management, and analysis. This excludes broader data-related tasks, such as data governance, data protection, and strategy development. It also excludes non-labour costs, such as software and servers, which are often significant for large data assets. A possible solution is the addition of ‘blow-up’ factors applied to the labour costs,[footnote 19] as has been done in previous data investment estimates.[footnote 2]
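One natural way such a factor could enter the calculation is multiplicatively (an assumption about functional form; the report does not specify one):

$$C_{\text{total}} = \beta \, C_{\text{labour}}, \qquad \beta > 1$$

where β scales the direct labour costs of data tasks to account for associated non-labour costs such as software, servers and overheads.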
This method is also based on a hypothetical binary choice between two pricing approaches, assuming all public sector data is already accessible. In practice, there are numerous considerations for both data access and pricing of access. For example:
Data access choices must balance any benefits to the economy against the risks and costs. These vary by dataset but include data subject privacy and public trust in the data owners.
Pricing decisions are not binary. There is a spectrum of prices, including options where the data owner charges above cost or recovers only part of its costs.
Price discrimination is common in markets for data. This is where prices can differ based on who is using the data, how they are using it, and how much they are using it. A common version of this is ‘freemium’ models where some access is free and there are premium services which offer better quality data or API access. There are therefore multiple pricing decisions to make for each data asset.
Free data sharing can be difficult to maintain in a financially constrained organisation. Free access, where fixed costs are paid by the public sector data owner and funded through general taxation, can be difficult to justify even when benefits outweigh costs, due to competing priorities within public sector bodies. The loss of users in paid models has to be balanced against the need for sustainable funding.
Transaction costs are often significant. Transaction costs are incurred by users but do not generate revenue for data owners. Research commissioned by the Open Data Institute incorporates transaction costs into the Pollock (2008) model.[footnote 20] Pricing decisions can have implications for transaction costs and, in turn, for the impact of changing prices on the value of public sector data.
This is an initial estimate of the value of public sector data to external users. It is expected that future estimates will evolve as discussions around data valuation continue and more information on public sector data use and value is collated. Feedback on the methodology is welcome and should be sent to valueofdata@dsit.gov.uk.
Technical Annex: Pollock (2008) equation
The method described above is based on the following equation, derived from Pollock (2008)[footnote 21] and applied in Pollock (2011)[footnote 22] and ODI (2016)[footnote 20]:
$$\Delta W = \frac{2}{5} F \lambda \varepsilon \tag{2}$$
Where:
- F is the fixed cost associated with generating the data asset
- ε is the price elasticity of demand for data which reflects the change in the number of users when moving from cost recovery to zero cost scenarios
- λ represents the spillover or multiplier effect, which reflects that public sector data is frequently sold to intermediate firms who then go on to create products and services which generate further benefits to consumers; as well as potential dynamic impacts of making data freely available
This is based on the economic concept of ‘deadweight loss’. Figure 1 below sets out the assumed market for data: a downward-sloping demand curve, reflecting that there are more users the cheaper the data is, and a flat supply/cost curve, representing that the cost of serving an additional data user is approximately zero.
Figure 1: The market for data. The horizontal axis represents quantity (number of users) and the vertical axis represents price. The demand curve slopes downward from left to right, while the supply (marginal cost) curve is horizontal at a price of zero, so the market equilibrium lies at zero price and a positive quantity. Charging a price above zero, as in the average cost scenario, reduces the quantity demanded and creates a deadweight loss, shown as a triangle between the demand curve and the supply curve, bounded by the quantity demanded at the higher price and the equilibrium quantity. The deadweight loss represents welfare lost to under-consumption: users who would have accessed the data at zero price no longer do so at the higher price.
In the average cost scenario, prices are set at P1, and there are Q1 users. In the marginal cost scenario, prices are set at zero, and there are Q2 users. The difference between the marginal cost scenario and the average cost scenario is as follows:
- The government must supply the funds to pay the fixed cost of the data asset
- Users gain a surplus equal to the fixed cost plus the deadweight loss (DWL)
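With the linear demand curve in Figure 1, this gain in user surplus from moving from P1 to a zero price decomposes as follows (a standard consumer surplus calculation consistent with the description above, writing ΔCS for the change in user surplus):

$$\Delta CS = \underbrace{P_1 Q_1}_{=\,F} + \underbrace{\tfrac{1}{2} P_1 (Q_2 - Q_1)}_{\text{DWL}}$$

The first term is the saving to existing users, which equals the fixed cost F because average cost pricing exactly recovers F; the second term is the triangle gained by new users, which is the deadweight loss.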
Pollock (2008) then incorporates the spillover multiplier λ (described above), a distributional weight θ, and a parameter g reflecting the proportion of users who are themselves in the public sector. θ is applied to user benefits to account for the value of public funds, since government funding could be used to provide access to data or for some other purpose; Pollock estimates this as approximately 0.8 based on the marginal utility of income and estimates of the distribution of income. g is applied because, if public sector users must pay for access, a subsidy (free access) funded from general taxation and assigned to data owners is simply replaced with revenue funded from general taxation and assigned to data users; these cancel out exactly. The higher the proportion of public sector users, the smaller the difference in public funding used across the two scenarios, and the smaller the societal benefit of the data owner recovering costs in the average cost scenario.
This leads to the following overall equation for estimating the change in welfare from the average cost scenario to the marginal cost scenario:
$$\Delta W = F\left(-(1-\theta)(1-g) + \frac{\theta \lambda \varepsilon}{2}\right) \tag{3}$$
Inputting θ = 0.8, the formula is simplified to:
$$\Delta W = \frac{2}{5} F\left(\lambda \varepsilon - \frac{1-g}{2}\right) \tag{4}$$
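Substituting θ = 0.8 into equation 3 and factoring out 2/5 confirms this simplification:

$$\Delta W = F\left(-0.2(1-g) + 0.4\,\lambda\varepsilon\right) = \frac{2}{5} F\left(\lambda\varepsilon - \frac{1-g}{2}\right)$$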
The negative term represents the additional cost of using public funds to cover the fixed costs of production in the marginal cost (free access) scenario. However, Pollock (2011)[footnote 22] notes that the burden of fixed costs will often be shifted from related services rather than fully funded from additional public funds, and that this term will be small relative to the λε term. This is particularly true when spillovers are large and demand is very sensitive to price. It is therefore neglected in Pollock’s subsequent analysis[footnote 22] and applications by the Open Data Institute,[footnote 20] as well as in these estimates. This turns equation 4 into equation 2.
Footnotes

2. Data intangible capital and productivity report (DSIT, 2024)
8. Handbook on measuring data in the System of National Accounts (United Nations Statistics Division, 2024)
9. Data, Intangible Capital, and Productivity (Corrado et al, 2025)
11. What is the Value of Data? A review of empirical methods (Coyle and Manley, 2022)
12. A Review of Data Valuation Approaches and Building and Scoring a Data Valuation Model (Fleckenstein and others, 2023)
13. For non-rival goods like data, the cost is the same regardless of the number of users, so the marginal cost of an additional user is zero.
15. Records management includes the following activities: storing, filing or managing documents; taking notes; creating or updating records, databases, case files or similar documents; completing or processing forms; creating or updating an appointment or booking; data entry; recording medical notes; writing medical reports or treatment planning; reading, reviewing or editing documentation or paperwork; and non-offender-facing activities.
16. The Value of Economic Statistics: Follow-up Report (Tong et al, 2023)
18. Bubbling English wine boosted by new laser data (DEFRA, 2015)
19. This is recommended in the UN Handbook on measuring data in the System of National Accounts (footnote 8).
20. The economic value of open versus paid data (ODI, 2016)
21. The Economics of Public Sector Information (Pollock, 2009)
22. Welfare gains from opening up public sector information (Pollock, 2011)