Research and analysis

The UK Data Driven Market

Published 26 March 2024

Executive summary

The aim of this study is to define and estimate the scale and distribution of UK companies which are data driven as per the research definition. This refers to companies which use digital data as an integral part of their operations. The findings of this study are expected to support targeted policies on increasing data activity in the economy. It will also address the “value of data” question by estimating the economic contribution of data driven companies to the UK.

Key findings

  • It is estimated that there are 9,600 active, VAT registered data driven companies (DDCs) in the UK, of which 5,500 offer specialised data infrastructure or software development services and 4,100 offer more diverse data consultancy or management services.

  • As a collective, DDCs operate in most sectors of the UK economy. The majority of DDCs, particularly specialised ones, operate in the ICT sector and are primarily registered as computer consultancy or business support companies.

  • DDCs are more likely to be based inside the London, South East and East of England regions than outside them. On average, around 5 in every 1,000 businesses in these regions are data driven companies.

  • DDCs generated an estimated £343 billion in annual turnover (6% of total UK turnover) in 2023, of which over 80% is generated by large DDCs inside the London, South East and East of England region.

  • DDCs employed 1.5 million people (5% of total UK employees) in all types of roles in 2023, similar to the digital sector which had 1.9 million filled jobs in 2022. Large DDCs are bigger employers than large businesses in general (90% of DDC employees are in large companies compared to 56% of total UK employees).

  • DDCs are estimated to contribute £84.9 billion (3.8%) in GVA to the UK economy through their total employment. In comparison, this is larger than the GVA contribution of the telecoms sector (1.5%) but smaller than that of all digital sectors (7.2%).   

  • Diverse non-specialised DDCs generate more in economic contributions than specialised DDCs, in terms of GVA, employees and turnover.  

  • DDCs operate in nearly all sectors of the economy and have overlaps with multiple related sectors such as AI and cyber security. It is not possible to isolate the impact of DDCs because of these substantial overlaps.

1. Introduction 

This study takes a novel approach to defining data driven companies through their website content. Although similar methodologies have been applied to estimate the AI and cyber security sectors, the scope of companies analysed in this study is larger. Data driven companies are not limited to a specific technology and therefore more difficult to define. The data driven companies in this study also differ from the data companies defined in the European Data Market (EDM) study.  

The EDM study estimates that around 197,000 or 7.2% of UK companies were directly involved in the production, delivery, and/or usage of data in the form of digital products, services, and technologies in 2022. These can be both data supplier or data user organisations. The 9,600 estimated data driven companies in this study can be considered a subset of these organisations, as they are defined by a specific taxonomy of data related digital products, services, and technologies.  

The analysis in this study can therefore be relevant for policies which aim to target data driven companies in the UK. It is reasonable to assert that these companies are more likely to be affected by and/or adopt data policies. Further research is required to understand how much these companies invest in data.

The study broadly aims to answer the following questions: 

  • What is a data driven company?  

  • How many companies in the UK are data driven? What is their economic contribution?  

  • Which sectors of the UK economy are data driven?

1.1 Sources  

This study uses the Data City (TDC) platform to build a dataset of data driven companies.[footnote 1] TDC is a platform which matches 1.6 million private limited companies registered on Companies House to their websites and scrapes the web content to classify them.[footnote 2] 

The dataset extracted from Data City is matched to the Inter-Departmental Business Register (IDBR) to give the final dataset this study is based on.[footnote 3] The IDBR is a comprehensive list of businesses registered for Value Added Tax (VAT) and/or Pay As You Earn (PAYE). It is the main sampling frame for business surveys carried out by the Office for National Statistics (ONS) and other government departments. The IDBR is used to support statistical work across government.

1.2 Approach  

This study uses a multidisciplinary approach, combining policy, analytical and data science techniques. Our approach is broadly defined in these stages, with more detail provided in the annex.  

Figure 1: Stages to get the final dataset 

  1. Definition: literature review and policy engagement to arrive at a definition for data driven companies.  

  2. Initial inputs: workshop with analysts and policy representatives to create a taxonomy of keywords which support the definition. Classify the keywords into “narrow” and “broad” to provide a range which indicates the subjectivity of defining data driven activity by keywords.  

  3. Machine learning: use the keywords to retrieve a large population of 12,465 matching companies from the Data City. Workshop with policy and analysts to pick out positive and negative classifier companies from this population. Use these classifiers to then train the machine learning algorithm to filter out further companies.  

  4. Data quality check: match filtered company population from the Data City platform to the Inter-Departmental Business Register (IDBR), to only keep companies which are active, and VAT registered in the UK. Additional quality checks were run on turnover and employee data. This left us with the final dataset of 9,600 data driven companies which are active, and VAT registered in the UK.
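
The filtering logic in stage 4 can be illustrated with a minimal sketch. The record fields (`active`, `vat_registered`) and the sample companies are hypothetical and are not drawn from the IDBR or the Data City. The sketch shows only the final filter, not the matching procedure or the additional turnover and employee checks used in this study.

```python
from dataclasses import dataclass


@dataclass
class Company:
    """Hypothetical record for a company after matching to a business register."""
    name: str
    active: bool
    vat_registered: bool


def keep_active_vat_registered(companies):
    """Keep only companies that are both active and VAT registered."""
    return [c for c in companies if c.active and c.vat_registered]


# Invented sample records, for illustration only.
candidates = [
    Company("Example Analytics Ltd", active=True, vat_registered=True),
    Company("Dormant Data Ltd", active=False, vat_registered=True),
    Company("Tiny Cloud Ltd", active=True, vat_registered=False),
]

final = keep_active_vat_registered(candidates)
print([c.name for c in final])
```

Only the first invented company passes both conditions, mirroring how the candidate population is reduced to the final dataset.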

2. What is a data driven company? 

2.1 Definition 

Data driven companies are not defined by a formal Standard Industrial Classification (SIC) code.[footnote 4] It is difficult to capture data driven companies (DDCs) partly due to the intangible nature of data. For instance, companies do not clearly record the value or extent of internal data analysis. This leads to the full value of data being unobserved and is a key limitation of the market-based approach to data valuation.

There is similar ambiguity around the definition of a “digital company”. To get around this, extant publications have used SIC codes to define digital sectors. However, SIC codes were last updated in 2007 and do not capture the extensive presence of digital technology in every sector. The same issue applies to data driven activity and therefore, this study defines data driven companies using company classifications rather than SIC codes.   

Around 85% of UK businesses handle digitised data, but not all of them will be a data driven company. This study is focused on companies (not businesses) which handle digitised data and meet the following criteria:  

(i) delivers a data related digital product, service or technology and  

(ii) uses digital data extensively 

Hence, the definition of a data driven company is one which delivers a data related digital product and uses digital data extensively (C in the figure below). This is the select population of UK companies this study intends to investigate, in alignment with our policy interests.  

To disaggregate the definition into keywords, the digital products, services or technologies which are related to data are taxonomised at a high level. According to the OECD, the following digital technologies are related to data generation, transfer, collection, storage, and use:  

  • The Internet of Things (IoT)
  • Data analytics and AI 
  • Broadband networks 
  • Cloud and EDGE computing  
  • Blockchain and distributed ledger technologies 

Figure 2: Illustration of the data driven definition  

It is more challenging to capture a company’s extent of data use through keywords. The UK Business Data Survey (UKBDS) provides a reasonable proxy for this, finding that on average 1 in 4 (26%) UK businesses that handle digital data, other than just of their employees, use it to generate new insights. If generation of new insights through data use is taken as a proxy of using data extensively, approximately 1.2 million UK businesses could be using data “extensively”.[footnote 5]  

To proxy the extent of data use at a company level, the study uses website terminology which can indicate extensive data use (for example, cyber security or privacy enhancing technologies could indicate extensive data use). However, using keywords to indicate extent of data use is not an exact science. It is likely that in addition to capturing the intended population of companies in C (as in the figure above), some companies in B might also be captured.  

To mitigate some of this overestimation, keywords pertaining to data use are categorised as “narrow” and “broad” to provide a range in terminology. This allows C to be approximated using a range of companies rather than one absolute population, which is harder to justify due to the subjective nature of this study. Specialised DDCs, which are associated with narrow data use keywords, are the best approximation of C (in the figure above). Diverse DDCs, which are associated with broad data use keywords, also capture C but may include some companies from B. The full taxonomy of keywords is provided in the annex.
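
The narrow and broad distinction can be sketched as a simple keyword match. The keyword sets below are hypothetical placeholders (the study's actual taxonomy is in its annex), and assigning a given term to "narrow" or "broad" here is an assumption made purely for illustration. The study also applies a machine learning filter, which this sketch does not attempt to reproduce.

```python
# Hypothetical keyword sets; the study's actual taxonomy is in its annex.
NARROW_KEYWORDS = {"data analytics", "cloud computing", "internet of things"}
BROAD_KEYWORDS = {"cyber security", "business insights"}


def classify(website_text):
    """Return 'specialised', 'diverse excluding specialised', or None.

    A narrow match takes precedence, because specialised DDCs are
    also counted within the diverse (all) population.
    """
    text = website_text.lower()
    if any(keyword in text for keyword in NARROW_KEYWORDS):
        return "specialised"
    if any(keyword in text for keyword in BROAD_KEYWORDS):
        return "diverse excluding specialised"
    return None


print(classify("We deliver Data Analytics and AI solutions"))
print(classify("Cyber security services for small firms"))
print(classify("Artisan bakery and cafe"))
```

Reporting both groups gives a lower bound (narrow matches only) and an upper bound (narrow plus broad matches) on the population of interest.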

Figure 3: Specialised and diverse (all) DDCs

Source: The Data City, ONS IDBR Q3 2023.
Note: Figures have been rounded. These are active, VAT registered data driven companies in the UK. Unless otherwise indicated diverse DDCs also include the specialised DDC population.

The figure above illustrates the population of DDCs this study is based on.

2.2 Types of data driven companies  

Data driven companies which meet our definition of specialised and diverse are largely similar, with slight variations in how they operate. They can range from offering data storage, software or cloud services, to providing business analytics solutions. There is no salient difference between the business size distribution of specialised and diverse DDCs.[footnote 6]  

The companies in the specialised DDCs classification are more likely to be traditional IT software and/or hardware companies which operate in the ICT sector. The diverse DDCs may not primarily deliver a data related digital technology as their final output (or say that they do on their websites). They are more likely to operate in non-traditional data sectors, such as Administration or Retail. These companies can range from a telecommunications company to a consulting firm.

The variation in a company’s operations means that they can justifiably be considered as both specialised and diverse. For instance, a consulting company can also provide cloud, data and AI services. This demonstrates the difficulty of identifying data driven activity based on a company’s activities and showcases the value of using a range. This range of terminology increases robustness when capturing companies using keywords.

Table 1: Top 5 services offered by specialised and diverse DDCs

Rank | Specialised DDCs: service | % | Diverse (all) DDCs: service | %
1 | Data Infrastructure | 22 | Data Infrastructure | 15
2 | Software Development | 13 | Energy Management | 11
3 | Software as a Service | 12 | Agency Market | 9
4 | Energy Management | 12 | Software Development | 9
5 | Artificial Intelligence | 10 | Artificial Intelligence | 8

Source: The Data City.
Note: Companies can offer more than one service, so proportions do not sum to 100%.

Overall, specialised and diverse DDCs offer similar services. This can include data infrastructure services such as data centre cooling, data centres, data storage, data hardware, software, and Trusted Execution Environments (TEEs).[footnote 7] Some services such as data centre cooling may have false positive matches where it was difficult to determine if a company functioned as a DDC or just offered auxiliary data services.  

Diverse DDCs are more likely to offer energy management services. This can include maintaining smart grids and producing smart meters as well as monitoring energy consumption using AI and IoT. While these services may clearly require extensive data use, the final output of these companies, as indicated by their registered SIC codes, can range from engineering to telecoms services.  

Agency market services are also more prevalent in diverse DDCs. These are predominantly companies which offer public communications, advertising or call centre services. Data use and analytics is often embedded within their services and part of their offer for clients.

2.3 The benefit of our methodology  

The main additionality of our methodology is identifying the spread of UK data driven companies and the fact that this is not limited to just digital sectors.[footnote 8] Extant publications which use SIC codes to define sectors predominantly attribute digital activity to the ICT sector. However, our research shows that data driven activity occurs in almost every sector of the economy.  

Approximately 35-45% of DDCs operate outside the ICT sector (see Section 3 for more on sectoral profile). Hence substantial data driven activity would remain uncaptured if only digital sectors were used to estimate it. In fact, 5 out of the 10 most common SIC codes for DDCs are not considered a digital sector in current publications. This validates the need to investigate DDCs horizontally across the economy.

Table 2: Top ten SIC codes of DDCs and their digital sector consideration

SIC code | Description | “Digital” sector
6202 | Computer consultancy | yes
6201 | Computer programming | yes
7022 | Business and other management consultancy | no
6209 | Other information technology and computer service | yes
8299 | Other business support service nec | no
7311 | Advertising agencies | no
6311 | Data processing, hosting and related | yes
7112 | Engineering and related technical consultancy | no
7320 | Market research and public opinion polling | no
5829 | Other software publishing | yes

Source: The Data City, ONS IDBR and DCMS Digital sector estimates.
Note: Definition of “Digital sector” obtained from DCMS Digital sector estimates.

This study addresses data driven activity across the economy by classifying companies, rather than sectors. However, in doing so, it highlights one of the key challenges of isolating the “data impact” – since it is not limited to a particular sector.  

Moreover, classifying through companies rather than through SIC codes also prevents DDCs which register under a not elsewhere classified (nec) code from being overlooked.[footnote 9] Since SIC codes were last updated in 2007, they may not capture emerging data driven activity. Indeed, the number of DDCs registered under a nec code almost tripled in the 5 years after 2007 compared to the period before it. This may be indicative of emerging data driven activities not traditionally defined. Our methodology avoids this issue by using companies’ websites to classify them based on real time activity.

2.4 Interpreting data  

Our dataset is inherently biased towards companies which have a website and by design, larger companies.[footnote 10] This is not expected to skew results because data driven companies are more likely to be larger and have websites. However, it may lead to the exclusion of some data driven companies which do not have websites.  

It is not always explicit how a company uses data. For instance, an energy management company that primarily helps businesses secure a commercial contract for electricity might be doing so using software it developed in-house. This makes it challenging to discern and taxonomise the data related digital technology companies are using, which can make the dataset sensitive to how the definition is applied.[footnote 11]  

Since the population of data driven companies in this study is based on website content, it is prone to having false positives and negatives. For example, a data skill tutoring company may match the definition but not deliver a data related digital product.[footnote 12] False negatives can occur when companies do not match the definition despite delivering a data related digital product due to a poor-quality website.

3. Data driven sectoral profile 

3.1 Number of data driven companies  

The definition of data driven companies matches 9,600 companies in the UK. These companies have websites, are active, and VAT or PAYE registered in the UK.[footnote 13] While the total population of data driven companies in the UK can potentially be higher, our dataset can be a robust indicator for this. It can also be used to observe emerging data driven activity across sectors in the UK economy.   

Table 3: Data driven companies size profile comparison 

Size | IDBR estimates (Q3 2023) | % of IDBR | DDCs | % of DDCs
Large (250+ employees) | 25,300 | 1% | 350 | 4%
Medium (50-249) | 44,200 | 2% | 870 | 9%
Small (10-49) | 209,500 | 10% | 2,300 | 24%
Micro (1-9) | 1,757,000 | 86% | 6,000 | 63%
Total | 2,036,000 | 100% | 9,600 | 100%

Source: ONS IDBR, The Data City.
Note: Numbers have been rounded. Micro includes sole traders which are not reported separately as the number of data driven sole traders is negligible (<1%) and not representative of the full population.  

Unsurprisingly, the proportion of larger DDCs in our data is higher than the proportion of larger companies in the economy overall. This is expected, since larger companies are more likely to be more data driven and to have websites.[footnote 14] Of the 9,600 data driven companies identified, 5,500 (57%) meet the more specialised definition. There is little variation between the business size distributions of specialised and diverse DDCs.
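
One way to make the comparison in Table 3 concrete is a representation ratio: the share of DDCs in a size band divided by the share of all IDBR businesses in that band. This ratio is an illustrative measure and is not one used in the study itself. The inputs are the rounded figures published in Table 3, so the results are approximate.

```python
# Rounded counts as published in Table 3.
idbr = {"Large": 25_300, "Medium": 44_200, "Small": 209_500, "Micro": 1_757_000}
ddcs = {"Large": 350, "Medium": 870, "Small": 2_300, "Micro": 6_000}
IDBR_TOTAL = 2_036_000
DDC_TOTAL = 9_600


def representation_ratio(size):
    """Share of DDCs in a size band relative to the share of all businesses."""
    ddc_share = ddcs[size] / DDC_TOTAL
    idbr_share = idbr[size] / IDBR_TOTAL
    return ddc_share / idbr_share


ratios = {size: representation_ratio(size) for size in ddcs}
for size, ratio in ratios.items():
    print(f"{size}: {ratio:.1f}x")
```

A ratio above 1 means the size band is over-represented among DDCs. On these figures the large, medium and small bands are above 1 and the micro band is below 1.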

3.1.1. Data driven company incorporation dates 

The number of data driven companies has been growing significantly in the UK since the early 1990s. This trend is consistent for both specialised and diverse DDCs. On average, 490 DDCs have been registered each year since 2012. A noticeable peak in 2018 (n=670) coincides with the Digital Economy Act and AI Sector Deal. Although this is not sufficient to indicate causality, it shows an uptick in company registrations at the onset of these events.[footnote 15]

Figure 4: DDC registrations   

Source: The Data City, Companies House registered incorporation date 1909 - 2022.
Note: n=9,600 (n=5,500 for specialised)

3.1.2. DDCs sectoral distribution  

As a collective, DDCs operate in most sectors of the UK economy.[footnote 16] The IDBR registered primary SIC code is used to show this distribution. It is possible for a company to register under up to four different SIC codes in Companies House and nearly one in six businesses did so.[footnote 17] This is also true for DDCs, as nearly one in five of them have more than one SIC code (similar for both diverse and specialised DDCs). Therefore, to prevent double counting a company in every SIC code it registers under, only the primary SIC code from the IDBR is used. The SIC classification on the IDBR comes from several sources, not just Companies House. A series of priority rules on the IDBR then selects the source considered the best quality. The best quality sources are ONS business surveys such as BRES (Business Register and Employment Survey), where the business activity is confirmed by the respondent. Other sources are HMRC VAT, Companies House and HMRC PAYE.
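
The idea of selecting a single primary SIC code by source priority can be sketched as follows. The text above confirms only that ONS business surveys rank highest; the order of the remaining three sources in `SOURCE_PRIORITY` is an assumption, and the source labels and sample record are hypothetical. This is not the IDBR's actual rule set.

```python
# Assumed priority order: only the top rank (ONS surveys) is stated in the text;
# the order of the other three sources is an assumption for illustration.
SOURCE_PRIORITY = ["ONS survey", "HMRC VAT", "Companies House", "HMRC PAYE"]


def primary_sic(sic_by_source):
    """Return the SIC code from the highest-priority source that has one."""
    for source in SOURCE_PRIORITY:
        if source in sic_by_source:
            return sic_by_source[source]
    return None


# Hypothetical company with SIC codes recorded by two sources.
company = {"Companies House": "7022", "HMRC VAT": "6202"}
print(primary_sic(company))  # 6202
```

Because each company contributes exactly one code, counting companies by sector on this basis cannot double count them.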

Figure 5: Sectoral distribution of DDCs 

Source: The Data City, ONS IDBR.
Note: A company’s primary SIC code is from the IDBR, n = 9,600.

The majority of DDCs are classified under the ICT sector, which is also true for most digital businesses. However, DDCs are more widely distributed than digital businesses. Around one in four DDCs are in the professional services sector. This rises to one in three for the more diverse, non-specialised DDCs, for instance, consultancies and advertising agencies. In fact, almost 1 in 10 of these diverse, non-specialised DDCs are business support and employment agencies in the administrative sector.

Figure 6: Sectoral distribution specialised (left) and diverse excluding specialised DDCs             

Source: The Data City, ONS IDBR.
Note: A company’s primary SIC code is from the IDBR. Specialised n = 5,500, diverse excluding specialised n = 4,100.

Looking into sectoral distribution trends at the SIC code level, most DDCs are primarily registered as computer consultancy or business support companies. If the SIC code is taken as an approximation of economic activity, it would imply that specialised DDCs are more likely to offer computer related activities than diverse DDCs. On the other hand, diverse DDCs are twice as likely as specialised DDCs to offer business consultancy services.

Table 4: Top SIC codes of DDCs 

Sector | SIC code | Specialised (% of n=5,500) | Diverse excluding specialised (% of n=4,100)
ICT | 6202: Computer consultancy | 30 | 19
ICT | 6201: Computer programming | 20 | 13
Professional services | 7022: Business and other management consultancy | 8 | 16
ICT | 6209: Other information technology and computer service | 7 | 5
Administrative | 8299: Other business support service nec | 3 | 4

Source: The Data City, ONS IDBR.
Note: Four digit SIC code with the highest count of DDCs proxy most common economic activity in each sector. Only IDBR registered primary SIC codes for a company are used.

4. Geographic profile 

There are two sets of location data from Data City: all trading locations of a business, or the singular address a company registers under on Companies House. Mapping all trading locations may lead to a more even spread, as mapping registered addresses might show more head offices, possibly more often based in London. Registered locations of DDCs are compared to IDBR registered business locations in Section 4.1. All trading locations of a DDC are used to show the local authority spread of DDCs in Section 4.2.

4.1 Regional density of DDCs  

Around 1 in 3 DDCs have a registered location in London. In comparison, only 1 in 5 UK businesses were registered in London in 2023. Therefore it is reasonable to assert that DDCs are more likely to be based in London than outside it.   

Compared to the regional distribution of all UK businesses, DDCs are more concentrated in London and the South East.

Table 5: Distribution of registered DDCs compared to all companies   

Region Proportion of DDCs (% of n=9,600) Proportion of IDBR businesses (% of n=2.7 million)
London 37  19
South East 19  15
East of England  10 
North West  10 
South West  6 9
West Midlands
Yorkshire and The Humber 4
East Midlands 4
Scotland  4
North East 2 3
Wales 2   4
Northern Ireland 1 3

Source: The Data City, ONS UK business activity size and location 2023.
Note: DDCs are companies and are being compared to the total population of UK VAT/PAYE businesses.

Figure 7: DDCs per 1,000 businesses

Source: The Data City, ONS UK business activity size and location 2023.
Note: Number of DDCs per 1,000 VAT/PAYE businesses on the IDBR.
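
The density measure in Figure 7 can be approximated from the rounded shares in Table 5. The sketch below is a rough reconstruction under that assumption: it multiplies the published percentages by the published totals (n=9,600 DDCs and n=2.7 million businesses), so its results carry rounding error and need not match the values plotted in the figure. The function name is illustrative.

```python
DDC_TOTAL = 9_600            # registered DDCs (Table 5)
BUSINESS_TOTAL = 2_700_000   # IDBR businesses (Table 5)


def ddcs_per_1000(ddc_share_pct, business_share_pct):
    """Approximate DDCs per 1,000 businesses from rounded regional shares."""
    ddc_count = DDC_TOTAL * ddc_share_pct / 100
    business_count = BUSINESS_TOTAL * business_share_pct / 100
    return 1000 * ddc_count / business_count


london = ddcs_per_1000(37, 19)       # London shares from Table 5
south_east = ddcs_per_1000(19, 15)   # South East shares from Table 5
uk = 1000 * DDC_TOTAL / BUSINESS_TOTAL

print(f"London: {london:.1f}, South East: {south_east:.1f}, UK: {uk:.1f}")
```

On these rounded inputs London has a markedly higher density than the UK as a whole, in line with the concentration described in this section.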

4.1.1 Sectoral activity of DDCs by region   

Looking into regional distribution by sectoral activity, there are more specialised DDCs in London, South East and the East of England than outside these regions. On the other hand, diverse DDCs which are not specialised and primarily offer computer programming or other IT computer services are based more outside of these regions. This analysis can provide a starting point for considering targeted, sector-based support for further development of data driven activity.

Table 6: Top sectors of DDCs across regions 

                                                                                                                 
SIC code | Outside London, South East and East of England: Specialised (%) | Outside London, South East and East of England: Diverse excluding specialised (%) | Inside London, South East and East of England: Specialised (%) | Inside London, South East and East of England: Diverse excluding specialised (%)
6202: Computer consultancy | 28 | 20 | 32 | 19
6201: Computer programming | 20 | 14 | 20 | 13
7022: Business and other management consultancy | 7 | 14 | 8 | 17
6209: Other information technology and computer service | 7 | 6 | 8 | 5
8299: Other business support service | 3 | 3 | 3 | 4
n | 2,000 | 1,350 | 3,500 | 2,720

Source: The Data City.
Note: The counts across columns may not sum to 9,600 due to rounding.

For instance, policies targeted towards levelling up the density of DDCs outside of London, South East and the East of England, should focus on boosting data driven activity and data use of more specialist companies involved in computer consultancy.

4.1.2 Regional density by business size 

There are some differences in the regional distribution of DDCs across business sizes as shown in Table 7. There is a significantly lower proportion of Micro DDCs outside of London, South East and the East of England (22%) compared to overall UK businesses in that region (42%). On the other hand, smaller DDCs are more represented inside the London regions compared to the general business trends. Further analysis would be needed to investigate whether this is caused by a higher prevalence of startups around London.  

It has already been established that there are more large DDCs compared to the general business size distribution. Although it is not surprising to find most large DDCs operate inside London regions, it is worth noting that this is significantly different to the general business population which has an equal proportion of large businesses in and outside of London.

Table 7: Percentage of DDCs in each region by business size

                                                                             
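Business size | Outside London, South East and East of England: DDCs (%) | Outside London, South East and East of England: IDBR (%) | Inside London, South East and East of England: DDCs (%) | Inside London, South East and East of England: IDBR (%)
Micro | 22 | 42 | 41 | 43
Small | 8 | 7 | 16 | 6
Medium | 3 | 1 | 6 | 1
Large | 1 | 0 | 3 | 0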

Source: The Data City, ONS UK business activity size and location 2023.
Note: Sole traders are not shown separately but they are included in the count of DDCs n = 9,600.

4.2 Distribution of trading addresses  

DDCs are registered in 364 out of the 383 local authorities in the UK. Due to this good coverage of the dataset and the fact that trading addresses of DDCs are mostly concentrated in London, it is reasonable to assert that data driven companies are more likely to be based in London than outside it.

Table 8: Top five local authorities DDCs operate in  

Local authority DDCs (% of n=9,600)
City of London 6                       
Westminster 5                       
Camden 4                       
Islington 3                       
Hackney  3                       

Source: The Data City.
Note: Where a company is listed under multiple local authorities, it is counted more than once. Proportions are of DDCs with at least one of their operating addresses in a local authority.

Table 9: Top five constituencies DDCs operate in  

Constituency Proportion of DDCs
Cities of London and Westminster 11%                   
Holborn and St Pancras           4%                    
Islington South and Finsbury     3%                    
Hackney South and Shoreditch     2%                    
Bermondsey and Old Southwark     2%                    

Source: The Data City.
Note: Where a company is listed under multiple constituencies it is counted more than once. Proportions are of DDCs with at least one of their operating addresses in a constituency.

5. Economic contribution  

5.1 Turnover  

Data driven companies generate an estimated £343 billion (6% of total UK) turnover, of which over 80% is generated by large DDCs inside the London, South East and East of England region.[footnote 18] Despite making up just 4% of DDCs, large data driven companies account for 89% of DDC turnover, a considerably higher share than large enterprises contribute to total UK turnover (57%). This is consistent with the AI sector publication which found that large AI companies generated the majority of AI revenue. Please note that IDBR turnover is indicative only because it is updated via administrative sources (HMRC VAT records) and ONS Business Surveys. It is recommended that the Annual Business Survey (ABS) turnover estimates are used as the main source of turnover information for detailed industry and geographical turnover comparisons. 

Table 10: Turnover by business size

Business size DDC Turnover (£ billion) % UK Turnover (£ billion) %
Micro (1-9)       4                      1     781                   13     
Small (10-49)     9                      3     769                   13     
Medium (50-249)   25                     7     1,037                  17     
Large (250+)      304                    89    3,444                  57     
Total             343                    100   6,030                  100    

Source: DDC turnover matched to ONS IDBR Q3 2023 turnover data. UK turnover from ONS.
Note: Sole traders are not reported separately but included. IDBR reports turnover at the enterprise level and not company level, so turnover may have been overestimated for individual companies.

Small, medium and micro (SMM) sized companies account for just over a tenth of the turnover generated by DDCs (£38 billion, 11%). This is considerably lower than the overall turnover share of SMM enterprises in the economy (43%) and can provide useful policy context. For instance, policies targeted at increasing data driven activity in SMM companies could potentially rebalance turnover contributions to mirror more closely those of the overall business population.
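
The SMM shares quoted above can be reproduced from Table 10. The short check below uses the rounded figures as published, with the published totals as denominators; the variable names are illustrative.

```python
# Turnover in £ billion, as published in Table 10 (rounded).
ddc_turnover = {"Micro": 4, "Small": 9, "Medium": 25, "Large": 304}
uk_turnover = {"Micro": 781, "Small": 769, "Medium": 1_037, "Large": 3_444}
DDC_TOTAL = 343    # published total
UK_TOTAL = 6_030   # published total

SMM_BANDS = ("Micro", "Small", "Medium")

ddc_smm = sum(ddc_turnover[band] for band in SMM_BANDS)
uk_smm = sum(uk_turnover[band] for band in SMM_BANDS)

ddc_smm_share = 100 * ddc_smm / DDC_TOTAL
uk_smm_share = 100 * uk_smm / UK_TOTAL

print(f"DDC SMM turnover: £{ddc_smm} billion ({ddc_smm_share:.0f}%)")
print(f"UK SMM turnover share: {uk_smm_share:.0f}%")
```

This gives £38 billion (11%) for DDCs and 43% for the UK overall, matching the figures in the text.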

5.1.1 Turnover by specialised and diverse DDCs 

Segmentation of total turnover by type of DDC reveals that large, diverse DDCs which are not specialised make up the majority of total DDC turnover (£194 billion, 57%). Much of this is generated by AdTech DDCs in the Administrative sector which offer advertising and marketing solutions through use of data analytics and AI tools.

Table 11: Turnover of diverse and specialised DDCs

Business size Specialised (£ billion) Specialised (% of total DDC turnover) Diverse excluding specialised (£ billion) Diverse  excluding specialised (% of total DDC  turnover)
Micro 3 <1 <1
Small  6 4
Medium 12 14 4
Large 111 32  194 57
Total  131  38  212  62 

Source: Data City, ONS IDBR.
Note: Numbers have been rounded, so the turnover in each size band may not sum to the total.

However, turnover of specialised DDCs is also significant across each size band. In contrast to diverse DDCs, much of the turnover generated by specialised DDCs is by companies in the ICT sector providing data infrastructure services.   

5.1.2 Turnover by data driven sectors  

Overall, DDCs in the ICT sector generate the highest turnover (£141 billion, 41% of total DDC turnover). However, this is mainly driven by specialised DDCs which predominantly operate in this sector. Turnover of diverse, non-specialised DDCs is more spread out across sectors of the economy and less concentrated in the ICT sector. This matches the general sectoral distribution of this type of DDC, as they are more evenly represented across the ICT and professional services sectors.  

The turnover contribution of diverse non-specialised DDCs is largely driven by those in the administrative sector which primarily offer “other business support not elsewhere classified”. Specialised DDCs, on the other hand, generate most of their turnover from providing computer wholesale, consultancy or related services. Much of this is generated by tech giants such as IBM.

Table 12: Turnover by sectors

Sector Specialised (£ billion) Specialised (% of total DDC turnover) Diverse excluding specialised (£ billion) Diverse excluding specialised (% of total DDC turnover)
Administrative  12   4 103  30 
ICT  89 26  52 15 
Financial 0 <1  38 11 
Professional services 13  10
Wholesale and Retail  12 
All other sectors  35  10 2 <1 
Total  131  38  212  62

Source: Data City, ONS IDBR.

Notably, fewer than 100 diverse, non-specialised DDCs generate £38 billion in turnover (11% of total DDC turnover) in the financial sector. These are mainly fund management companies which offer data driven financial services. This hints at the economic potential of increasing the number of data driven companies in this sector.

5.2 Employees  

DDCs employ 1.5 million people (5% of total UK employees) across the economy in all types of roles.[footnote 19] This is indicative of the economic contribution of DDCs but not of their demand for data related skills. Large DDCs are bigger employers than large businesses in general (90% of DDC employees are in large companies compared to 56% of UK employees) - particularly those which operate inside the London, South East and East of England region (82% of total DDC employees). Like turnover, SMM companies account for a tenth of total employees working for DDCs.

Table 13: Employment by business size

Business size     DDC employees (thousands)  % of DDC employees  UK employees (thousands)  % of UK employees
Micro (1-9)       17                         1                   5,250                     17
Small (10-49)     53                         3                   4,080                     13
Medium (50-249)   90                         6                   4,093                     14
Large (250+)      1,378                      90                  16,807                    56
Total             1,538                      100                 30,230                    100

Source: The Data City, ONS IDBR total employees of VAT/PAYE enterprises in the UK.
Note: Numbers may not add to total due to rounding. Around 100 companies do not have employee data, and these are excluded from the counts. Sole traders are not reported separately but included in the headcount.
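
The percentage columns in Table 13 can be reproduced from the employee counts. The following is a minimal sketch, assuming the rounded counts shown in the table; the names `ddc_employees`, `uk_employees` and `size_shares` are illustrative and do not come from the study.

```python
# Employee counts (thousands) by business size, copied from Table 13.
ddc_employees = {"Micro": 17, "Small": 53, "Medium": 90, "Large": 1_378}
uk_employees = {"Micro": 5_250, "Small": 4_080, "Medium": 4_093, "Large": 16_807}


def size_shares(counts):
    """Return each size band's share of the total as a whole-number percentage."""
    total = sum(counts.values())
    return {band: round(100 * n / total) for band, n in counts.items()}


ddc_shares = size_shares(ddc_employees)
uk_shares = size_shares(uk_employees)

for band in ddc_employees:
    print(f"{band}: {ddc_shares[band]}% of DDC employees, "
          f"{uk_shares[band]}% of UK employees")
```

With these inputs, large companies account for 90% of DDC employees and 56% of UK employees, which is the comparison made in the text.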

5.2.1 Employees of diverse and specialised DDCs 

It is worth remembering that the split between specialised and non-specialised diverse DDCs in the data set is 57:43. However, the ratio of employees between them is 29:71, making diverse, non-specialised DDCs the bigger employers and the ones which generate more turnover.

Table 14: Employees of diverse and specialised DDCs 

Business size Specialised (thousands) Specialised (% of total DDC employees) Diverse excluding specialised (thousands) Diverse excluding Specialised (% of total DDC employees)
Micro 10  <1 <1 
Small  30 23 
Medium  50  3 39 
Large 353 23  1,025 67 
Total  443 29   1,095 71

Source: Data City, ONS IDBR.

5.2.2 Employees by sectors 

Whereas DDC turnover is generated mostly in the ICT sector, DDC employment is concentrated in the administrative sector, whose DDCs are the biggest employers (despite making up only 7% of DDCs). This is driven by large DDCs which primarily offer “other business support not elsewhere classified”.

Table 15: Employees by sectors 

Sector Specialised (thousands) Specialised (% of total DDC employees) Diverse excluding specialised (thousands) Diverse excluding specialised (% of total DDC employees)
Administrative  112 850 55
ICT 244 16  146  9
Professional services 44 3 77  5
Financial  <1 9 1
Wholesale and Retail  24  <1
All other sectors 19 1 8 1
Total 443 29 1,095  71 

Source: Data City, ONS IDBR.

Specialised DDCs employ more people in the ICT sector compared to non-specialised diverse DDCs. This is mainly driven by computer consultancy companies. Moreover, most employment by DDCs in the manufacturing sector is generated by specialised DDCs which primarily produce navigation equipment.

5.3 Gross Value Added   

The GVA contribution of DDCs can be estimated by multiplying the published ONS GVA output per job by the total number of employees in DDCs. Furthermore, GVA output per employee can be used as an acceptable proxy for productivity in DDCs. Since employee data includes more than just those in data-related roles, the GVA figures estimated here give an approximation of the total GVA generated by DDCs from all employees in every type of role.

Table 16: GVA contribution of DDCs

Sector GVA output per job (£) Employees (Specialised, thousands) Employees (Diverse excluding specialised, thousands) GVA specialised (£ billion) GVA diverse excluding specialised (£ billion)
Agriculture           58,181                     0                                0                                              0.0                       0.0                                    
Manufacturing         81,397                     14                               4                                              1.1                       0.3                                    
Electricity           163,044                    0                                0                                              0.0                       0.0                                    
Water Supply          132,185                    0                                1                                              0.0                       0.1                                    
Construction          63,885                     0                                1                                              0.0                       0.1                                    
Wholesale and Retail  48,026                     24                               2                                              1.2                       0.1                                    
Transportation        45,584                     1                                0                                              0.0                       0.0                                    
Accommodation         28,659                     0                                0                                              0.0                       0.0                                    
ICT                   91,833                     244                              146                                            22.4                      13.4                                   
Financial             176,016                    0                                9                                              0.0                       1.6                                    
Real Estate           450,718                    0                                0                                              0.0                       0.0                                    
Professional services 61,481                     44                               77                                             2.7                       4.7                                    
Administrative        37,932                     112                              850                                            4.2                       32.2                                   
Education             45,442                     0                                1                                              0.0                       0.0                                    
Health                38,846                     0                                0                                              0.0                       0.0                                    
Arts                  32,794                     0                                1                                              0.0                       0.0                                    
Other                 43,564                     2                                1                                              0.1                       0.0                                    
Total                                             443                              1,095                                          31.8                      52.7                                   

Source: ONS GVA output per job and total GVA, Table 19.
Note: GVA figures are in 2022 current prices.
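
The calculation behind Table 16 (GVA output per job multiplied by number of employees) can be sketched as follows. This is an illustration only: it uses the rounded employee counts shown in the table, so individual results could differ slightly from figures calculated on unrounded data. The names `TABLE_16` and `gva_billion` are hypothetical.

```python
# Rows copied from Table 16: sector -> (GVA output per job in GBP,
# specialised employees in thousands,
# diverse non-specialised employees in thousands).
# Sectors where both employee counts are shown as 0 are omitted.
TABLE_16 = {
    "Manufacturing": (81_397, 14, 4),
    "Water Supply": (132_185, 0, 1),
    "Construction": (63_885, 0, 1),
    "Wholesale and Retail": (48_026, 24, 2),
    "Transportation": (45_584, 1, 0),
    "ICT": (91_833, 244, 146),
    "Financial": (176_016, 0, 9),
    "Professional services": (61_481, 44, 77),
    "Administrative": (37_932, 112, 850),
    "Education": (45_442, 0, 1),
    "Arts": (32_794, 0, 1),
    "Other": (43_564, 2, 1),
}


def gva_billion(output_per_job, employees_thousands):
    """GVA in GBP billion: output per job multiplied by number of employees."""
    return output_per_job * employees_thousands * 1_000 / 1e9


gva_specialised = sum(gva_billion(o, s) for o, s, _ in TABLE_16.values())
gva_diverse = sum(gva_billion(o, d) for o, _, d in TABLE_16.values())

print(f"Specialised: GBP {gva_specialised:.1f} billion")
print(f"Diverse excluding specialised: GBP {gva_diverse:.1f} billion")
```

With the rounded inputs above, the two sums come to 31.8 and 52.7, matching the Total row of the table.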

Data driven companies add £84.9 billion in GVA to the UK economy (3.8% of total UK GVA) through their total employment.[footnote 20] Of this, the majority is generated by diverse non-specialised DDCs, which contribute £52.7 billion (2.3%) to total UK GVA, followed by specialised DDCs, which contribute £31.8 billion (1.4%).

As expected, specialised DDCs generated the most GVA in the ICT sector and diverse non-specialised DDCs in the Administrative sector. Although there is a comparatively lower number of DDCs in the latter sector, its high GVA contribution is driven by the DDCs in this sector being large employers.

To further validate the GVA contribution of DDCs, it is compared to published GVA estimations of similar sectors such as digital, telecoms, AI and cyber. There is significant overlap between these sectors which makes it impossible to isolate the impact of one over the other. However, overlaps can be inferred using SIC code breakdowns of each sector, where provided.  

Figure 8: GVA estimations of sectors and their illustrative overlaps  

Source: AI sector study, UK cyber security sectoral estimates, Digital sector estimates.
Note: GVA figures have been published in different years depending on the publication date. Size of the circle is relative to GVA but not drawn to scale. Some data driven companies are outside the remit of digital sectors. Overlaps are illustrative only.

These overlaps show the importance of thinking of data beyond just sectors. The activities of DDCs transcend sectors and viewing data as a sector would underestimate its horizontal contributions.

5.4 Productivity and efficiency

In terms of productivity (GVA output per employee by a DDC), specialised DDCs are more productive (£71,800) than diverse non-specialised DDCs (£48,100). Specialised DDCs have a similar productivity to AI companies (£73,800), but lower than that of cyber security companies (£107,366). Moreover, productivity of DDCs overall (£55,200) is higher than the average productivity of non-financial businesses in the UK (£52,000).   

The overall GVA to turnover ratio is 0.25 for DDCs, which means that for every £1 of turnover, data driven companies generate 25p in GVA. In comparison, the AI and cyber security sectors generate 30p and 60p in GVA respectively for every £1 of revenue.[footnote 21]
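
The productivity and GVA to turnover figures above follow from the published totals. The sketch below assumes the rounded totals reported in this study (GVA and turnover in £ billion, employees in thousands); the names `GVA`, `EMPLOYEES`, `TURNOVER_ALL` and `productivity` are illustrative.

```python
# Published totals from this study (rounded).
GVA = {"specialised": 31.8, "diverse": 52.7, "all": 84.9}         # GBP billion
EMPLOYEES = {"specialised": 443, "diverse": 1_095, "all": 1_538}  # thousands
TURNOVER_ALL = 343                                                # GBP billion


def productivity(gva_billion, employees_thousands):
    """GVA per employee in GBP, rounded to the nearest 100."""
    return round(gva_billion * 1e9 / (employees_thousands * 1_000), -2)


per_employee = {group: productivity(GVA[group], EMPLOYEES[group]) for group in GVA}
gva_to_turnover = round(GVA["all"] / TURNOVER_ALL, 2)

for group, value in per_employee.items():
    print(f"{group}: GBP {value:,.0f} per employee")
print(f"GVA to turnover ratio: {gva_to_turnover}")
```

These inputs give roughly £71,800, £48,100 and £55,200 per employee for specialised, diverse non-specialised and all DDCs respectively, and a GVA to turnover ratio of 0.25.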

Annex: Keywords by data value chain

                                                                         
Data value chain Data building Data transformation Knowledge Creation Production / Operation
Details extract, transform and load getting data into analytical format analysis, insights from information commercialisation of knowledge
Narrow (extent of data use) "web scraping" OR "IaaS" OR "PaaS" OR "SaaS" OR "encryption" OR NEAR("data" "feed", 5) OR "integration" OR "API" OR "data hub" OR "Third Party Providers" OR NEAR("text" "mining", 5) OR "sentiment analy"* OR "machine learning" OR "ML" OR "neural network" OR "privacy enhancing technolog"* OR "Large language model"* OR NEAR("geospatial" "insight"*) OR NEAR("proprietary" "platform", 5) OR "proximity based marketing" OR "route optimisation"
Broad (extent of data use) = Narrow + this row NEAR("data" "acquisition", 5) OR NEAR("data" "management", 5) OR "database" OR "raw data" OR NEAR("data" "stor"*, 5) OR NEAR("data" "linking", 5) OR NEAR("data" "transformation", 5) OR NEAR("data" "interoperab"*, 5) OR NEAR("data" "sharing", 5) OR "SMART Data" OR "business intelligence" OR NEAR("data" "visualis"*, 5) OR NEAR("data" "intensive", 5) OR "Big data" OR "data service delivery" OR NEAR("cyber" "solution", 5) OR "personal data store"* OR "location based marketing" OR "dashboard"
To indicate delivery of data related digital technology ("Internet of things" OR "IoT" OR "data analy"* OR "artificial intelligence" OR "AI" OR "broadband network"* OR NEAR("cloud" "provider"*, 5) OR NEAR("cloud" "comput"*, 5) OR NEAR("edge" "comp"*, 5) OR "blockchain" OR "distributed ledger technolog"*)
  1. This approach of using web scraping platforms to define an emerging sector has been applied previously for the AI and cyber security sectors.  

  2. TDC website matching accuracy and limitations: URL Matching - The Data City  

  3. Because TDC is real-time, data is regularly updated on the platform. Our data was extracted from TDC platform on 02/10/2023 at 13:48.  

  4. SIC codes capture economic activity which businesses often relate to their final output. Since data activity is more likely to be incorporated as part of the process rather than the final output, it is by design difficult to capture through SIC codes.  

  5. 81% of businesses handle digital data other than that of their employees, of which 26% generate insights (from UKBDS 2021 applied to the BEIS Business population estimates). This does not measure intensity of data use but is the only available proxy from available statistics.   

  6. Proportions of DDCs in each business size for the groups differed by no more than 1% according to IDBR data.   

  7. A Trusted Execution Environment (TEE) is a segregated area of memory and CPU that is protected from the rest of the CPU using encryption. Data in the TEE cannot be read or tampered with by any code outside that environment, but can be manipulated inside the TEE by suitably authorised code.

  8. The DCMS Digital sector estimates place the majority of digital activity in the ICT sector (SIC sector J).

  9. Overall, nearly 10% of DDCs in our dataset are registered under 47 different “not elsewhere classified” (nec) SIC codes.

  10. Both the UKBDS and the Digital Economy Survey suggest that larger businesses are more likely to use and analyse digital data and make website sales. It is therefore reasonable to assume that larger businesses and companies are more likely to have websites than smaller ones.   

  11. There are several energy management companies in our dataset which develop and use data driven digital technologies.

  12. Other examples include online news articles, blog posts and recruitment companies.  

  13. These companies have been verified as active companies in the IDBR Q3 2023 dataset which includes VAT or PAYE registered companies in the UK. They are also registered on Companies House.   

  14. Larger businesses are more likely to acquire and share data with others according to UKBDS 2022.  

  15. The Artificial Intelligence Sector Study published by DSIT has similar observations.   

  16. Data driven companies operate in all UK SIC sections but B, O, T and U.  

  17. As of December 2023, over 910,000 of the 5.5 million businesses registered with Companies House have multiple SIC codes. Source: internal analysis by TDC.

  18. Total turnover for UK enterprises in March 2022 was £6 trillion, obtained from the ONS.  

  19. “Employees” refers to total employees in DDCs, not only those in digital or data related jobs. Employee data is from the ONS IDBR.

  20. Total GVA is for 2022 in £ current prices. Source: ONS GVA output per job and total GVA Table 19  

  21. Note the GVA methodology of these studies is different and not directly comparable.