AI Skills for Life and Work: Patent analysis
Published 28 January 2026
This report was authored by Derek Bosworth and Jeisson Cardenas Rubio at the Warwick Institute for Employment Research, The University of Warwick.
This research was supported by the Department for Science, Innovation and Technology (DSIT) and the R&D Science and Analysis Programme at the Department for Culture, Media and Sport (DCMS). It was developed and produced according to the research team’s hypotheses and methods between November 2023 and March 2025. Any primary research, subsequent findings or recommendations do not represent UK Government views or policy.
Executive summary
Each patent specification (‘patent’ for short) contains a detailed description of the invention for which intellectual property rights are sought. The associated administrative information documents the area or areas of technology it belongs to, and its potential applications or uses. The present research builds on the premise that areas of technology relating to artificial intelligence (AI), as defined by patent classes (a particular area of technology), are associated with different bodies of knowledge and that these differences will be reflected in the use of keywords (such as ‘neural network’ or ‘supervised learning’) in patent specifications.
This report explores patent statistics and trends from the United States Patent and Trademark Office (USPTO) dataset from 2014 to 2023. This dataset was used due to the US’ position as a major, if not the major, player in AI, and because the USPTO makes its data readily available.
By analysing the frequency of keywords, the research looks at how the number and make-up of AI-related patents have changed over time and the combinations of knowledge and skill sets associated with different types of AI technologies. Because patent applications occur at an early stage, before widespread adoption of the associated technology, patent statistics function as a leading indicator for how skills and knowledge are grouped, and what skills and knowledge are required for working with these AI technologies in different sectors.
Key findings
- AI-related patents grew significantly as a proportion of the total number of patents, rising from 5.2% in 2014 to 20.3% in 2023. This includes inventions relating to the development of AI itself, and the application of AI to other areas of activity, such as surgery, vehicles and robotics. This confirms the increasing importance of AI and associated knowledge and skills for the successful development and deployment of future inventions.
- The top four technologies (as measured by keyword counts) in both 2014 and 2023 were algorithms, artificial intelligence, neural networks and machine learning. These stand out in terms of frequency from other technologies, although some are catchall terms or are closely related. Deep learning, generative adversarial networks, chatbots, recurrent neural networks and variational autoencoders saw significant increases in their relative keyword rankings between 2014 and 2023.
- Penetration rates of AI within the whole body of patent activity increased significantly over time – almost every AI technology area increased in absolute terms and relative to non-AI technologies. There were very few exceptions to this finding – the principal one was Data mining, which fell slightly in absolute terms and, therefore, fell more dramatically in relative terms.
- The data captured the early period of the emergence of AI, and there were many large changes in the relative importance of different AI technologies (and, therefore, knowledge sets required) over the sample period – indicating that the growth of knowledge and skills is not uniform across all areas of AI. If anything, the period covered is one of “churn”. However, tracking these trends as they settle can guide future labour market skill requirements and support long-term industrial strategy and workforce planning.
- Specific areas of technology are characterised by distinct knowledge sets, differences in the relative importance of knowledge areas, and changes in relative importance over time. For example, Convolutional Neural Networks (CNN) saw high growth as a knowledge area and CNN is now important across AI inventions as a whole. While the use of CNN in many technology areas is relatively small, in certain areas of Chemistry it is extremely important. This highlights the need for a holistic and nuanced approach to AI skills among researchers and policymakers, which considers the various types of AI technologies, their areas of application, and their relative importance across different areas, sectors and time.
- The average number of technologies referred to per patent (measured by keyword counts) increased from around 2 in 2014 to over 3.5 by 2023. This implies that each patented invention was likely to draw on more than one knowledge set. Analysing the interrelationships and groupings of these technologies shows how knowledge and skills cluster into “packages”.
- While some knowledge packages relate to the development of key AI technologies, others are more concerned with the application of AI to specific areas (e.g. applications of AI in analysing health care data). A large minority of patents spanned quite different areas of technology (e.g. surgical techniques and robotics), with a more obvious need to combine AI and non-AI knowledge and skills. The analysis was only able to cover AI knowledge sets, but the importance of dual or multiple knowledge and skill sets will grow as the emphasis moves increasingly from the development of AI per se to the development of applications of AI. This implies the need to investigate the importance of combining core AI skills with more sector-specific skills to deploy AI technologies.
- Concentrations of AI activity, as measured by the highest patent counts and penetration rates at the broadest level of patent classification, were found in five of the eight possible broad categories of technology. Two concentrations were the most developed: (i) Group G (Physics), involving upwards of six technologies; and (ii) Group H (Electricity), which currently involves three technologies. Three concentrations were emerging: (iii) Group A (Human necessities), involving two technologies; (iv) Group B (Performing operations; transporting), involving two technologies; and (v) Group C (Chemistry; metallurgy), involving three technologies.
- These concentrations did not involve all the more detailed classes that fell in each area, nor were the classes that formed the concentrations of equal importance, even within the same broad Group. As a result, the “packages of knowledge” required varied considerably both within and between these broad areas of technology. Other clusters of AI-related activity will develop in Groups D, E and F in the future. In addition, work is needed to show how all the concentrations develop over time, what drives them and whether their evolution can be modelled and projected forwards.
- Information held within machine-readable patent data sets opens new avenues for mapping technological progress at the most detailed level. With minor further development, the present results have an immediate application in informing businesses, educational institutions and potential students about the likely knowledge bases that will be needed in the future. However, the data offer many other policy-oriented possibilities. For example, in addition to suggestions made throughout the report, it would be useful to establish a watch list of AI developments by the UK’s competitor countries.
1. Introduction
The information published in patent specifications (‘patents’ for short) offers insights into the evolution of new technical developments in artificial intelligence (AI) and their application to practical uses. These uses span activities from healthcare to financial services. This report describes the exploratory use of this patent data to identify and monitor new AI technologies, and by extension, the knowledge and skills that will be required in the labour market to develop and use these technologies.
This section introduces patents and demonstrates how the technologies described by the patent specifications are related to the knowledge and skills bases, both now and in the future. It outlines the key terms used in analysis of the patent data, as well as how these relate to the AI job vacancy analysis, which forms a complementary strand of the AI Skills for Life and Work research.
1.1. Patents as a leading indicator
1.1.1. Focus on patents
Patent applications, which occur at an early stage before any widespread adoption of the associated technology,[footnote 1] can be used to identify the areas of technology where potential inventors are searching for technological breakthroughs. Therefore, patents may act as a bellwether for the emergence of new AI technologies and the knowledge and skills required to develop and apply them.[footnote 2],[footnote 3]
To be granted a patent, the associated ‘invention’ must be new (novel), involve an ‘inventive step’ and have a ‘technical application’ (such as a discernible practical use); the application must also provide a detailed description of the underlying invention. Patents cannot be granted for inventions which are already in the public domain (e.g. “common knowledge”).
As with most inventive efforts, financial returns depend, in part, on excluding other interested parties from exploiting the inventive idea commercially – even some open-source producers of AI technologies have adopted patent protection for various reasons.[footnote 4] As a consequence, patenting activity in the area of AI has grown enormously over the past 15 years, resulting in a very large body of well-organised and highly detailed information. In addition, some of the largest companies in the world are involved, including Google[footnote 5], Microsoft, Amazon and IBM, amongst others. Suleyman (2023, p. 9) notes that:
Many of the world’s largest companies and wealthiest nations barrel forward, developing cutting-edge AI models and engineering techniques, fuelled by tens of billions of dollars of investment.
1.1.2. Patent data source
The present work adopts the United States Patent and Trademark Office (USPTO) database because the US is a major (perhaps the major) mover in AI. In addition, while data for other countries such as China are becoming more accessible, the USPTO makes the data readily available in bulk downloadable form. Since 2013/2014, the data have been organised according to the Co-operative Patent Classification (CPC).[footnote 6]
1.1.3. Content of patent specifications and use of keywords
Patent specifications can run to many pages and include both structured data (for example, patent class / technology group) and unstructured data (for example, raw text). The key pieces of information that are central to the present work are:
- the date of the patent
- the text which describes the patent in considerable detail (e.g. sufficient detail that someone “skilled in the art” can replicate the invention)
- the patent class (or classes) to which the patent is allocated (i.e. the area of the technology according to the most detailed level of the CPC classification).
A team of specialists in each area at the United States Patent and Trademark Office (USPTO) assigns the patent application to the appropriate class at the most detailed level of classification (see Section 1.4 below).
There is no individual class within the patent classification that has the title ‘AI’. Nonetheless, many patents mention a component or group of components of AI technologies (such as Machine Learning, Neural Networks, etc.). Therefore, the approach adopted in this study has been to search the description of the technology using a set of words and phrases (“keywords” for short), and their variants. As a subset of these keywords is also used in the AI job vacancy analysis, they provide a mechanism for linking the patent data with the job vacancy data, thereby providing a degree of consistency between the two elements of the analysis.
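As a rough illustration of the keyword search described above, the sketch below matches a few of the Table 1 phrases in a block of description text. It is a minimal sketch, not the study's actual search procedure: the variant rules (case-insensitive matching, a space or hyphen between words, an optional plural 's') are assumptions, and the function names are hypothetical.

```python
import re

# Hypothetical subset of the Table 1 keywords, mapped to their labels.
KEYWORDS = {
    "machine learning": "ML",
    "neural network": "NN",
    "convolutional neural network": "CNN",
    "knowledge-based system": "KBS",
    "chatbot": "CB",
}

def keyword_pattern(phrase: str) -> re.Pattern:
    """Compile a pattern for one phrase; the variant rules here are assumptions."""
    words = re.split(r"[\s-]+", phrase)
    # Allow a space or hyphen between words and an optional trailing plural 's'.
    body = r"[\s-]+".join(re.escape(w) for w in words)
    return re.compile(r"\b" + body + r"s?\b", re.IGNORECASE)

PATTERNS = {label: keyword_pattern(kw) for kw, label in KEYWORDS.items()}

def find_keywords(description: str) -> set[str]:
    """Return the labels of all keywords mentioned at least once in the text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(description)}

text = ("The device trains a Convolutional Neural Network and routes queries "
        "to chatbots; a knowledge based system is optional.")
print(sorted(find_keywords(text)))  # ['CB', 'CNN', 'KBS', 'NN']
```

In this sketch a mention of "convolutional neural network" also registers as "neural network"; how the study treats such nested phrases is not stated here.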
1.2. Technologies, knowledge and skills
To understand why patent information is useful to a study of the labour market, it is important to consider the linkages between areas of technology, knowledge, and skill sets.
The level of AI knowledge and skills required varies considerably. Different areas of technology (for example, developing AI for use in medical treatments and in telecommunications) will have different implications for the knowledge bases of the developers, innovators and diffusers of those technologies, and the level of AI knowledge required will differ between these activities.
- AI inventors: knowledge and skills are heavily situated within the AI technology area, although they are likely to have other knowledge and skills. For example, AI research scientists, machine learning engineers, AI architects
- AI innovators: knowledge and skills are situated within the AI technology area, but also in the areas to which the AI technology is being applied since all patented inventions are intended to have a practical application. For example, software engineers, data engineers, data analysts who use AI
- AI diffusers: a lower level of knowledge and skills within the AI technology area is required than the level required among AI inventors or AI innovators, and potentially a lower level of general technical knowledge and skills overall. For example, business analysts, project managers, consultants
In what follows, it is important to remember that:
- each area of technology (as defined by the patent class) may require a different knowledge base to other areas. This knowledge base is represented by the specific keyword (or keywords) associated with the technology
- patenting activity represents a mix of technologies, which will impact on the overall mix of the knowledge base required to service them at the different stages (e.g. invention, innovation, diffusion and usage)
- different growth rates of these technologies – and the introduction of entirely new technologies – will impact on the overall extent and depth of the AI knowledge mix required
- some patents reflect the application of AI to other technologies (e.g. the application of AI to surgery) which may affect the knowledge and skill requirements
1.3. Defining AI “technologies” using keywords
Keywords are central to identifying the number and classes of the patents that can be designated as AI, as well as providing the link with the job vacancy analysis. Identifying which patents are AI allows us to identify their patent classes (i.e. the areas of technology to which they belong), which can then be used to allocate the patents by sector of the economy.
The 41 keywords utilised in the patent search process are given in Table 1, along with their abbreviated labels (used to save space) and the number of patents in which each of the keywords is found.[footnote 7] Later discussions use these abbreviations; refer to this table for their full names.
Note that each patent may include reference to more than one technology or keyword. The total sum across all keywords therefore exceeds the total number of patents designated as AI. On average, a patent contained 2.6 keywords (3.7 if algorithm and AI are included).
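Because one patent can carry several keywords, the keyword total exceeds the patent count, and the average per patent depends on whether the catchall terms are counted. The toy example below uses hypothetical patents, not USPTO records, to show both calculations; the function name is illustrative only.

```python
from statistics import mean

# Hypothetical toy data: Table 1 labels found in each AI patent (not real records).
patents = [
    {"ML", "NN", "Algo", "AI"},
    {"CNN", "NN", "DpLearn", "Algo"},
    {"NLP", "AI"},
    {"ML", "SuperLrn", "UnsupLrn"},
]
CATCHALLS = frozenset({"Algo", "AI"})

def avg_keywords(records, exclude=frozenset()):
    """Mean number of distinct keywords per patent, ignoring any excluded labels."""
    return mean(len(labels - exclude) for labels in records)

total_mentions = sum(len(labels) for labels in patents)  # 13, more than the 4 patents
print(avg_keywords(patents))             # 3.25 (all keywords)
print(avg_keywords(patents, CATCHALLS))  # 2.25 (algorithm and AI excluded)
```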
Section 3 explores the links between keywords, like which keywords are more likely to be found together. Such ‘knowledge packages’ vary by setting. For example, machine learning (ML) may be more likely to be found together with unsupervised learning systems (UnsupLrn) in one setting, but with supervised learning (SuperLrn) in another.
Table 1 Number of patents referring to the designated technology, 2023
| Technology | Label | Patent count | Technology | Label | Patent count |
|---|---|---|---|---|---|
| machine learning | ML | 39634 | variational autoencoder | VAE | 544 |
| neural network | NN | 32981 | generative models | GenMod | 523 |
| convolutional neural network | CNN | 16198 | computational intelligence | CompIntell | 390 |
| deep learning | DpLearn | 13929 | adaptive system | AdapSys | 300 |
| supervised learning | SuperLrn | 9344 | cognitive computing | CogComp | 276 |
| recurrent neural network | RNN | 7880 | smart system | SmartSys | 269 |
| computer vision | CompVis | 7590 | text mining | TextMin | 199 |
| natural language processing | NLP | 6864 | autonomous agent | Auton | 191 |
| image recognition | ImgRec | 6396 | fuzzy logic system | Fuzzy | 171 |
| reinforcement learning | ReinfLrn | 6022 | knowledge-based system | KBS | 145 |
| unsupervised learning system | UnsupLrn | 5994 | large language model | LLModel | 122 |
| robotics | Robo | 3742 | reasoning system | Reason | 110 |
| generative adversarial network | GAN | 2311 | smart machine | SmartMach | 107 |
| machine vision | MachVis | 2281 | self learning system | SelfLrn | 73 |
| data mining | DataMin | 1706 | data-driven intelligence | DataDriv | 56 |
| chatbot | CB | 1180 | intelligent automation | IntAuto | 44 |
| expert system | ExpSyst | 1049 | emulation of intelligence | EmulInt | 43 |
| predictive analytics | PredAnlt | 861 | knowledge acquisition system | KnoAcqs | 40 |
| sentiment analysis | Senta | 735 | algorithm | Algo | 43018 |
| machine intelligence | MachIntl | 598 | artificial intelligence | AI | 29569 |
| data science | DataSci | 840 | Total patent count | | 65852 |
Source: USPTO. Own calculations. Note: 2023 is an incomplete year (January to November).
1.4. Remainder of the report
The remainder of the report explores the way in which AI areas of knowledge have evolved from 2014 to 2023.[footnote 8] The underlying premise is that different areas of technology (based upon the patent classification) are associated with different bodies of knowledge and that these differences will be reflected in the keywords.
Each of the keywords itself represents a different area of knowledge, with implications for education and training as these technologies evolve, new areas of knowledge appear, and the balance of need across knowledge bases, as represented by the keywords, changes over time. Of particular interest is the extent to which these areas of knowledge are used individually or in packages in different areas of technological activity. Insofar as these “knowledge packages” are important, how do they differ between areas of technology, and how does the knowledge mix of one “package” differ from another?
The exploration of the data for 2014-2023 inclusive is undertaken in terms of patent counts and focuses on three main dimensions:
1) changes in keyword areas of knowledge based on patenting activity
2) the “knowledge packages” (based on keywords) associated with different areas of technology (e.g. by patent technologies)
3) the potential effects of technological developments on the required knowledge base over time
While work has been conducted on the growth of AI activity by sector, this is presented in another report in Work Package 4 dealing with future projections of labour market outcomes.
2. Measuring the evolution and growth of AI: patent counts based on keyword searches
All 41 of the keywords and phrases were used to identify AI-related patents. As each keyword is potentially associated with a knowledge set, it is possible to trace how these change over time as the area of AI technology develops. These changes reflect the initial stages of development: some areas of work start early and others later, some begin but then fade, while others grow continuously, some at extremely high rates. The picture is one of experimentation, with the successful areas expanding, though at different rates, and some not taking off – at least yet.
Various knowledge areas stand out in terms of growing importance, such as Deep Learning and Generative Adversarial Networks. While the precise results depend on which measures are applied, this is suggestive of the skills that may be in demand in the future. Furthermore, knowledge areas may have varying degrees of overlap, with implications for education and training programmes. For example, if Machine Learning and Neural Networks share a substantial proportion of their knowledge base, it may not be possible or desirable to teach them separately as the skills for one may depend on the skills from the other.
Analysis of patent data helps to show the emerging AI technologies and their interrelationships. Therefore, it may support long-term industrial strategies and skills planning to address the need for certain AI skills, at least until the relevant technologies begin to mature. The implications of the patent analysis for AI skills are summarised in Section 2.6.
Key findings
- Rapid growth: AI-related patent applications grew by a factor of 3.8 between 2014 and 2023, with a total of 438,000 applications in this period.
- Breadth of AI: The rapid growth of AI activity and associated penetration rates (i.e. AI-patent count as a percentage of total patent count) are likely to have been driven by the way AI has drawn upon existing technologies and by its applicability to a wide range of other areas of technology (e.g. Robots, Surgery, Vehicles, etc.). This has produced a need for AI-related skills to be combined with more sector-specific skills to produce practical applications.
- Changing complexity of skills needs: If we assume that (i) each of the keywords is associated with a different knowledge base, and (ii) each area of application may mix AI with different technologies, which also require different knowledge and skill bases, then at the very least the current results show the enormous growth in the diversity of emerging AI technologies.
- Increasing complexity: AI is at a relatively early stage of development, with the emergence of new technologies and major changes in the relative importance of existing technologies.
- Emerging relationships between areas of technology: Certain keywords have a close relationship with each other, i.e. they are more likely to appear with some keywords than others.
2.1. Growth of AI patents
This section explores the way in which AI technologies evolved between 2014 and 2023 inclusive.[footnote 9] The analysis uses patent counts and focuses on the trends in each of the 41 keywords used to identify patents that were substantially AI in nature (see Section 1.3). While an attempt is made to report all ten years of data, comparisons are restricted to 2014 and 2023 where necessary to save space. The assumption, subject to some provisos discussed below, is that changes in each keyword have implications for the specialist knowledge or skill area required by somebody working with that technology.
Figure 1 and Table 2 provide an indication of the growing importance of AI within the total of patented technologies. The data indicate patents in which AI forms part or all of the subject matter (to be discussed below). Although the exact level of penetration of AI within the total might be debated, the general trend is clear. Only 5.2% of patents were associated with AI in 2014, but this grew to 20.3% of the total by 2023 (the lower solid line in Figure 1), with a commensurate drop in the proportion of non-AI-related activities (the upper solid line in Figure 1). The ratio of AI to non-AI patents increased from 0.06 to 0.26 over the period (the dotted line in Figure 1). Over the 2014-2023 period, there were 438,500 AI-related patents, compared with a total of 3.8 million (AI plus non-AI), with an average penetration rate of 11.4%. These trends can be seen in Table 2.
Figure 1: Proportion of overall patent applications relating to AI, 2014-2023
Source: USPTO. Own calculations
Table 2 AI and non-AI patent applications, 2014-2023
| | | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Without AI-tech | Absolute number (thousands) | 336.5 | 360.7 | 356.6 | 346.9 | 341.9 | 349.6 | 355.8 | 342.9 | 328.3 | 272.7 |
| | % | 94.8 | 94.5 | 93.6 | 92.7 | 90.9 | 88.7 | 86.1 | 83.6 | 81.6 | 79.7 |
| With AI-tech | Absolute number (thousands) | 18.5 | 21.1 | 24.4 | 27.3 | 34.1 | 44.7 | 57.4 | 67.2 | 74.1 | 69.6 |
| | % | 5.2 | 5.5 | 6.4 | 7.3 | 9.1 | 11.3 | 13.9 | 16.4 | 18.4 | 20.3 |
| Total | Absolute number (thousands) | 355 | 381.8 | 381 | 374.1 | 376 | 394.3 | 413.2 | 410.1 | 402.5 | 342.3 |
| | % | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Source: USPTO. Own calculations.
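The penetration rates and AI-to-non-AI ratios quoted above can be recomputed from Table 2, as in the sketch below. Because its inputs are the rounded values in the table, small differences from the figures in the text are to be expected (the 2014 ratio comes out at about 0.055 rather than 0.06). Pooling the counts over the decade to obtain the period rate is one reasonable reading of "average penetration rate", not necessarily the authors' exact method; averaging the annual percentages gives a similar figure.

```python
# Annual counts from Table 2, in thousands of applications (2023 covers January-November only).
YEARS = list(range(2014, 2024))
AI = [18.5, 21.1, 24.4, 27.3, 34.1, 44.7, 57.4, 67.2, 74.1, 69.6]
NON_AI = [336.5, 360.7, 356.6, 346.9, 341.9, 349.6, 355.8, 342.9, 328.3, 272.7]

def penetration(ai, non_ai):
    """AI share of all applications, in per cent."""
    return 100 * ai / (ai + non_ai)

annual = {y: penetration(a, n) for y, a, n in zip(YEARS, AI, NON_AI)}
ratio = {y: a / n for y, a, n in zip(YEARS, AI, NON_AI)}
# Period rate computed here by pooling the counts (an assumed method).
period = penetration(sum(AI), sum(NON_AI))

print(round(annual[2014], 1), round(annual[2023], 1))  # 5.2 20.3
print(round(ratio[2014], 3), round(ratio[2023], 3))    # 0.055 0.255
print(round(period, 1))                                # 11.4
```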
This confirms the increasing importance of AI and associated knowledge and skills for the successful development and deployment of future inventions.
The top four keywords in both 2014 and 2023 were ALGO, AI, NN and ML, as shown in Figures 2a and 2b. Although their ordering changes slightly, these four stand out in terms of frequency from the other technologies, in keeping with their use as encompassing, catchall terms (see Section 2.2). A simple correlation exercise suggests a relatively strong relationship between the rankings in the two years (a Spearman rank coefficient of 0.70), although this is partly driven by a relatively small number of the most highly ranked technologies, and the true degree of change is quite substantial. This suggests that the core knowledge areas of AI remain consistent and have become even more important parts of the skill set required for the development of AI technologies.
There were also a considerable number of significant changes in the relative positions of the other technologies, as shown in Figure 2c. For example, DpLearn and GAN both rose 25 places in their ranking, and CB, RNN and VAE each rose between 15 and 20 places. Conversely, AdapSys, TextMin and DataMin each fell 10 or 11 places, while Reason fell 14 places in the ranking. These findings suggest that the relative importance of skills relating to deep learning and generative adversarial networks is increasing.
Note that the precise changes in the rankings reported here and below depend on the range of technologies covered in the various stages of the analysis. As it is difficult to illustrate the development of all the AI keyword areas at once, the later analysis examines subsets of the technologies separately.
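The dependence of rank changes on the set of technologies covered can be seen in a small example. The sketch below, using hypothetical counts and hypothetical function names, computes places gained or lost between two years, first over all keywords and then with one keyword excluded; breaking ties alphabetically is an assumption.

```python
def rank_positions(counts):
    """Map each label to its 1-based rank by descending count (ties broken alphabetically)."""
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: pos for pos, label in enumerate(ordered, start=1)}

def rank_shifts(before, after, exclude=()):
    """Places gained (+) or lost (-) by each label, after dropping the excluded labels."""
    b = rank_positions({k: v for k, v in before.items() if k not in exclude})
    a = rank_positions({k: v for k, v in after.items() if k not in exclude})
    return {label: b[label] - a[label] for label in b if label in a}

# Hypothetical keyword counts in a base year and an end year.
before = {"ML": 30, "NN": 35, "DpLearn": 5, "DataMin": 12, "GAN": 0}
after = {"ML": 400, "NN": 330, "DpLearn": 150, "DataMin": 8, "GAN": 20}

print(rank_shifts(before, after))                  # ML gains one place
print(rank_shifts(before, after, exclude={"NN"}))  # ML no longer moves
```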
2.2. Algorithms and AI
Two keywords warrant further scrutiny around their inclusion in the counts of keyword areas:
- algorithms (ALGO), because they can be used outside of AI-related applications
- artificial intelligence (AI), because of the tendency for the phrase to be used as a catchall for a wide range of AI-related activities
For the reasons set out below, the approach adopted is to largely remove both terms from the descriptive analysis.
In 2014, 67,000 patents referred to ALGO out of a total of 385,500 patents, of which only 16.4% could be assigned to the area of AI technologies. By 2023, the number of patents referring to ALGO had risen moderately, to 88,000 across all patents, of which 51.6% could be assigned to AI inventions. The high level of algorithms from the very beginning of the period reflects their long-standing utility in storing instructions.
Algorithms are fundamental to AI technologies, and either the ability to understand or write algorithms – depending on what tasks the individual is required to do within AI – is almost certainly a useful if not essential skill. However, the presence of ALGO as a keyword is not sufficient to assign that patent to AI. Therefore, the present work only includes algorithms (ALGO) as an AI-related technology where there is at least one other AI technology present.
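The inclusion rule for algorithms amounts to a filter on the set of keywords found in each patent. A minimal sketch, assuming the keywords have already been extracted as Table 1 labels; the function names are hypothetical. Consistent with the text, a patent mentioning only "AI" still counts, whereas one mentioning only "algorithm" does not.

```python
ALGO = "Algo"  # Table 1 label for "algorithm"

def counted_keywords(found):
    """Apply the rule in the text: keep Algo only if another AI keyword is also present."""
    found = set(found)
    return found if found - {ALGO} else set()

def is_ai_patent(found):
    """Treat a patent as AI-related when any keyword survives the Algo rule."""
    return bool(counted_keywords(found))

print(counted_keywords({"Algo"}))                # set()
print(sorted(counted_keywords({"Algo", "ML"})))  # ['Algo', 'ML']
print(is_ai_patent({"AI"}))                      # True
```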
The term “AI” is frequently used, but it does not appear in all AI patent specifications (in 2023, for example, it appeared in just under 30,000 of the just under 66,000 AI patent applications). It is difficult to draw hard-and-fast conclusions about skills from this variable. In terms of patent specifications, defining a novel and inventive step in terms of the keyword AI alone is unlikely to be possible, especially in recent years as AI technologies have become increasingly complex – by 2023, only 7% of patents report AI alone as the keyword knowledge base.
Figure 2: Relative importance of keyword areas of knowledge by patent referral counts
Source: USPTO. Own calculations
2.3. Changes in machine learning and neural networks
As noted in Section 2.1, both Machine Learning (ML) and Neural Networks (NN) were large areas of reported activity. In 2023, ML was approximately 2.5 times larger than Convolutional Neural Networks (CNN) (the third largest technology) and NN was 2 times larger than CNN. To make the comparisons across all the remaining technology areas easier, these two “giants” are dealt with separately here. Their sizes are caused by their close relationship to one another, and the fact that they are the parents of families of more specialised technologies. Such relationships are currently difficult to deal with in terms of the vacancy data.
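Using the 2023 counts in Table 1, the two multiples quoted above can be checked directly:

$$\frac{\text{ML}}{\text{CNN}} = \frac{39{,}634}{16{,}198} \approx 2.45, \qquad \frac{\text{NN}}{\text{CNN}} = \frac{32{,}981}{16{,}198} \approx 2.04$$

which round to the "approximately 2.5 times" and "2 times" given in the text.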
In terms of the close relationship between ML and NN, there were 39,600 referrals to ML in 2023 and 33,000 referrals to NN (see Table 1 above). In addition, there were 24,100 cross-referrals between ML and NN, and a close correspondence between the sets of technologies to which ML and NN refer. Correlating the number of references that each makes to every other technology gives a partial correlation of 0.98 (and a coefficient close to unity). The top ten referrals to other technologies by ML and NN are shown in Figure 3; the match between the two sets of technologies is extremely close. In addition, Deep Learning (DpLearn), Supervised Learning (SuperLrn), Unsupervised Learning (UnsupLrn) and Reinforcement Learning (ReinfLrn) – all dimensions of learning activity – are in the top 10 technologies.
Figure 3: Keyword knowledge referrals, ML and NN (ranked by ML, top 10), 2023
Source: USPTO. Own calculations
Further evidence of the relationship between the two is presented in Figure 4 (note that all these comparisons omit AI and Algo). Figure 4 first shows the relative sizes of the references to ML and NN: the ratio of ML to NN rose sharply in the early years (2014-2016), from below 0.96 to 1.3, then fell to slightly below 1.2 by 2021, before rising again to finish at 1.2 in 2023. NN was the technology with the highest patent count in 2014, at 3,500 (bear in mind the discussion of Section 2.2), and grew to just under 35,000 by 2023, while ML grew from 3,400 in 2014 (ranked second) to 42,000 in 2023 (ranked first). As a proportion of all referrals, both grew modestly over the period (ML by a factor of 1.4 and NN by 1.2).
The results suggest that the two are so closely intertwined that they may represent the same knowledge set, rather than two distinct areas of knowledge. Given that machine learning knowledge would almost certainly have a grounding in neural networks, the additional education and training requirements might be correspondingly lower. However, this statistical result needs testing through more qualitative research.[footnote 10]
Figure 4: Machine learning and neural networks: dominant technologies?
Source: USPTO. Own calculations
2.4. Remaining keyword areas
The discussion now focuses on the remaining keyword areas of knowledge. For clarity, it also separates out some other subgroups of technology.
2.4.1. Emerging technologies?
There are a small number of technologies which, whilst available, have zero entries until after 2014. This makes it impossible to calculate rates of change for comparison over the whole period. Figure 5 provides some summary information about three of the technologies that fall into this category, two of which show some evidence of being emerging technologies. These are Emulation of Intelligence (EmulInt), Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE). All three first emerge in 2016/2017, three years after the CPC was adopted. In analysing these, note that ML and NN have been omitted from the totals when calculating the proportions (otherwise they dominate the results).
Figure 5: Identifying emerging technologies?
Source: USPTO. Own calculations
In terms of numbers, two of the growth paths were remarkable (GAN rises from 12 in 2017 to 8,440 in 2023 and VAE from 1 to 2,011 over the same period). However, their proportions of total referrals remained small (growing to only 1.8%, 0.4% and close to 0% respectively by 2023). This growth was not fast enough to raise them up the rankings of the different technologies over time. This is not to suggest that these are necessarily the major technologies of the future. Rather, their example illustrates that emerging technologies can be tracked from a very early stage using patent analysis.
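The idea of tracking emerging technologies from their first appearance can be sketched as a simple scan of a year-to-count series. This is a minimal illustration, not the report's method, and the function name `emergence_summary` is hypothetical. Only information reported in the text is used: zero entries before 2016, plus the 2017 and 2023 end points. The intermediate years are not reported, so they are left out.

```python
from typing import Mapping, Optional, Tuple


def emergence_summary(series: Mapping[int, int]) -> Optional[Tuple[int, int, float]]:
    """Return (first year with a positive count among the years supplied,
    latest year, growth multiple between them), or None if no count is positive."""
    years = sorted(series)
    positive_years = [y for y in years if series[y] > 0]
    if not positive_years:
        return None
    first, last = positive_years[0], years[-1]
    return first, last, series[last] / series[first]


# Only years reported in the text: zero entries in 2014-2015
# (the technologies "first emerge in 2016/2017") and the quoted end points.
gan = {2014: 0, 2015: 0, 2017: 12, 2023: 8_440}
vae = {2014: 0, 2015: 0, 2017: 1, 2023: 2_011}

for name, series in [("GAN", gan), ("VAE", vae)]:
    first, last, multiple = emergence_summary(series)
    print(f"{name}: first positive count in {first}, x{multiple:,.0f} by {last}")
```

With these inputs, GAN's count is roughly 700 times larger in 2023 than in 2017. Its share of referrals (1.8%) nevertheless remains small, so raw growth multiples and shares need to be read together.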
2.4.2. Nascency: a period of churn
The discussion now turns to the remaining keyword areas with complete data, as shown in Figure 6 (excluding AI, Algo, NN and ML, and EmulInt, GAN and VAE). The data refer to the patent counts attributed to each keyword knowledge area (labelled "patent referral count"). Note that the data are now ordered by rank in 2014, as shown by the monotonically upward-sloping columns from the bottom left-hand corner to the top right. The more chaotically distributed set of columns relates to the corresponding patent counts in 2023.
As can be seen there is a great deal of change in terms of the relative sizes of the 2014 counts and the 2023 counts, consistent with significant shifts in the relative importance of the technologies over time. Again, bear in mind that a single patent can refer to more than one technology.
Figure 6: Patent counts attributed to each keyword area of knowledge, 2014 and 2023
Source: USPTO. Own calculations. Note omits AI, Algo, ML and NN.
By 2023, the largest patent numbers were for CNN (17,100), followed by DpLearn (14,700), SuperLrn (9,900) and RNN (8,300). Also important, although less obvious, is that in many cases the change in the number of patents between 2014 and 2023 is almost identical to the 2023 absolute number (see, for example, DpLearn). This implies that these technologies started from counts at or close to zero in 2014. In absolute terms, there is only one negative change across all the technologies: Data Mining (DataMin), with a fall of 357 patents over the period.
To emphasise the degree of change taking place, Figure 7 summarises how the rankings in terms of "importance", measured by patent numbers, also changed considerably. The columns show the rank in 2014, where lower values indicate higher rankings and higher values indicate lower rankings of each keyword area of knowledge. The solid line represents the change in rank between 2014 and 2023. Keywords that improved their rank have values greater than zero, and those that fell have negative values. For example, DpLearn and GAN both rise by over 20 places over the period, while Reason and DataMin each fall by over 10 places. Six technologies change by one place or less, while 14 fall by five places or more and eight rise by five places or more.
The picture is one of major changes in the relative importance of keyword areas of knowledge. Insofar as each keyword area may have different skill requirements, the results suggest a strong need to monitor their levels and the changes in their relative importance.
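The rank changes plotted in Figure 7 can be computed from two years of counts. The sketch below is an assumption about how such a measure might be built, and it uses hypothetical counts for placeholder keywords. It follows the figure's sign convention, so a positive value means moving up the rankings.

```python
def rank(counts: dict[str, int]) -> dict[str, int]:
    """Rank keywords by count, 1 = largest; ties are broken alphabetically (an assumption)."""
    ordered = sorted(counts, key=lambda kw: (-counts[kw], kw))
    return {kw: position for position, kw in enumerate(ordered, start=1)}


def rank_change(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Positive values mean a keyword moved up the rankings (as in Figure 7).
    Assumes every keyword appears in both years."""
    r_before, r_after = rank(before), rank(after)
    return {kw: r_before[kw] - r_after[kw] for kw in before}


# Hypothetical counts for four placeholder keywords (not the report's data).
counts_2014 = {"kw_a": 900, "kw_b": 700, "kw_c": 100, "kw_d": 50}
counts_2023 = {"kw_a": 540, "kw_b": 800, "kw_c": 17_000, "kw_d": 14_000}

print(rank_change(counts_2014, counts_2023))
```

Note that `kw_b` falls one place even though its count rises, because other keywords grow faster. This is the kind of reordering that Figure 7 captures.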
Figure 7: Absolute and relative changes in keyword areas, 2014-2023
Source: USPTO. Own calculations. Positive changes mean moving up the rankings and negative falling.
2.4.3. A period of growing complexity: range and distribution of keyword areas
This subsection focuses on the development of the technologies between 2014 and 2023, based on the keywords. Figure 8 shows how the number of keyword referrals, reflecting the range of relevant technologies, has grown. Note again that any one patent can refer to more than one keyword area of knowledge. A patent may be associated with, say, both Neural Networks and Robotics, giving a count of two. Consequently, the number of references to the different keyword areas of knowledge exceeds the number of patents.
Figure 8 shows the AI-patent count in each year (the first, lower set of columns) and the referral count (the second, higher set of columns). Patents grew by a factor of 3.8 over the period. Referrals to keyword areas of knowledge grew from 37,000 in 2014 to 258,000 in 2023, i.e. they were 6.9 times higher by the end of the period. By implication, referrals per patent rose from 2.0 to 3.7.
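The distinction between patents and referrals can be made concrete with a toy example. In the minimal sketch below, the data and function name are hypothetical. A patent referring to three keyword areas contributes three referrals, so referrals per patent is simply total referrals divided by the number of patents.

```python
def referral_stats(patents: dict[str, set[str]]) -> tuple[int, int, float]:
    """Return (number of patents, number of keyword referrals, referrals per patent).
    Each keyword area a patent refers to counts once, so one patent can add several referrals."""
    n_patents = len(patents)
    n_referrals = sum(len(keywords) for keywords in patents.values())
    return n_patents, n_referrals, n_referrals / n_patents


# Hypothetical toy data: patent ID -> keyword areas it refers to.
patents = {
    "P1": {"NN", "Robo"},
    "P2": {"ML", "NN", "CNN"},
    "P3": {"DpLearn"},
    "P4": {"ML", "NN", "SuperLrn", "CNN"},
}

print(referral_stats(patents))  # (4, 10, 2.5)
```

Referrals per patent is referrals divided by patents, so its growth factor equals the referral growth factor divided by the patent growth factor. With the report's rounded figures, 6.9/3.8 and 3.7/2.0 both come to a little over 1.8.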
Figure 8: Overall AI patenting activity and the importance of multiple referrals, 2014-2023
Source: USPTO. Own calculations
Figure 9 amplifies this result by examining the distribution of patents by number of referrals in 2014 and 2023. The distribution still peaked at two referrals per patent, but in 2023 it was considerably more skewed to the right, with a longer right-hand tail. In other words, patents showed an increased tendency to refer to a larger range of technologies, and the distribution was more evenly spread across this wider range.
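A distribution like the one in Figure 9 can be tabulated by counting patents by their number of keyword areas. The sketch below uses hypothetical toy data. The "share of patents with three or more referrals" is an illustrative tail measure chosen here, not a statistic reported in the study.

```python
from collections import Counter


def referral_distribution(patents: dict[str, set[str]]) -> Counter:
    """Number of patents by the number of keyword areas each refers to (cf. Figure 9)."""
    return Counter(len(keywords) for keywords in patents.values())


# Hypothetical toy data for a single year.
patents = {
    "P1": {"ML", "NN"},
    "P2": {"ML", "NN"},
    "P3": {"CNN", "DpLearn"},
    "P4": {"ML"},
    "P5": {"ML", "NN", "CNN", "DpLearn", "Robo"},
}

dist = referral_distribution(patents)
mode = max(dist, key=dist.get)  # most common number of referrals per patent
# Illustrative right-tail measure: share of patents with three or more referrals.
tail_share = sum(n for k, n in dist.items() if k >= 3) / len(patents)
print(sorted(dist.items()), mode, tail_share)
```

Comparing the mode with the tail share across two years would show the pattern described above: an unchanged peak at two referrals alongside a heavier right-hand tail.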
Figure 9: Numbers of patents by number of keyword areas, 2014 and 2023
Source: USPTO. Own calculations
2.5. A note on the interrelationships between keywords
From the keyword results it is possible to examine which keywords tend to be linked together and whether they coalesce into separate groups. Suppose any two keywords are chosen (e.g. because they appear to have a fairly strong link to one another). Do they show a similar pattern of association with some or all of the other 35 or so keywords? For example, suppose every use of the keyword knowledge bases required an identical mix of ML, NN, UnsupLrn, SuperLrn, etc. Designing an education / training curriculum would then be straightforward. If the mixes are very different, the design process is more complex.
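One way to make the question above operational is to count how often each pair of keywords appears in the same patent. Two keywords' patterns of association with all the others can then be compared. The sketch below does this with cosine similarity on hypothetical toy data. Both the similarity measure and the function names are assumptions for illustration, not the method used in this report.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence(patents: dict[str, set[str]]) -> Counter:
    """Count how often each unordered pair of keywords appears in the same patent."""
    pairs = Counter()
    for keywords in patents.values():
        for a, b in combinations(sorted(keywords), 2):
            pairs[(a, b)] += 1
    return pairs


def profile_similarity(pairs: Counter, a: str, b: str, vocab: list[str]) -> float:
    """Cosine similarity of a's and b's co-occurrence counts with every *other* keyword."""
    others = [k for k in vocab if k not in (a, b)]
    u = [pairs[tuple(sorted((a, k)))] for k in others]
    v = [pairs[tuple(sorted((b, k)))] for k in others]
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


# Hypothetical toy data (not the report's).
patents = {
    "P1": {"ML", "NN", "CNN"},
    "P2": {"ML", "NN", "Robo"},
    "P3": {"ML", "CNN"},
    "P4": {"NN", "CNN"},
    "P5": {"Robo", "NLP"},
}
pairs = cooccurrence(patents)
vocab = sorted(set().union(*patents.values()))
print(profile_similarity(pairs, "ML", "NN", vocab))    # close to 1: same pattern
print(profile_similarity(pairs, "ML", "Robo", vocab))  # about 0.5
```

A value near 1 means two keywords associate with the rest of the vocabulary in the same proportions, as ML and NN do in this toy data. That is the situation in which a common curriculum would be easiest to design.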
If all the keywords are kept in (including AI and ALGO), then a small number of knowledge sets dominate the associations across the 41 keywords. Figure 10 shows that the top six keywords associated together across all keywords (ALGO, ML, NN, AI, CNN and DpLearn) account for 62.7% of the total. Thus, there are grounds for suggesting that they should be combined as core parts of courses in AI. However, two further considerations apply. First, AI itself is more likely to be a catchall phrase that does not imply a specific knowledge set. In addition, ML / NN appear to be locked together, so it is not clear whether they should be treated as two separate knowledge sets. Purely for the sake of argument, suppose AI and ALGO are dropped and ML / NN are combined (i.e. treated as a single subject). The share of the top six then falls to 41.5%.
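The sketch below shows how dropping catchall keywords and merging closely linked ones changes the share held by the top keywords. The association totals are hypothetical, and the top three (rather than six) are used to keep the example small. The direction and size of the change depend entirely on the data.

```python
def top_k_share(totals: dict[str, float], k: int) -> float:
    """Percentage of all associations accounted for by the k largest keywords."""
    largest = sorted(totals.values(), reverse=True)[:k]
    return 100 * sum(largest) / sum(totals.values())


def drop_and_merge(totals: dict[str, float], drop: set[str], merge: list[str],
                   merged_name: str) -> dict[str, float]:
    """Remove the keywords in `drop` and combine those in `merge` into one entry."""
    kept = {kw: v for kw, v in totals.items() if kw not in drop and kw not in merge}
    kept[merged_name] = sum(totals[kw] for kw in merge)
    return kept


# Hypothetical association totals (not the report's data).
totals = {"ALGO": 30, "AI": 20, "ML": 6, "NN": 6, "CNN": 6, "DpLearn": 6,
          "RNN": 6, "Robo": 5, "NLP": 5, "SuperLrn": 5, "GAN": 5}
adjusted = drop_and_merge(totals, drop={"AI", "ALGO"}, merge=["ML", "NN"], merged_name="ML/NN")

print(top_k_share(totals, 3), top_k_share(adjusted, 3))  # 56.0 48.0
```

Here the top-three share falls from 56% to 48%, qualitatively similar to the fall from 62.7% to 41.5% reported for the top six.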
Figure 10: Potentially “core” knowledge sets
Source: USPTO. Own calculations
Second, taking the overall associations (i.e. an average across all keywords) hides the degree to which individual keywords are (or are not) associated with the other keywords. Figure 11 shows the range of outcomes by keyword, ordered by the percentage-point difference between its largest and smallest associations. Keywords with larger association percentages naturally have more scope for larger differences. Even allowing for this, there are large differences in the extent to which keywords are associated with other keywords. For example, one keyword shares 22.5% of its associations with ALGO, while another has only 5% of its associations with ALGO. This shows that there is no fixed pattern of association between keywords, but potentially great diversity. The overall distribution of associations clearly requires more investigation.
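The per-keyword ranges in Figure 11 can be approximated in two steps. First, each keyword's co-occurrences are expressed as percentages. Second, the gap between its largest and smallest association is taken. The sketch below uses hypothetical pair counts. It interprets the 7-percentage-point cutoff in the figure note as dropping keywords whose range falls below it; that interpretation is an assumption.

```python
def association_shares(pairs: dict[tuple[str, str], int], keyword: str) -> dict[str, float]:
    """Percentage of `keyword`'s co-occurrences that involve each other keyword."""
    links: dict[str, int] = {}
    for (a, b), n in pairs.items():
        if keyword in (a, b):
            other = b if a == keyword else a
            links[other] = links.get(other, 0) + n
    total = sum(links.values())
    return {other: 100 * n / total for other, n in links.items()}


def share_range(shares: dict[str, float]) -> float:
    """Difference in percentage points between the largest and smallest association."""
    return max(shares.values()) - min(shares.values()) if shares else 0.0


# Hypothetical pairwise co-occurrence counts (not the report's data).
pairs = {("ALGO", "ML"): 5, ("ALGO", "NN"): 1, ("ALGO", "Robo"): 3,
         ("ML", "NN"): 5, ("ML", "Robo"): 3, ("NN", "Robo"): 3}
keywords = ["ALGO", "ML", "NN", "Robo"]

CUTOFF = 7.0  # percentage points, mirroring the note to Figure 11
ranges = {kw: share_range(association_shares(pairs, kw)) for kw in keywords}
shown = {kw: round(r, 1) for kw, r in ranges.items() if r >= CUTOFF}
print(shown)
```

`Robo` associates equally with the other three keywords, so its range is zero and it is omitted. `ALGO` and `NN`, by contrast, show ranges of over 40 percentage points.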
Figure 11: Difference in associations between keywords, by keyword
Source: USPTO. Own calculations. Note: cutoff difference set at 7 percentage points and lower-level examples omitted.
2.6. Implications for AI skills
The rapidity of development
The rate of AI development in patenting terms (and, by implication, in technology development) is probably unprecedented. It is likely eventually to level off. Even so, its immediate skills implications are likely to be enormous, both for the production of AI experts, specialists and implementers and for the jobs and occupations affected by AI. The starting point for considering skills is the rapid growth of AI technologies from a generally very low base in 2014 (see Figure 8). Patent applications grew by a factor of 3.8 over the 10-year period, with a total of 438,000 applications. This growth will have implications for the three AI personas and for job creation, reorganisation and destruction amongst user groups.
The reach of AI
The growth in both overall AI activity and the associated penetration rates has been remarkably rapid by historical standards. It is likely to have been driven by the way in which AI has drawn upon existing technologies (e.g. Algorithms) and by its applicability to a wide range of other areas of technology (e.g. Robots, Surgery, Vehicles). This has created a need for AI-related skills to be combined with more sector-specific skills to produce practical applications. The application to different uses has almost certainly been a major influence encouraging an increasing range of AI-related technologies. This partly explains why there is no single sector (Beauhurst, Table 5.1, p. 39)[footnote 11] or patent class that can be identified with AI (see also Section 3 below). As a result, the present research has identified an increasing diversity of AI-related technologies and, thereby, AI specialities: branches and sub-branches of AI.
Changing complexity of skill needs
These changes have significant implications for the study and understanding of skills needs. As a starting point, we could assume that (i) each of the keywords is associated with a different knowledge base, and (ii) each area of application may mix AI with different technologies, which also require different knowledge and skills bases. On those assumptions, the current results show, at the very least, the enormous growth in the diversity of emerging AI technologies. In practice, our knowledge of the relationships between knowledge areas is very weak, and more research is required to understand the implications for education and training programmes in AI.
Birth, evolution and complexity
The observed increasing complexity is partly the result of the early stage of development of AI, which has caused both the emergence of new technologies and major changes in the relative importance of existing technologies. Some "new" technologies, which may require new skills or at least new variants of skills, only appeared after the start of our sample period (EmulInt, GAN and VAE fall into that category). Many existing technologies, which had been at or close to zero, grew enormously within the period (DpLearn, RNN and SuperLrn fall into that category). It is also clear that, with certain provisos, the relative importance of the various technologies has changed enormously over the period. Suppose a fixed skill (or set of skills) is associated with each keyword area. Then the skill mix of all AI has changed significantly over time. It will continue to do so as new technologies emerge and the knowledge bases in different areas develop at different rates.
Interpreting skills needs from the keywords
There are a number of potential issues in interpreting the skills implications of keywords without caveats: (i) AI probably had a more technical meaning at the birth of the technology, but has become a catchall for the area in general; (ii) Algorithms (ALGO) are not restricted to uses within AI activities and have only been included here when the keyword appears with another keyword area of knowledge more clearly associated with AI technology[footnote 12]; (iii) ML and NN appear to be locked together, at least in the sense that an understanding of one requires an understanding of the other, or even that they essentially refer to the same knowledge base.
Common results throughout this report, therefore, are that: (i) the keyword AI, by itself, is probably of limited use, as it has only a very loose association with either technologies or skills; (ii) algorithms are clearly important, with relatively widespread reference to them in the context of AI, but, although individuals wanting to develop or work with AI would benefit from an understanding of them, their centrality with respect to AI requires further analysis (perhaps in the form of computing-related skills); (iii) ML and NN are closely related, but their relationship with one another requires further investigation.
Relationships between keywords and their skills implications
For the design of education and training policies, an important finding is that certain keywords are closely related to each other. The issues surrounding ALGO, AI and ML / NN have already been mentioned, but the range of relationships is much broader than that. In other words, a relatively small number of keywords tend to cluster. This suggests that education and training courses might be focused on them, with other areas of AI knowledge offered as options within the programme of study. While this may be true, the research has highlighted some difficulties in operationalising it without a better understanding of the trends in these relationships. For example, this analysis made only a brief foray into examining the extent to which the keywords were related to each other, and under what circumstances.
Links to areas of application
This section has introduced the idea that the outputs of applying keywords (such as Machine Learning or Neural Networks) are more difficult to interpret than previously thought. In the job vacancy analysis, they are linked more with skills or occupations. The present work, by contrast, suggests that they are more closely related to areas of AI knowledge. These need to be linked with knowledge of other areas of technology, depending on the applications they are associated with. AI patents are allocated to areas of technology through the patent classification system. These may be AI technologies (e.g. Neural Networks), non-AI technologies (e.g. Surgery) or both. In some cases, two or more areas of technology may be covered by the same knowledge base but, more generally, each may be covered by slightly or even completely different knowledge bases. These considerations are central to the subject focus of education and training courses. They also determine the depth of knowledge required, depending on whether individuals are to be AI experts, specialists or implementers (e.g. inventors, innovators or diffusers).
3. Changes in AI technologies and the knowledge base
While patent data offer many other avenues to explore, one dimension has proved extremely productive. This is the linkage between areas of technology usage (e.g. surgery, vehicles, telecommunications) and the AI knowledge bases associated with each of them (e.g. Robotics, Deep Learning, Image Recognition / Computer Vision). The implication is that AI cannot be treated as a single homogeneous knowledge base. The relevant knowledge sets change with the area of technology in which the work is conducted or to which it is applied.
One area of technology may require different knowledge sets from other areas (reflected in which of the keywords appear important). However, even where the knowledge set is the same, the relative importance of the types of knowledge within that set will differ. For example, robotics, at the most detailed level of disaggregation, is absent from many patent classes but is a significant knowledge group in "Manipulators" (B25J).
These relationships between different technologies, linked to different areas of production, and knowledge bases are potentially crucial, both for the design of programmes of study in AI and for guiding policies to stimulate the use of AI in particular applications.
Key findings
- Concentrations of AI activity, as measured by the highest patent counts and penetration rates at the broadest level of patent classification (Group A, B, etc.), were found in five key areas of technology (listed below). The first two are the most developed, while the remaining three are smaller:
  - Group G (Physics): Computing arrangements based on specific computational models, often involving upwards of six technologies
  - Group H (Electricity): Telecommunications
  - Group A (Human necessities): Video games as well as surgery and medical applications
  - Group B (Performing operations; transporting): Vehicles and various forms of manipulators
  - Group C (Chemistry; metallurgy): Measuring or testing processes
- These findings suggest sectors or subsectors where future demand for AI knowledge and skills may lie. The findings also highlight the importance of combining core AI skills with more sector-specific skills to deploy AI technologies in these areas.
- The relative importance of specific AI skills or knowledge can differ in important ways both between and within patent class groups.
3.1. Application of AI to areas of technology
3.1.1. Basics of the patent classification hierarchy
The patent system is organised by a hierarchical classification based upon technologies. Each patent class implies that a person skilled in that area will understand the inventions assigned to it, in other words, will have sufficient knowledge of the technology. This means that there are now two potentially overlapping types of knowledge. One is concerned with the characteristics of the class (or classes, see below), and the other with the type of AI knowledge represented by the keywords. Some technologies incorporate both types of knowledge and some only one, but the sample drawn for analysis includes both.
Each patent application is assessed by a team of experts who assign each invention to an area of technology at the most detailed level available, as described below. Each of these areas of technology is unique and is accompanied by a description of the characteristics that an invention would require to be assigned to it. Most inventions span more than one area of technology at this level of detail. They are therefore assigned to more than one area, with a main area and, where appropriate, a second and third area.
Figure 12 illustrates the principle, where A1-A3 and B1-B2 denote the most detailed level and A and B are their respective parent groups. Consider first a single invention allocated to A3. This has no links to other areas of technology and simply forms part of the A family. B2 also represents a single patent with no links to other technologies (patent classes) and therefore belongs only to family B. Now consider two other patents, both of which involve A2 but are quite different. The first is allocated to A2 as its "primary class", but also to A1 as its "secondary class" (i.e. the horizontal link A1<->A2). Both are members of the A family, and the wider vertical arrow shows that A2 is the primary area. The second patent is also allocated to A2 as its primary class, but has a link to B1, even though B1 is not in the same family as A2. Such cross-family links are more likely to connect more disparate technologies (such as AI and Surgery, an example that recurs later).
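The distinction in Figure 12 between links within a family and links across families can be sketched in code. As a simplification, the sketch treats the CPC section letter (e.g. "A" in A61B) as the parent family and the first listed class as the primary class. The patent IDs and assignments are hypothetical.

```python
def family(code: str) -> str:
    """Treat the CPC section letter (e.g. 'A' in 'A61B') as the parent family."""
    return code[0]


def classify_links(assignments: dict[str, list[str]]) -> dict[str, list[tuple[str, str, str]]]:
    """For each patent, label the link from its primary class (listed first) to each
    further class as 'within' (same family) or 'cross' (different family)."""
    result = {}
    for patent, classes in assignments.items():
        primary, others = classes[0], classes[1:]
        result[patent] = [
            (primary, other, "within" if family(other) == family(primary) else "cross")
            for other in others
        ]
    return result


# Hypothetical assignments: A61B = Diagnosis; surgery, A63F = games, G06N = computational models.
assignments = {
    "P1": ["A61B"],          # single class, no links (like A3 or B2 in Figure 12)
    "P2": ["A61B", "A63F"],  # within-family link (like A2 <-> A1)
    "P3": ["A61B", "G06N"],  # cross-family link (like A2 <-> B1, e.g. surgery and AI)
}
for patent, links in classify_links(assignments).items():
    print(patent, links)
```

P3 mirrors the AI-and-Surgery case. It links a surgical class (A61B) to a class for computing arrangements based on specific computational models (G06N), which sits in a different section.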
Figure 12: Hierarchical relationships within the patent classification system
This is an extremely important feature of the patent system. First, when combined with the 41 keyword areas of knowledge, it allows the identification of which areas of technology are associated with which areas of AI knowledge (e.g. Structure learning, Unstructured learning, Robotics, etc.), and to what extent. Second, in doing this, it immediately becomes clear that different areas of technology (defined by AI experts in the patent system) are associated with different mixes or "packages" of knowledge, including both AI and non-AI dimensions.
By linking the keywords and patents, this research identifies the different knowledge bases required by different areas of technology. The job vacancy data provide information about the skills needed for the AI component (e.g. for Machine Learning). Other non-AI skill requirements lie outside the scope of the present study.
3.1.2. Choice of level of disaggregation of patent technologies
To understand the level of detail at which the analysis takes place, a slightly more detailed description of the patent classification system is needed. Figure 13 shows the three-digit classification used throughout this report, using G06N as an example. Its components are the highest-level group Physics, G (first digit); the sub-group Computing, calculating or counting, 06 (second digit); and the class Computing arrangements based on specific computational models, N (third digit).[footnote 13]
In essence, the Group data (A-H) show the most aggregate patent classification, with 65,900 AI-related patents. The Sub-group data show the distribution of the 38,000 AI-related patents in the Physics Group across its 12 sub-groups. Of these, 26,700 AI-related patents are in the Computing, calculating or counting Sub-group. Finally, the patent Classes into which these 26,700 are sub-divided are shown as F-V, of which Electric digital data processing is the largest, with 10,400 patents in 2023.
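The hierarchy in Figure 13 can be mirrored by splitting each three-digit code and summing counts upwards. In CPC terminology, the report's Group, Sub-group and Class are the section, class and subclass, respectively. The counts below are hypothetical. The regular expression only accepts codes of the form used in this report: a section letter A-H, two digits and a letter.

```python
import re
from collections import Counter

# Section letters A-H as used in this report (CPC also has a Y section, not handled here).
SUBCLASS = re.compile(r"^([A-H])(\d{2})([A-Z])$")


def parse_subclass(code: str) -> tuple[str, str, str]:
    """Split e.g. 'G06N' into (Group 'G', Sub-group '06', Class 'N') in the report's terms."""
    match = SUBCLASS.match(code)
    if match is None:
        raise ValueError(f"not a three-digit class code: {code!r}")
    return match.group(1), match.group(2), match.group(3)


def roll_up(counts: dict[str, int]) -> tuple[Counter, Counter]:
    """Aggregate class-level counts to Group (e.g. 'G') and Sub-group (e.g. 'G06') totals."""
    by_group, by_subgroup = Counter(), Counter()
    for code, n in counts.items():
        group, subgroup, _ = parse_subclass(code)
        by_group[group] += n
        by_subgroup[group + subgroup] += n
    return by_group, by_subgroup


# Hypothetical AI-patent counts by class (not the report's data).
counts = {"G06F": 100, "G06N": 50, "G10L": 20, "H04L": 30, "A61B": 40}
by_group, by_subgroup = roll_up(counts)
print(dict(by_group), dict(by_subgroup))
```

Summing class-level counts in this way counts a patent once for every class it is assigned to. The sketch therefore shows the structure of the hierarchy rather than reproducing the patent totals in Figure 13, which may count each patent only once.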
Figure 13: Illustration of the patent hierarchy using the selected AI data, 2023
Source: USPTO. Note: values shown relate to the numbers of patents in 2023 with complete information.
There are approximately 580 three-digit classes with reported AI patents in at least one year of the total time-period. However, this number is reduced for present purposes because there is not always a usable value for every year.[footnote 14]
3.2. Notable results by patent group, at the three-digit level
This section reports on exploratory work to identify which areas of technology are most highly associated with AI technologies at this early stage in the development of AI. In doing so, the discussion also begins to build the foundations for an analysis of which areas of technology are related to one another. Experimentation with the data suggested that there was some merit in organising the analysis to examine AI activities within technological Groups (i.e. within each of A-H in Figure 13).[footnote 15] This seems a productive route because patents with a more similar technological focus tend to be linked more closely together in the classification. As a high level of detail is involved, the methodology is reported in Annex 1, while a brief discussion of the broad findings is given here.
What the research appears to be uncovering are relatively high- (and low-) AI activity groupings of potentially complementary classes of activity within a Group. For want of a better term, we have called these coteries of inventive activity. Advanced groupings of this type can be found particularly within G and, to a slightly lesser extent, H (see above), with emerging groupings in A, B and C.
3.3. The evolution and growth in AI-inventive activities
In this section, some of the key results are used to demonstrate how AI technologies have penetrated the different patent classes. The penetration rate is the number of AI-related patents in a class as a proportion of all patents in that class.
The patent classes are divided into four size groups according to their total applications (i.e. AI and non-AI) in 2023: (i) <100; (ii) 100-249; (iii) 250-999; (iv) 1000 plus.[footnote 16] The following discussion refers to the top 10 or top 15 classes in terms of penetration rates. Note that it is not possible to include them all in the same figure, so only examples are provided.
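The penetration rate and the four size groups can be expressed as two small functions. This is a minimal sketch with hypothetical class names and counts. Returning `None` when a class has no patents is an assumption about how to treat years without a usable value.

```python
from typing import Optional


def penetration_rate(ai_patents: int, total_patents: int) -> Optional[float]:
    """AI-related patents as a percentage of all patents in a class; None if there are none."""
    if total_patents == 0:
        return None
    return 100 * ai_patents / total_patents


def size_group(total_patents: int) -> str:
    """Size group based on total (AI plus non-AI) applications in 2023."""
    if total_patents < 100:
        return "<100"
    if total_patents < 250:
        return "100-249"
    if total_patents < 1000:
        return "250-999"
    return "1000+"


# Hypothetical 2023 figures per class: (AI patents, total patents).
classes = {"class_x": (30, 80), "class_y": (120, 400), "class_z": (5_000, 12_000)}
for name, (ai, total) in classes.items():
    print(name, size_group(total), round(penetration_rate(ai, total), 1))
```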
There are 343 three-digit classes in 2023 with total patent counts (i.e. AI plus non-AI) of less than 100. Nevertheless, these classes contribute over 4,700 AI patents over the period 2014-2023 as a whole. Of these, B07C (Postal sorting; sorting individual articles, or bulk material fit to be sorted piece-meal, e.g. by picking), G01W (Meteorology) and G03H (Holographic processes or apparatus) made the largest contributions (over 150 patents each over the period as a whole). G08C (Transmission systems for measured values, control or similar signals) was not far behind (147, not shown). These three technology areas appear to be emergent AI classes (see Figure 14a).
Of the group with 100-249 patents in 2023 (Figure 14b), nine are from the Physics group, of which six appear in the top ten by penetration rate for this group. G16C (Computational chemistry; chemoinformatics; computational materials science) (not shown) had no AI patents up to 2018. Its penetration rate was over 45% in 2019, rising to nearly 69% in 2023 (this remains to be investigated). B61L (Guiding railway traffic; ensuring the safety of railway traffic) is notable. So is B64F (Ground or aircraft-carrier-deck installations specially adapted for use in connection with aircraft; designing, manufacturing, assembling, cleaning, maintaining or repairing aircraft, not otherwise provided for; handling, transporting, testing or inspecting aircraft components, not otherwise provided for), whose penetration growth is slightly more erratic. Together they have implications for two areas of transport. B66C (Cranes; load-engaging elements or devices for cranes, capstans, winches, or tackles) has the slowest growth in the penetration rate of the top ten of the 100-249 group.[footnote 17] After the top 10 for this group, the penetration rates for 2023 fall below 15%, but there are a further nine in double figures.
There is a total of 36 classes in the 250-999 patent size group in 2023 (see Figure 14c). Of these, 19 have penetration rates that rise to over 15% by 2023. C25B (Electrolytic or electrophoretic processes for the production of compounds or non-metals; apparatus therefor) and H01F (Magnets; inductances; transformers; selection of materials for their magnetic properties) are the only three-digit classes to have negative changes in the penetration rate over the period. The G group (Physics) is strongly represented, with 19 entries, of which seven are in the top ten by penetration rate[footnote 18] and 10 in the top 15. In addition, G16B (Bioinformatics ICT) enters in 2019 and reaches a penetration rate of 70.5% by 2023. The three remaining classes in the top 10 are H04M (Telephonic communication), H04S (Stereophonic systems) and A01B (Soil working in agriculture or forestry; parts, details, or accessories of agricultural machines or implements, in general). All three reach penetration rates of between about 25% and 40%. A01B is interesting as it is from the Human necessities group (A) and relates particularly to agricultural uses.[footnote 19]
Finally, the largest patenting classes (1000+ patents in 2023) are shown in Figure 14d. Nine of the top 10 classes by penetration rate are from the Physics group (G). Of these, two of the highest, G16H (Healthcare informatics) and G06V (Image or video recognition or understanding), do not enter until 2018 and 2022 respectively (these need further investigation and are not shown in the figure).
G06N (Computing arrangements based on specific computational models), G10L (Speech analysis techniques), G06T (Image data processing or generation) and G06Q (Information and communication technology specially adapted for administrative, commercial, financial, managerial, or supervisory purposes) are perhaps the most strongly linked to the technological development of AI, with G06K (Graphical data reading), G05B (Features of control systems or elements for regulating specific variables, which are clearly more generally applicable) and G05D (Systems for controlling or regulating non-electric variables) also appearing.
It is worth noting that a number of non-G classes also appear in the top 15. These are B25J (Manipulators; chambers provided with manipulation devices, ranked 10th), B60W (Conjoint control of vehicle sub-units of different type or different function; control systems specially adapted for hybrid vehicles; road vehicle drive control systems for purposes not related to the control of a particular sub-unit), A63F (primarily video games), H04L (Transmission of digital information) and H04N (Pictorial communication, e.g. television). These indicate potentially important activity outside the Physics group.
3.4. Relative importance of AI skills across patent classes
3.4.1. Differences within groups
This section illustrates how, even within the same broad patent group (e.g. A, B, …), the relative importance of AI knowledge or skills can differ in important ways. To show this, the discussion draws on the combinations of skills (keywords) and technologies that form the focus of AI activities in each of the following groups:
- Group A (Human necessities)
- Group B (Performing operations; transporting)
- Group C (Chemistry; metallurgy)
- Group G (Physics)
- Group H (Electricity)
Figure 14: Examples of changes by size class, 2014-2023
Source: USPTO. Own calculations
Groups D (Textiles and flexible materials), E (Fixed constructions) and F (Mechanical engineering; lighting; heating; weapons; blasting) are omitted from the analysis. While they have discernible levels of AI activity, they do not seem to have formed core areas as advanced as the other five just mentioned.
To demonstrate the differences in a straightforward way, the AI skills are collated for each of the main areas of technology (patent classes at the three-digit level). However, only the main AI three-digit classes are used in the analysis. Two related dimensions are shown:
1) The relative size of each AI skill within each of the three-digit classes is calculated, i.e. the proportion of each knowledge set within each technology area (patent class). For example, of the AI patents found in technology class A61B, 2.5% are associated with NLP, 2.0% with MachVis and so on (see Figure 15a). These results are shown on the left-hand side of each set of figures (labelled a in each case), and the results in each "column" sum to 100%.
2) Part (b) of each figure then examines the distribution of the use of each area of knowledge across technologies. For example, Figure 15b shows the proportion of NLP that appears in A61B compared with the proportion in A63F (the proportions across technologies sum to 100%). This shows the differences in the relative importance of the different knowledge sets for the given areas of technology within the Group.
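Parts (a) and (b) can both be derived from a single table of counts by class and knowledge area. In this sketch the counts and function names are hypothetical. Part (a) normalises within each class. Part (b) splits each knowledge area's part (a) shares across classes, as in the worked NLP example for Figure 15b, so that a larger class does not dominate simply because it has more patents.

```python
def column_shares(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Part (a): within each patent class, % of its referrals in each knowledge area."""
    shares = {}
    for cls, by_keyword in counts.items():
        total = sum(by_keyword.values())
        shares[cls] = {kw: 100 * n / total for kw, n in by_keyword.items()}
    return shares


def row_shares(col: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Part (b): for each knowledge area, split its part (a) shares across the
    classes so that they sum to 100% (adjusting for differences in class size)."""
    keywords = sorted({kw for by_keyword in col.values() for kw in by_keyword})
    result = {}
    for kw in keywords:
        row = {cls: col[cls].get(kw, 0.0) for cls in col}
        total = sum(row.values())
        result[kw] = {cls: 100 * v / total for cls, v in row.items()} if total else row
    return result


# Hypothetical counts of AI patents by class and knowledge area (not the report's data).
counts = {
    "A61B": {"NLP": 10, "Robo": 30, "DpLearn": 60},
    "A63F": {"NLP": 60, "Robo": 20, "DpLearn": 120},
}
part_a = column_shares(counts)
part_b = row_shares(part_a)
print(part_a)
print(part_b)
```

In this toy data, A63F has twice as many DpLearn patents as A61B. DpLearn nevertheless splits 50/50 in part (b), because it accounts for the same 60% share within each class.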
The question being posed here is this: at what point does a specific knowledge base become so much less relevant to a particular technology area that it no longer needs to form part of the formal education of individuals working in that area? At present, we do not know where that cut-off might lie.
Group A (Human necessities)
The core of AI activity within this group is focused around A63F (Card, board, or roulette games; indoor games using small moving playing bodies; video games; games not otherwise provided for) and A61B (Diagnosis; surgery; identification). Even though they both reside within group A, they comprise quite different areas of activity. This is reflected in the results of the comparison, which are shown in Figure 15.
Figure 15a compares the relative importance of each knowledge area within the two selected patent classes (i.e. the A61B shares across knowledge areas sum to 100%, as do the A63F shares). So, the reader can see that DpLearn has the largest proportion of the "column" total of all knowledge areas (18% of the total for A61B and 16.6% for A63F). On the other hand, Robo forms 9.4% of the knowledge base for A61B, compared with only 1.6% for A63F.
Now, ignoring the figures given in the data labels, it is possible to look at the relative importance of each of the knowledge sets within the two patent classes (technology areas). This is shown by the relative sizes of the left- and right-hand columns in each row of Figure 15a. Figure 15b is used in this first example to show what those two column sizes represent. For example, NLP in Figure 15b is constructed as (2.5/(2.5+5.5))*100% for A61B and (5.5/(2.5+5.5))*100% for A63F, i.e. about 31% and 69% respectively.
From Figure 15b, therefore, the widths of each corresponding left- and right-hand cell shown in Figure 15a represent the relative importance of the particular knowledge base chosen, having adjusted for differences in the relative sizes of the two patent classes. There are two special cases in which the provision of education and training would be simplified:
1) if the two columns met at 50% on the horizontal axis for every row, it would suggest that every knowledge area was equally relevant to both areas of technology;
2) if the two columns met at the same percentage below 50% (say 40%) on the horizontal axis for some sub-set of knowledge bases, and at the corresponding percentage above 50% (e.g. 60%) for the remaining sub-set of knowledge bases, it would suggest a fixed relationship between these two sub-sets of knowledge.
Again, the discussion has returned to the importance of the relationships between knowledge bases in the design of education and training courses.
In either Figure 15a or 15b, several of the AI knowledge bases do not differ greatly from the 50% line (e.g. GAN, DpLearn and MachVis), while several of the others are quite clearly very different. In particular, Robotics (Robo) is around five or six times as important in A61B (probably linked to its Surgery sub-component), while Reinforcement learning (ReinfLrn) and Natural language processing (NLP) are considerably more important in A63F (probably linked to its Video games sub-component). If a 60/40 cut-off were set in Figure 15b (i.e. >60% is essential to that technology and <40% is non-essential), five areas of knowledge might be considered essential for one technology but not the other (e.g. NLP might be considered essential for A63F, but non-essential for A61B). Other things being equal, Robotics would be a prime case for this train / do not train categorisation. Of course, this does not mean Robotics should not be taught, only that it might not be very relevant for A63F.
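The (arbitrarily chosen) 60/40 cut-off can be applied mechanically. The sketch below assumes exactly two classes and takes as input the part-(a) shares quoted in the text for NPL, Robo and DpLearn (with ML and NN omitted). The function name `classify_60_40` is hypothetical. Each row is first converted to part-(b) shares, and a knowledge area is flagged only when one class is above 60% and the other below 40%.

```python
def classify_60_40(part_a_row, upper=60.0, lower=40.0):
    """Apply a 60/40 cut-off to one knowledge area shared by exactly two classes.

    part_a_row maps class -> part-(a) share. It is converted to part-(b) row
    shares first. Returns (essential_for, non_essential_for), or None if the
    area falls between the cut-offs.
    """
    total = sum(part_a_row.values())
    (c1, s1), (c2, s2) = ((cls, 100 * v / total) for cls, v in part_a_row.items())
    if s1 > upper and s2 < lower:
        return c1, c2
    if s2 > upper and s1 < lower:
        return c2, c1
    return None


# Part-(a) shares quoted in the text for Figure 15a.
fig15a = {
    "NPL": {"A61B": 2.5, "A63F": 5.5},
    "Robo": {"A61B": 9.4, "A63F": 1.6},
    "DpLearn": {"A61B": 18.0, "A63F": 16.6},
}
for area, row in fig15a.items():
    print(area, classify_60_40(row))
# NPL ('A63F', 'A61B')
# Robo ('A61B', 'A63F')
# DpLearn None
```

With only two classes, the 60% and 40% tests are equivalent, because the row shares sum to 100. Both tests are kept so that the thresholds can be varied independently.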
Figure 15: Differences in the relative importance of key AI skills within Group A, 2023
Source: USPTO. Own calculations. Note: ML and NN have been omitted (this makes a difference to the numbers shown in each column, but not the overall pattern of results)
Group B (Performing operations; transporting)
Group B (see Figure 16) is another emerging area within AI and, at the present time, as in the case of A, only two three-digit patent classes have been identified at its core: B25J (Manipulators; chambers provided with manipulation devices) and B60W (conjoint control of vehicle sub-units of different type or different function; control systems specially adapted for hybrid vehicles; road vehicle drive control systems for purposes not related to the control of a particular sub-unit) (see Section 3.1).
Figure 16a is constructed in the same way as Figure 15a and, hence, each half of the figure sums to 100%. In this case, Robotics forms by far the largest proportion of the total across knowledge areas for class B25J (29.3%), while for B60W CNN is the largest at 22.2%. The smallest proportion for B25J is RNN, while Robo is the smallest for B60W, at only 3.2%.
Figure 16b (constructed in the same way as Figure 15b) shows that the knowledge sets closest to equality of use (i.e. those that meet at about the 50% mark on the horizontal axis) are Computer vision (CompVis), Reinforcement learning (ReinfLrn) and Supervised learning (SuperLrn). However, Recurrent neural networks (RNN) and Robotics (Robo) stand out as being quite different between the two patent classes. For example, the value of Robotics in B25J is around nine times that of B60W. Four of the eight knowledge sets fall outside the (arbitrarily chosen) 60/40 cut-off.
Figure 16: Differences in the relative importance of key AI skills within Group B, 2023
Source: USPTO. Own calculations. Note: ML and NN have been omitted (this makes a difference to the numbers shown in each column, but not the overall pattern of results)
Group C (Chemistry; metallurgy)
In this group, three classes have been included as the core: C07K (Peptides), C12N (Microorganisms or enzymes; compositions thereof; propagating, preserving, or maintaining microorganisms; mutation or genetic engineering; culture media) and C12Q (Measuring or testing processes involving enzymes, nucleic acids or microorganisms; etc.) (see Figure 17).
Before examining the results with ML and NN excluded, it is important to understand how dominant Neural networks (NN), Convolutional neural networks (CNN) and Machine learning (ML) are within the three classes chosen to represent the core technologies in Group C. CNN is important across all technologies, though by no means as important as ML and NN (see Table 1). While CNN is referred to in 24.6% of all AI patents, this rises to 29.5% across Group C, 35.3% in C12N and 43.5% in C07K. Note that these percentages differ slightly from those discussed below (in Figure 17), as the latter are constructed only across the three technologies shown in the figure.
The main difference lies in the role of CNN within Chemistry, as shown in Figure 17(a). Again, the columns show the percentage distribution across knowledge sets for each patent class (area of technology), i.e. the relative importance of each knowledge set within a given technology class. CNN is the most important in C07K, ML in C12Q and, again, CNN in C12N. The smallest proportions were associated with ML for C07K, CNN for C12Q and NN for C12N. Thus, CNN, which largely dominates C07K (58.5%) and C12N (64.3%), is by far the smallest in C12Q (27.8%).
Having controlled for the different absolute sizes of patenting activity in Figure 17(a), the resulting data are translated into row proportions in Figure 17(b), i.e. the relative extent to which each knowledge set is used across the different technology classes. With three classes included, the “lines of equality” in the distribution lie at 33.3% and 66.7%. The second most obvious feature, after the role of CNN, is that ML is relatively more important in C12Q than in the other two classes: C12Q accounts for 51.7% of the total, with C07K and C12N each at roughly half that value. NN shows a pattern akin to ML. CNN, by contrast, is the odd one out here. Its lowest relative use is in C12Q, at only 18.5% of the total, compared with 38.8% in C07K and 42.7% in C12N.
Given that, when combined, ML, NN and CNN form 74.4% of C07K, 75.8% of C12N and 64.4% of C12Q (70.0% of the total for the three areas combined), there is little or no room for major roles amongst any of the other knowledge sets. Figure 18 provides the percentage breakdown for the other contributors; only the main nine of the remaining 38 knowledge sets are presented. Again, each “column” in Figure 18(a) sums to 100, as does each row in Figure 18(b). One interesting feature is that Robotics (Robo) features quite strongly in C07K and C12N, though less so in C12Q. Although it is not currently a major feature of activity in Group C as a whole, it is quite important in several of its classes. More generally, Figure 18 shows the diversity of outcomes across these three patent classes, and this diversity only increases when the other non-core areas of technology are added to these core results.
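With more than two classes, equality of use is no longer a single 50% line. The minimal sketch below assumes that the boundaries are simply multiples of 100/n and uses hypothetical function names. It computes those lines and expresses each class's share as a multiple of the equal share, where 1.0 means equality. The CNN row uses the Figure 17(b) values quoted above.

```python
def equality_lines(n_classes):
    """Cumulative positions (%) at which the bars would meet if all classes had equal shares."""
    return [round(100 * k / n_classes, 1) for k in range(1, n_classes)]


def ratio_to_equal_share(row):
    """Each class's part-(b) share divided by the equal share 100/n (1.0 means equality)."""
    equal = 100 / len(row)
    return {cls: share / equal for cls, share in row.items()}


print(equality_lines(3))  # [33.3, 66.7]
print(equality_lines(6))  # [16.7, 33.3, 50.0, 66.7, 83.3]

# CNN row of Figure 17(b), as quoted in the text.
cnn = {"C07K": 38.8, "C12N": 42.7, "C12Q": 18.5}
print({cls: round(r, 2) for cls, r in ratio_to_equal_share(cnn).items()})
```

On this measure, C12Q comes out at a little over half of the equal share for CNN, while C07K and C12N are both above it.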
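The residual breakdown can be sketched as follows. This rests on an assumption that the text does not state in this form: the dominant sets are dropped, the k largest of the remaining sets are kept, and the kept sets are re-expressed as percentages of their own total, so that each column sums to 100. The counts, the class they stand for and the function name `residual_breakdown` are hypothetical.

```python
def residual_breakdown(counts, dominant, k=9):
    """Return the k largest non-dominant knowledge sets as percentages of their own total."""
    rest = {area: n for area, n in counts.items() if area not in dominant}
    top = sorted(rest, key=rest.get, reverse=True)[:k]
    kept_total = sum(rest[area] for area in top)
    return {area: 100 * rest[area] / kept_total for area in top}


# Hypothetical keyword counts for one class (not USPTO data); k reduced to 3.
example_class = {"ML": 520, "NN": 410, "CNN": 300, "Robo": 45,
                 "SuperLrn": 30, "NLP": 25, "GAN": 5}
print(residual_breakdown(example_class, dominant={"ML", "NN", "CNN"}, k=3))
# {'Robo': 45.0, 'SuperLrn': 30.0, 'NLP': 25.0}
```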
Figure 17: Differences in the relative importance of key AI skills within Group C, 2023
Source: USPTO. Own calculations
Figure 18: Differences in the relative importance of residual AI skills within Group C, 2023
Group G (Physics)
The central core of AI activity at present lies largely within group G[footnote 20] (see Figure 19). Here, 10 skills have been selected as the core, the most important grouping of AI activity across all technologies. Interpretation of Figure 19a remains as before, with the values down each “column” reflecting the relative importance of each knowledge area within that patent class (the total down each “column” is always 100%). It can be seen that CNN is the largest of the knowledge sets for technologies G06T and G06V (and equal largest with DpLearn for G06N).
In Figure 19(b), equality in the relative importance of any specific knowledge base across the technologies would be shown by six equal proportions, i.e. a boundary every 16.7 percentage points along the horizontal axis. The results show that 26.8% of the NPL activity in this Group is associated with G06Q, but only 4.4% with G06T. On the other hand, 29.0% of GAN patents are linked to G06T and only 8.8% to G06F. Again, there are considerable differences in the relative importance of each knowledge set within Group G.
Figure 19: Differences in the relative importance of key AI skills within Group G, 2023
Source: USPTO. Own calculations. Note: ML and NN have been omitted (this makes a difference to the numbers shown in each column, but not the overall pattern of results)
Group H (Electricity)
There are three classes in the core area of AI activity for Group H, involving nine AI skills (see Figure 20). On balance, there is a somewhat greater degree of equality within group H. Neural networks (NN) and Recurrent neural networks (RNN) are very similar across the three classes H04L (Transmission of digital information), H04N (Pictorial communication, e.g. television) and H04W (Wireless communication networks), with high to low ratios of 1.1 for both NN and RNN.
There is also a close similarity in the relative importance of Deep learning (DpLearn) and Machine learning (ML), with ratios of 1.2 and 1.4 respectively. However, the relative importance of Image recognition (ImgRec) and Computer vision (CompVis) differs the most, with maximum proportions of 65.8 and 59.0, compared with minima of 14.3 and 19.2, respectively. This is perhaps unsurprising given that H04N relates to pictorial communication, while the other classes relate to wireless and more general systems of communication. The importance within H04W of Natural language processing (NLP) and of various forms of learning can also be seen.
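The text does not formally define the “high to low ratio”. A natural reading, assumed here, is the largest part-(b) share of a knowledge area across the three classes divided by the smallest. The minimal sketch below applies this to the maxima and minima quoted for ImgRec and CompVis. The function name is hypothetical.

```python
def high_low_ratio(shares):
    """Largest share of a knowledge area across classes divided by the smallest."""
    return max(shares) / min(shares)


# Maxima and minima quoted in the text for Group H (Figure 20).
print(round(high_low_ratio([65.8, 14.3]), 1))  # 4.6 (ImgRec)
print(round(high_low_ratio([59.0, 19.2]), 1))  # 3.1 (CompVis)
```

By the same measure, a ratio of 1.0 would indicate identical shares in all three classes, which is close to what the text reports for NN and RNN.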
Figure 20: Differences in the relative importance of key AI skills within Group H, 2023
Source: USPTO. Own calculations. Note: ML and NN have been omitted (this makes a difference to the numbers shown in each column, but not the overall pattern of results)
3.4.2. Skills differences across Groups
The five groups can be compiled in a single analysis to show the differences across the groups themselves, rather than within the groups (see Section 3.4.1). The groups represent broad but, nevertheless, distinct areas of technology (e.g. Human necessities, group A, is quite different from what is essentially Telecommunications, group H). Insofar as different groups have different AI skills mixes and different rates of growth, this will affect the overall demand for AI skills.
The figures in this subsection show the group-level differences in AI knowledge sets. They are constructed in the same way as in the previous five figures (here only part (a) is presented), but in the present case the individual classes selected as core to the earlier analysis are subsumed within the group as a whole. While this tends to average out some of the differences within each group, a considerable range of differences can still be seen.
The first dimension of note concerns Machine learning (ML) and Neural networks (NN) (see Figure 21). This result is presented first because, in terms of patent numbers, ML and NN are so prominent that, when included, they dominate the figures. They were, however, omitted from the figures in Section 3.4.1 for two main reasons: (i) on practical grounds, since they overshadowed the other knowledge bases, making it difficult to illustrate the relative roles of those other areas; (ii) even more importantly, it is not at all certain whether ML and NN refer to two separate skill sets or one, or a mix of these two possibilities.
If, for the sake of argument, they are treated as two separate skill sets, Figure 21 indicates that jointly they account for between 38.7% (Group C) and 46.7% (Group A) of the total, compared with class-level values ranging from 27.9% (C12N) to 50.2% (H04L). Even within the Chemistry Group, the difference is considerable, again ranging from 27.9% (C12N) to 48.9% (C12Q). This is a common feature of all the results reported below: the broader Group-level findings tend to conceal considerable differences at the more disaggregated level.
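A small example shows why aggregation averages out differences. The sketch assumes that group-level shares are obtained by pooling keyword counts across a Group's core classes. The text says only that the classes are “subsumed within the group as a whole”. The class codes are real, but the counts and function names are invented for illustration. A ten-fold difference in the Robotics share between the two classes becomes a single intermediate value at Group level.

```python
def share(counts, area):
    """Percentage of all keyword counts in `counts` (area -> count) belonging to `area`."""
    return 100 * counts[area] / sum(counts.values())


def pool(classes):
    """Sum keyword counts across the core classes of a Group (class -> area -> count)."""
    pooled = {}
    for class_counts in classes.values():
        for area, n in class_counts.items():
            pooled[area] = pooled.get(area, 0) + n
    return pooled


# Hypothetical counts (not USPTO data) for two core classes of Group B.
group_b = {
    "B25J": {"Robo": 300, "CNN": 100, "Other": 600},
    "B60W": {"Robo": 30, "CNN": 220, "Other": 750},
}
for code, class_counts in group_b.items():
    print(code, share(class_counts, "Robo"))  # B25J 30.0, then B60W 3.0
print("Group B", share(pool(group_b), "Robo"))  # Group B 16.5
```

Pooling weights each class by its size, so the Group figure sits closer to whichever class has more patents. The two invented classes here are the same size, so the result falls at the midpoint.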
Figure 21: Aggregating to Group level (across core classes), ML and NN
Source: USPTO. Own calculations. Note: the Other column shows all non-core areas of technology, not covered by the A-H columns.
While Deep learning (DpLearn) and Convolutional neural networks (CNN) are considerably smaller than ML and NN, they are the next largest knowledge areas and are present across all patent Groups. Figure 22 shows what happens when they are added to ML and NN in Figure 21. Group C, which was notably different in the previous figure, now differs even more markedly from the other four core groupings. CNN stands out as very distinct, as it comprises a larger proportion of the total than ML or NN separately, and is not far off the ML plus NN total. While DpLearn is present across the Groups, its proportion for C is approximately half that of the other Groups. The remaining residual knowledge sets now form between about 20% and 40% of each Group's total, with Group C the main outlier at just under half the value found in the other Groups.
Finally, Figure 23 shows the breakdown of the residual (Other) category[footnote 21] from Figure 22. The small size of some of these knowledge sets does not imply that they should be ignored, because they may be emerging at present and may increase significantly in importance in the future. For the same reason, not too much emphasis should be placed on differences in their relative size, as this too may be changing. One feature stands out at the Group level throughout the discussion: Group C (Chemistry; metallurgy) is quite distinct from the other Groups, at least at this stage of development. A final point is that, at this level of aggregation, the importance of Robotics is largely “washed-out”, even though it is a crucial knowledge set for some (three-digit or more) technologies.
Figure 22: Aggregating to Group level (across core classes), CNN and DpLearn
Source: USPTO. Own calculations. Note: the Other column shows all non-core areas of technology, not covered by the A-H columns.
Figure 23: Aggregating to Group level (across core classes), residual knowledge sets
Source: USPTO. Own calculations. Note: the other knowledge set shown in the top row of cells is the sum of DataMin, GAN and MachVis, which were the smallest individually; ML, NN, CNN and DpLearn have now been omitted, but the proportions shown here still sum to 100 including those knowledge sets; the Other column shows all non-core areas of technology, not covered by the A-H columns.
3.5. Some final comments
The picture emerging from the results of this section indicates that a small number of knowledge sets (such as Neural networks, Convolutional neural networks and Deep learning) are commonly used across a wide range of technology areas. When the analysis moves to a more aggregate level (Group or 1-digit), some of the very significant differences at the more disaggregated level (3-digit) are “washed-out”. This dilution in the apparent relative importance of some areas is exacerbated by the separate treatment of Machine learning and Neural networks, which seem to overlap significantly.
However, even at the more aggregate level important differences still exist. The most extreme of these is the importance of CNN within the Chemistry; metallurgy Group, although the major reason for this is its importance within C12N.[footnote 22] A further example that appears particularly important is Robotics. While this varies considerably at the most aggregate (Group) level, much of the activity associated with it can be traced to the detailed classes A61B[footnote 23] and B25J[footnote 24]. Policy makers should bear in mind this tendency for a more aggregate treatment of technology areas to wash out some of the differences at the more detailed level. Economies such as the UK are unlikely to prioritise all areas of technology and are more likely to concentrate on areas of potential strength or future strategic interest.
The final point is that AI technologies, represented by patent classes, have different knowledge needs. These needs are rarely represented by a single keyword, but more often by several keywords, and sometimes a considerable number. These can be thought of as knowledge “packages” that are, to different degrees, a prerequisite component of the skills for the AI invention, innovation and diffusion processes. It has been shown how these packages can be identified and, more importantly, the first steps have been taken to demonstrate their emergence, their changing composition and their relative importance over time.
4. Overall conclusions
4.1. Key findings
While exploratory in nature, the work on large-scale, machine-readable patent data has opened a new, rich source of information, linking technologies with knowledge sets. The present study has analysed this data with respect to the knowledge and skills base required in the era of AI, although it is equally applicable to other areas such as environmental (green) technologies. The commercial importance of AI technologies has resulted in a rapid growth in their development and their protection by means of patents. Indicators constructed from patent data are, therefore, a bellwether of technologies that potentially will be adopted in the future.
The following are some key points from the analysis above:
-
Patents, AI technologies and the AI-knowledge base. Searching using the 41 keywords enabled the research to link AI-related knowledge and skills with AI technologies. Over half of the keywords – the more generic ones that might be used by potential employers in job descriptions – can be linked with the vacancy search also conducted as a part of this project. Patents relating to AI inventions were identified in every year of the dataset, with numbers rising from 18,500 in 2014 to 69,600 by 2023, which at the end of the period constituted 20.3% of all patents.
-
Patent classes / technologies. It was possible to allocate each of the patents to one or more patent classes, which allowed a link to be identified between the knowledge bases used (keywords) and detailed technology areas (the patent class). Around 450 patent classes were associated with some aspect of AI over the period 2014-2023. Most patents were allocated to more than one class, where at least one class was clearly AI, but the other was often (but by no means always) non-AI in nature. Thus, it was possible to separate wholly AI-focused inventions from those where AI was being applied to a different area of technology (e.g. vehicles, surgery, etc.).
-
Technologies and sectors. In just the same way in which some occupations are sector-specific and many others are used in several or many sectors, the same is true of technologies – and AI technologies are no exception. For example, many of the technologies needed for driverless vehicles (e.g. computer vision, image recognition, various forms of learning) are equally important to other areas of AI-application. Nevertheless, the study developed an initial concordance of technologies that has been applied to give sectoral results, but this requires considerably more work to develop and fine-tune it further. These results are used in modifying the Working Futures projections.
-
Concentrations of AI activity. By systematically grouping together inventions that use the same keywords (i.e. the knowledge / skill types, such as Robotics, Neural Networks, etc.) and appear in the same patent classes (i.e. the technologies, such as Road vehicle drive control systems), and focusing on those with the highest patent counts, it is possible to identify meaningful concentrations of AI activity. Across the whole matrix of technologies by knowledge areas, these concentrations are associated with five areas. Three of these are small groupings primarily around emerging areas of AI technology:
1) (a) surgery and medical applications and (b) video games;
2) (a) various forms of manipulators and (b) vehicles;
3) measuring or testing processes in chemistry.
A further two – the most developed concentrations – are:
4) the core grouping around Computing arrangements based on specific computational models, involving upwards of six technologies;
5) a large, but less dominant group of three patent classes in the Telecommunications area.
The 16 key patent classes that comprise these five concentrations have formed a focus of the analysis in this report. All these concentrations are developing, and many others are beginning to emerge.
- Fundamental role of “knowledge packages”. A key finding of the present study has been that it is both restricting and, to some degree, misleading to treat areas of AI knowledge / skills as independent from one another. In essence, different areas of technology (e.g. the five concentrations just reported) require different “packages” of knowledge.
1) By analysing the different concentrations of patents (representing five separate core groups of technologies) it has been possible to show that they are characterised by different, distinctive knowledge / skill bases.
2) At a detailed level of technology, the five areas differ both in which knowledge types are included (some are not relevant) and in the relative importance of the different types of knowledge, which often varies significantly even when the same ones are included.
3) When areas of core technology are aggregated, even within the same technological group, some of the differences in the relative importance of the knowledge sets required are “washed-out”. This has important policy implications because economies such as the UK are unlikely to prioritise all areas of technology and are more likely to concentrate on areas of potential strength or future strategic interest.
-
Applications of the research. As a complement to other information sources, patenting activity appears to be an extremely useful source for identifying current trends in technological activity. Even at this early stage in its development, this tool appears to have potentially important applications in informing education and training institutions of emerging and developing knowledge bases that might be incorporated within further and higher education programmes.
At present, there appears to be no centralised, comprehensive but accessible online information about the various AI technologies being developed and the knowledge sets associated with them. Such a source could inform decision makers in business and academic institutions, as well as students deciding on programmes of study. This would help future labour demand and supply to fit better with private and social aspirations.
Annex 1 Detailed findings on concentrations of AI activity
While the group levels (the first digit: A, B, …, H) are a useful organisational device, they should be interpreted with a degree of caution. A number of patent classes, particularly within group G (the primary AI group), have subject matter that relates directly to other areas outside G. For example, G16H (Healthcare informatics) in group G has links to A61B (Diagnosis; surgery; identification) in group A. Robotics is a more complex example, as it has no individually identified class of its own, but is assigned to different classes, mainly based upon the area to which the robotics is to be applied.
In addition to the notes above, it is also important to explain how the figures are organised, as they contain a considerable amount of information. First, they are separated by the broad (most aggregate) patent groups (A-H – see Figure 21: 14a-16h). Within each group, the patent classes are organised by the magnitude of their penetration rate in 2023 (AI-patent count as a percentage of total patent count). Normally the 10 with the highest penetration rates in 2023 are selected for reporting, although this is sometimes extended in the discussion. Those reported in the figures (i.e. the top 10) are then ranked before being reordered from highest to lowest percentage point change in penetration rate between 2014 and 2023.[footnote 25] The percentage point changes are shown on the secondary axes.
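The two-stage ordering used in the Annex figures can be sketched as follows. The tuple layout, the function names and the class codes X1 to X4 are hypothetical, and the counts are invented. The logic follows the description above: penetration is the AI-patent count as a percentage of the total patent count; the top n classes are selected by 2023 penetration; these are then reordered by the 2014-2023 percentage point change.

```python
def penetration(ai, total):
    """AI-patent count as a percentage of the total patent count."""
    return 100 * ai / total


def annex_order(rows, top_n=10):
    """rows: (code, ai_2014, total_2014, ai_2023, total_2023) tuples.

    Keep the top_n classes by 2023 penetration, then order them from the
    highest to the lowest percentage point change between 2014 and 2023.
    """
    def rate_2023(row):
        return penetration(row[3], row[4])

    def pp_change(row):
        return rate_2023(row) - penetration(row[1], row[2])

    top = sorted(rows, key=rate_2023, reverse=True)[:top_n]
    return [(row[0], round(pp_change(row), 1))
            for row in sorted(top, key=pp_change, reverse=True)]


# Hypothetical classes (not USPTO data); top_n is reduced to 3 for brevity.
rows = [
    ("X1", 10, 100, 50, 100),
    ("X2", 40, 100, 60, 100),
    ("X3", 0, 100, 45, 100),
    ("X4", 0, 100, 5, 100),
]
print(annex_order(rows, top_n=3))  # [('X3', 45.0), ('X1', 40.0), ('X2', 20.0)]
```

X2 has the highest 2023 penetration but the smallest change, so it moves to the bottom of the reported ten. X3 started from zero, so its change equals its 2023 rate, as the text notes for classes such as B61L.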
Group A (Human necessities)
Within group A, A63F (Card, board, or roulette games; indoor games using small moving playing bodies; video games; games not otherwise provided for) and A61B (Diagnosis; surgery; identification) are the classes with by far the largest patent counts (11,600 and 11,100 respectively, over the period 2014-2023 as a whole). A63F has a slightly higher penetration rate than A61B in both years, probably owing to the presence of video games in the former. As shown elsewhere, however, A61B (with its focus on various aspects of medicine and surgery) is especially important for its social and welfare implications. The overall average penetration rate across A is fairly low (12.4% in 2022, slightly lower in 2023) despite the effects of A63F and A61B. This low overall value is caused by a further 38 classes in the group which record penetration rates below 12%, of which 18 are below 5%.
Group B (Performing operations; transporting)
Within group B, three classes have almost identical penetration rates at just under 45% in 2023, the highest by far for this group. These are B25J (Manipulators; chambers provided with manipulation devices), B61L (Guiding railway traffic; ensuring the safety of railway traffic) and B60W (Conjoint control of vehicle sub-units of different type or different function; control systems specially adapted for hybrid vehicles; road vehicle drive control systems for purposes not related to the control of a particular sub-unit).
The percentage point change for B61L was almost identical to its 2023 penetration rate, as the class started from almost zero. B60W is of particular interest because of its links with the control of road vehicles, as is B61L for railway traffic. The other seven classes have much more moderate rates, at or below 20%. There are a further 50 classes covered in this analysis, of which 21 are below 10% but above 5%, and a further 29 are below 5%. The overall average for this group is 11.0%, slightly lower than group A above (caused by the long tail of three-digit classes with low AI patenting).
Group C (Chemistry; metallurgy)
Within group C, two classes are amongst the driving forces in this broad group, both relating to areas of microorganisms and genetics where considerable research is occurring. These are C12N (Microorganisms or enzymes; compositions thereof; propagating, preserving, or maintaining microorganisms; mutation or genetic engineering; culture media) and C12Q (Measuring or testing processes involving enzymes, nucleic acids or microorganisms; compositions or test papers therefor; processes of preparing such compositions; condition-responsive control in microbiological or enzymological processes).
C12M (Apparatus for enzymology or microbiology) is a relatively small class, with just over 500 patents in 2023, but has the highest percentage point change. However, its penetration rate by 2023 was under 15%, compared with 23.9% for C12Q. The relatively high rates of AI activity in these first three areas are in stark contrast to the rest of the group. The overall average for group C is considerably lower even than for B, at 7.2%, with 26 of the total 35 classes at less than 5% penetration in 2023.
Group D (Textiles or flexible materials)
There is very little AI activity within group D and, hence, no figure has been provided for this group. This low level of activity seems quite surprising given textiles’ very early links to “computer-type” card input of designs and subsequent developments through to CNC technologies. This may be because the applications of AI lie more with the suppliers of equipment to this area than with the textiles themselves. The only class worth reporting is D06F (Laundering, drying, ironing, pressing or folding textile articles), with a penetration rate of 0.3% in 2014, rising to 10.8% in 2023. The percentage point change for this class is just under its 2023 value; in other words, penetration by 2023, while low, stemmed from a base of almost zero.
Group E (Fixed constructions)
Group E is again very small both in terms of total patent counts and AI-patents; hence only 16 three-digit classes meet the various criteria for inclusion (e.g. data continuity). Of these, E21B (Earth or rock drilling; obtaining oil, gas, water, soluble or meltable materials or a slurry of minerals from wells) stands out within the group as by far the largest patent class (27,900 patents in total over the period as a whole). It is five times the size of the second class, E05B (Locks; accessories therefor; handcuffs), which has a completely different subject matter, reflecting the diversity of the E-group. Its penetration rate is slightly under 13% by 2023, but, in relative terms, E05B falls by one place in terms of its 2023 ranking.
The two highest penetration rates by 2023 are for E02F (Dredging; soil-shifting) and E21B, which also have the highest percentage point changes in penetration rates (10.9 and 10.5 percentage points). All but four of the top 15 ranked classes by penetration in 2023 have penetration rates below 5%. The overall AI-proportion for the part of the group analysed (i.e. excluding classes with missing observations) is only 6.7%.
Group F (Mechanical engineering; lighting; heating; weapons; blasting)
Group F is a diverse collection of technological activities. It should be remembered that defence is a special area under patent law, where inventions impinging on national security are excluded. F24F (Air-conditioning; air-humidification; ventilation; use of air currents for screening) and F02D (Controlling combustion engines) have the highest penetration by 2023 and the highest percentage point change in penetration over the period (see ranks 1 and 2).
However, even by the standards of groups A-C, the highest penetration rates were low. In the case of F41G (Weapon sights; aiming), F03D (Wind motors) and F03B (Machines or engines for liquids), the penetration rates by 2023 are all marginally under 10%, and the first of these had the lowest change in penetration rate of all ten classes shown. Of the 41 classes for which the results met the data availability criteria, 31 had penetration rates under 5% and, of these, 14 had rates under 2%, with the overall average across the 41 just 4.1%.
Group G (Physics)
A set of classes within group G (Physics) can be considered as the core powerhouse of AI technologies. The extent of AI-patenting in this group is of a different order of magnitude to anywhere else. Patent counts for G06F (Electric digital data processing), G06Q (ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes; systems or methods specially adapted for administrative, commercial, financial, managerial purposes, not otherwise provided for) and G06T (Image data processing or generation, in general) - all part of the core area of AI-activity - were very large.[footnote 26]
However, G06N (Computing arrangements based on specific computational models) has the highest penetration rate in 2023, of 92.7%, which is unsurprising as it is almost entirely dedicated to AI and a part of what can be considered as the core AI-inventive classes.
Four classes that feature in the top 10 penetration rates do not appear in the figure because they do not begin in 2014, so it is not possible to construct data for the full period. These are G06V (Image or video recognition or understanding), G16B (Bioinformatics, i.e. ICT specially adapted for genetic or protein-related data processing in computational molecular biology), G16C (Computational chemistry; chemoinformatics; computational materials science) and G16H (Healthcare informatics, i.e. ICT specially adapted for the handling or processing of medical or healthcare data). Two of them relate to inventions designed for user groups associated with groups C and A (i.e. G16C for Chemistry and G16H for Health Care). Because AI inventions serving these users are classified in G, this may be part of the reason for the lower levels of AI activity recorded within groups C and A themselves. All of the remaining classes shown in the figure have penetration rates above 40% by 2023, although their percentage point changes are smaller than those of several classes in the other broad groups.
Group H (Electricity)
This is the second most active patent group in terms of AI. H01L (Semiconductor devices not covered by class H10), H04W (Wireless communication networks) and H04L (Transmission of digital information) record totals of 208,000, 123,000 and 146,000 patents respectively over the whole period, of which 5,300, 15,500 and 30,000 respectively were AI patents.
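As a quick check on the Group H figures above, dividing each class's AI patents by its total patents over 2014-2023 gives whole-period AI shares of roughly 2.5% (H01L), 12.6% (H04W) and 20.5% (H04L). These period-wide shares are not the same as the 2023 penetration rates discussed in this section, which refer to a single year; the helper name below is illustrative only.

```python
def period_share(ai_patents: int, all_patents: int) -> float:
    """AI patents as a percentage of all patents over a period, to one decimal place."""
    return round(100 * ai_patents / all_patents, 1)


# Whole-period (2014-2023) totals as quoted (rounded) in the text.
group_h = {
    "H01L": (5_300, 208_000),
    "H04W": (15_500, 123_000),
    "H04L": (30_000, 146_000),
}

for cpc_class, (ai, total) in group_h.items():
    print(f"{cpc_class}: {period_share(ai, total)}% AI patents over 2014-2023")
```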
In contrast to H01L’s relatively low AI activity, the other two classes (H04W and H04L) report considerably higher absolute numbers of AI patents over the period, and H04N also appears in the rankings with 18,400 AI patents. The percentage point changes in the penetration rates quite closely follow the ordering of the penetration rates in 2023 (see the ranks).
The top four by both penetration rate and penetration rate change are H04M (Telephonic communication), H04L (Transmission of digital information), H04S (Stereophonic systems) and H04N (Pictorial communication, e.g. television). All four have penetration rates above 30%, and all show an increase of over 20 percentage points. At least three of the four are more closely linked to telecommunications than to electricity per se.
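Because classes are ranked separately on their 2023 penetration rate and on their percentage point change, the two sets of ranks need not line up. The minimal sketch below, with hypothetical classes and values and illustrative function names, orders classes by percentage point change and shows their 2023-rate rank alongside.

```python
def rank(values: dict[str, float]) -> dict[str, int]:
    """Rank classes from highest value (rank 1) downwards; ties broken by class name."""
    ordered = sorted(values, key=lambda name: (-values[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def ranked_table(
    rates_2023: dict[str, float], pp_changes: dict[str, float]
) -> list[tuple[str, int, int]]:
    """Rows of (class, rank by pp change, rank by 2023 rate), ordered by pp change."""
    change_rank = rank(pp_changes)
    rate_rank = rank(rates_2023)
    rows = [(name, change_rank[name], rate_rank[name]) for name in rates_2023]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical classes: class_b grew fastest but class_a has the higher 2023 rate.
rates_2023 = {"class_a": 40.0, "class_b": 35.0, "class_c": 30.0}
pp_changes = {"class_a": 25.0, "class_b": 28.0, "class_c": 21.0}
for row in ranked_table(rates_2023, pp_changes):
    print(row)
```

Reading down the last column, the 2023-rate ranks run 2, 1, 3 rather than 1, 2, 3, even though higher 2023 rates broadly go with larger changes.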
Figure 24: AI penetration by patent class within technology groups (highest penetration rates, ranked by percentage point growth, 2014-2023)
Source: USPTO. Own calculations.
-
Full details of the application (the description of the invention, etc.) can take up to 18 months to publish, but then the invention details are released weekly by the USPTO. ↩
-
Bosworth et al. (2002). “The Role of Innovation and Quality Change in Japanese Economic Growth”. In J.S. Metcalfe and U. Cantner (eds.) Proceedings of the Schumpeter Conference, Manchester, 2002, pp. 291-318. ↩
-
E.g. Maeno, T. et al. (2021). “Leading Indicators for Detecting Change of Technology Trends: Comparison of Patents, Papers and Newspaper Articles in Japan and US”. International Journal of Innovation and Technology Management, Vol. 18, No. 04, 2150017. Jiang, L., F. Zou, Y. Qiao and Y. Huang (2022). “Patent analysis for generating the technology landscape and competition situation of renewable energy”. Journal of Cleaner Production, Vol. 378, 134264. ↩
-
See, for example, “Considering Legal Implications of Open Sourcing Patents for Sustainable Vehicles”, but also the case of the Tesla open patent scheme, “Why Tesla’s Open Source Patent Strategy Reinforces the Importance of Patenting”. ↩
-
E.g. “Google has a total of 111911 patents globally. These patents belong to 38450 unique patent families. Out of 111911 patents, 78883 patents are active.” See: Google Patents – Insights and Stats (Updated 2024) ↩
-
This is a harmonised international patent classification, currently covering European countries and the USA, with other countries set to join. ↩
-
Three further keywords/phrases were tried: “algorithmic intelligence”, “complex adaptive networks” and “synthetic intelligence”. All three only had a couple of “hits” in the initial, exploratory searches and so were dropped from the list. ↩
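The screening of candidate keywords by the number of “hits” in exploratory searches might look something like the sketch below. It assumes case-insensitive, whole-phrase matching and counts the number of documents containing each phrase; the sample texts, the min_hits threshold and the function names are hypothetical, and the report does not specify its exact matching rules.

```python
import re


def count_hits(texts: list[str], phrases: list[str]) -> dict[str, int]:
    """Number of documents containing each phrase (case-insensitive, whole phrase)."""
    patterns = {
        phrase: re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for phrase in phrases
    }
    return {
        phrase: sum(1 for text in texts if pattern.search(text))
        for phrase, pattern in patterns.items()
    }


def keep_keywords(hits: dict[str, int], min_hits: int) -> list[str]:
    """Keep phrases whose document count reaches a (hypothetical) minimum."""
    return [phrase for phrase, n in hits.items() if n >= min_hits]


# Hypothetical patent snippets.
texts = [
    "A neural network is trained on labelled images.",
    "Supervised learning of a defect classifier.",
    "A synthetic intelligence module for routing.",
    "Neural networks for speech recognition.",
]
hits = count_hits(
    texts,
    ["neural network", "supervised learning", "synthetic intelligence", "algorithmic intelligence"],
)
print(hits)
print(keep_keywords(hits, min_hits=1))
```

Whole-phrase matching means a plural such as “neural networks” is not counted as a hit for “neural network”, so in practice variants would need their own entries or a looser pattern.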
-
In principle, data for 2013 are available. However, it is a transition year between the pre-CPC and CPC periods. This transition appears to have disrupted the viability of the data for 2013. Examination of the emergence of AI prior to 2014 is possible using the earlier USPTO classification, but this work lies outside of the scope of the present study. ↩
-
2013 was a transition year between the pre-CPC and CPC periods. This transition appears to have disrupted the viability of the data for 2013. The present study therefore does not examine the emergence of AI prior to 2014 using the earlier USPTO classification. ↩
-
One further piece of evidence that supports this hypothesis is that, while anecdotal discussion of such terms generally suggests that ML is the dominant area, patent classification information suggests otherwise. ML as a patent class is associated principally with the five-digit code G06N 20/00, and other references to ML treat it as a subset of NN (e.g. “Where the machine learning relates to learning methods within neural networks, classification should be made in group G06N 3/08 only”, where the latter group relates to “Learning methods”). ↩
-
Perspective Economics report: Artificial Intelligence Sector Study ↩
-
ALGO may have other limitations as a measure of technology. For example, algorithms may be essential to the invention but not themselves its focus (the novel feature), and therefore not referred to in the patent application. Also, while ALGO may imply certain software-related skills, the patent may not specify which skills these are; this is worth checking in future work. ↩
-
Note the assignment of patents to classes occurs at the five-digit level, for example, G06N 3/002, where the specific technology is 3/002, Biomolecular computers, i.e. using biomolecules, proteins, cells (lodged within 3/ which is Computing arrangements based on biological models). Patent technologies are allocated by a team of experts at the five-digit level. However, this level of detail is not used in the present report. ↩
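The relationship between a full CPC symbol such as G06N 3/002 and the four-character level used in this report can be shown by splitting the symbol into its parts. In CPC terminology G is the section, G06 the class, G06N the subclass (what this report calls a “class”), G06N 3/00 the main group and G06N 3/002 a subgroup. The parser below is a simplified sketch that assumes the common “subclass group/subgroup” layout; it does not validate codes against the official scheme, and the function name is illustrative.

```python
import re

# Simplified layout: section letter, two-digit class, subclass letter,
# then "main group/subgroup", e.g. "G06N 3/002". Not an official validator.
CPC_SYMBOL = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> dict[str, str]:
    """Split a CPC symbol into the levels of the classification hierarchy."""
    match = CPC_SYMBOL.match(symbol.strip())
    if match is None:
        raise ValueError(f"unrecognised CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    subclass_code = f"{section}{cls}{subclass}"
    return {
        "section": section,                                # e.g. "G" (Physics)
        "class": f"{section}{cls}",                        # e.g. "G06"
        "subclass": subclass_code,                         # e.g. "G06N", the level used in this report
        "main_group": f"{subclass_code} {main_group}/00",  # e.g. "G06N 3/00"
        "symbol": f"{subclass_code} {main_group}/{subgroup}",
    }


print(parse_cpc("G06N 3/002")["subclass"])    # prints G06N
print(parse_cpc("G06N 3/002")["main_group"])  # prints G06N 3/00
```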
-
If the patent entries over time are erratic (in particular, including a number of zeros and positive numbers with little pattern), they are dropped. However, a number of the classes have entries that do not begin until after 2014 but show stable and interesting patterns. These have been dealt with separately, if only by discussion within the text. Late entry is generally the result of the fact that this was the first time at which inventions appeared in that class. ↩
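One possible way to operationalise the screening described in this footnote is sketched below. The rule is hypothetical, not the authors’ actual criterion: a yearly series is treated as erratic if it contains more than max_gaps zero years after its first positive count, as a late entry if it starts with zeros and is then positive, and as full otherwise. The function name and sample counts are illustrative.

```python
def classify_series(counts: list[int], max_gaps: int = 0) -> str:
    """Label a yearly patent-count series using a hypothetical screening rule.

    - "no_data": empty, or zero in every year
    - "erratic": more than max_gaps zero years after the first positive year
    - "late_entry": leading zeros, then at most max_gaps zero years
    - "full": positive in the first year, then at most max_gaps zero years
    """
    if not counts or all(count == 0 for count in counts):
        return "no_data"
    first = next(i for i, count in enumerate(counts) if count > 0)
    gaps = sum(1 for count in counts[first:] if count == 0)
    if gaps > max_gaps:
        return "erratic"
    return "full" if first == 0 else "late_entry"


# Hypothetical counts for three classes over five years.
for series in ([3, 5, 8, 12, 20], [0, 0, 4, 6, 9], [0, 2, 0, 0, 5]):
    print(series, classify_series(series))
```

Separating late entries from erratic series mirrors the footnote’s distinction: a class whose first inventions appear after 2014 can still show a stable pattern worth discussing.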
-
As this work was carried out, it became apparent that it is also possible to analyse similar activities involving more than one Group (e.g. between G and A), bringing two distinct technologies together (e.g. robots and AI, or surgery and AI). This will be the subject of future research. ↩
-
Historically, some areas fell outside of what was patentable. As a consequence, when changes were introduced to patent laws to allow AI to be classed as a “technology” and therefore patentable, this led to high penetration rates. While such rates are not incorrect, comparisons between areas are organised by size to allow for differences in prior patentability and the presentation of the results makes allowance for this. ↩
-
Selecting is also in the top ten of this group (i.e. methods, circuits or apparatus for establishing selectively a connection between a desired number of stations). ↩
-
G16B (Bioinformatics ICT); G08G (Traffic control systems: railways, roads, etc.); G09B (Educational or demonstration appliances); G08B (Signalling or calling systems; order telegraphs; alarm systems); G07C (Registering, indicating or recording the time of events or elapsed time); G01V (Geophysics; gravitational measurements; detecting masses or objects); G07F (Coin-freed or like apparatus). ↩
-
Further information can be found by moving to the five-digit level. However, a list based on the most detailed classification can be found here. This can be used to investigate any part of the patent classification. ↩
-
G06F (Electric digital data processing); G06N (Computing arrangements based on specific computational models); G06Q (ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes; systems or methods for that purpose); G06T (Image data processing or generation, in general); G06V (Image or video recognition or understanding); G16H (Healthcare informatics, i.e. ICT specially adapted for the handling or processing of medical or healthcare data). ↩
-
The remaining “Other” category in Figure 21 is the sum of the four smallest knowledge sets: DataMin, GAN, MachVis and NLP. Of these, NLP has relatively larger values in Groups G and H. ↩
-
Microorganisms or enzymes; compositions thereof; propagating, preserving, or maintaining microorganisms; mutation or genetic engineering; culture media. ↩
-
Diagnosis; surgery; identification ↩
-
Manipulators; chambers provided with manipulation devices ↩
-
Hence the rankings in each table are not sequential (e.g. 1, 2, 3 …), although there is a tendency for those with higher penetration rates in 2023 to have experienced larger percentage point changes. ↩
-
With values of 11,000, 4,900 and 4,700 respectively for AI patents in 2023, and totals over the period 2014-2023 of 75,100, 32,000 and 26,300. ↩