Research and analysis

Summary of methodology

Published 27 November 2025

The UK Standard Skills Classification (SSC) was constructed in 3 distinct stages, as set out in the Phase One report for this project, ‘A Skills Classification for the UK: Plans for development and maintenance’. In each stage, Artificial Intelligence (AI) tools, in particular text embedding vector comparison and Large Language Model (LLM) evaluations, were used to validate, deduplicate and standardise multiple input datasets, with additional manual reviews to ensure the accuracy, alignment and reliability of outputs.

The 3 main development stages were the creation of:

  • an Occupational Task library for the UK Standard Occupational Classification (SOC) 2020 Sub Unit Groups (SUGs) (6-digit occupations)
  • a hierarchical classification of Occupational Skills, consisting of 4 levels, with the most detailed level also linked to a set of Core Skills
  • a library of Knowledge concepts

Mappings were then created between these 3 elements and to occupations, courses, qualifications and other existing classifications.

This section outlines the process followed during each of these 3 main stages and discusses the unexpected issues that led to deviations from the original Phase One plan. A more detailed description of the methodology can be found in Appendix B.

Occupational Tasks

Figure 5 outlines the development process of the SSC Occupational Tasks, detailing the input libraries used, the data cleaning steps, and the validation against other information sources.

Figure 5: Development of SSC Occupational Tasks

This is displayed as a series of processing steps from T1 to T6 in a row across the top of the diagram, with each step shown below in a flow diagram. On the left-hand side are the 4 input libraries which feed into the first processing step: T1.

The input libraries from top to bottom are:

  • the Association of Graduate Careers Advisory Services (AGCAS) responsibilities
  • the Institute for Apprenticeships and Technical Education (IfATE) duties
  • the US Occupational Information Network (O*NET) tasks
  • National Careers Service (NCS) day-to-day tasks

The processing steps show:

  • T1 ‘validate as Task Statements’
  • T2 ‘cluster by SOC Ext’
  • T3 ‘use AI to sub-cluster by meaning’
  • T4 ‘use AI to merge and deduplicate’ which has arrows pointing to 2 steps under T5
  • T5 ‘validate via SOC Ext description’ and ‘validate against job ads’ which both have arrows to step T6
  • T6 ‘Occupational Tasks’

Task statements were collected from the input libraries, then cleaned and standardised using AI tools to ensure clarity, consistency, and UK English usage.

The tasks were refined, deduplicated, and clustered using text embeddings from OpenAI models and hierarchical clustering methods. The resulting clusters of tasks were manually inspected to help merge overlapping clusters. The validation stage involved comparing these tasks to tasks extracted from SOC SUG descriptions and a database of around 8 million job vacancies. This ensured tasks were accurately mapped and relevant to real-world job roles.
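To illustrate this step, the sketch below embeds a handful of task statements and groups them with agglomerative (hierarchical) clustering. It assumes the OpenAI Python client and SciPy; the example statements and the distance cut-off are illustrative rather than the project’s actual settings.

```python
# Sketch: embed task statements and cluster them by meaning.
from openai import OpenAI
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tasks = [
    "Provide advice on trademarks",
    "Consult with clients about trademark issues",
    "Prepare financial statements for audit",
]

# One embedding vector per task statement
response = client.embeddings.create(model="text-embedding-3-large", input=tasks)
vectors = [item.embedding for item in response.data]

# Agglomerative clustering on cosine distance; 0.35 is an illustrative cut-off
tree = linkage(pdist(vectors, metric="cosine"), method="average")
labels = fcluster(tree, t=0.35, criterion="distance")

for task, label in zip(tasks, labels):
    print(label, task)  # the two trademark tasks should share a cluster
```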

Finally, the SSC Occupational Task library was completed, comprising 21,963 tasks linked to SSC Skills, Knowledge concepts, and Occupations.

The vacancy data used in this analysis is drawn from the Institute for Employment Research (IER) dataset, funded by the Department for Education (DfE) and covering the period from 2019 to 2024, with updates made in 2025.

Occupational Skills and Core Skills

Figure 6: Development of SSC Occupational Skills

Figure 6 shows the equivalent process for the construction of the hierarchical classification of SSC Occupational Skills together with a set of 13 Core Skills.

This is displayed as a series of processing steps from S1 to S7 in a row across the top of the diagram, with each step shown below in a flow diagram. On the left-hand side are the 6 input libraries which feed into the first processing step: S1.

The input libraries from top to bottom are:

  • European Skills, Competences, Qualifications and Occupations (ESCO) Level 4 skills
  • the National Careers Service (NCS) skills
  • O*NET Detailed Work Activities (DWAs)
  • IfATE skills
  • AGCAS skills
  • the Workforce Foresighting Hub, Innovate UK (WFH) skills

The processing steps show:

  • S1 ‘validate as Skills’
  • S2 ‘cluster by meaning’
  • S3 ‘use AI to merge and deduplicate’
  • S4 ‘map against SOC Ext SUGs’ which has arrows pointing to 2 steps under S5
  • S5 ‘validate against Tasks’ and ‘validate against job ads’ which both have arrows to step S6
  • S6 ‘Occupational Skills’
  • S7 ‘Core Skills’

Skill statements were sourced from the input libraries.

They were then standardised (in terms of structure, specificity, capitalisation, spelling and grammar) and quality assured using AI to ensure clarity, consistency, and relevance. These were refined and clustered using OpenAI embeddings and hierarchical models and then AI-generated labels and descriptions were created for each cluster.
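As an illustration of the labelling step, a prompt along the lines of the sketch below can ask an LLM to propose a label and description for one cluster. The prompt wording and model choice here are illustrative; the prompts actually used are given in Appendix B.

```python
# Sketch: ask an LLM to propose a label and description for a skill cluster.
from openai import OpenAI

client = OpenAI()

cluster = [
    "Advise clients on trademark registration",
    "Provide advice on trademark disputes",
    "Guide businesses through trademark applications",
]

prompt = (
    "The following statements describe closely related occupational skills.\n"
    "Propose a single skill label (an imperative verb phrase) and a one-sentence\n"
    "description that covers all of them. Use UK English spelling.\n\n"
    + "\n".join(f"- {s}" for s in cluster)
)

reply = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```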

These cluster labels were then manually reviewed and, where necessary, modified to use consistent language patterns to create the prototype set of Occupational Skills (the lowest level of the SSC).  

These Occupational Skills were then organised into a hierarchy of Skill Groups, Areas, and Domains, with AI prompts used to validate structure and relatedness. These were then mapped to SOC SUGs and validated against SSC Tasks. Further validation against skills extracted from the job vacancy descriptions was carried out to ensure coverage across occupations and relevance, with additional high-quality skills from the vacancy data added where gaps were found.

A separate set of 13 SSC Core Skills was defined, with AI used to create level definitions and assess proficiency across skills and occupations. The final classification includes 3,343 SSC Occupational Skills, structured into 606 Skill Groups, 106 Skill Areas, and 22 Skill Domains, linked to SSC Core Skills, Tasks, Knowledge concepts, Occupations, and Courses.

Occupational Knowledge concepts

The third main stage in the development of the SSC was the creation of a library of Knowledge concepts. This process is outlined in Figure 7.

Figure 7: Development of SSC Knowledge concepts

Figure 7 illustrates the process to develop the SSC library of Knowledge concepts.

This is displayed as a series of processing steps from K1 to K6 in a row across the top of the diagram, with each step shown below in a flow diagram. On the left-hand side are the 6 input libraries which feed into the first processing step: K1.

The input libraries from top to bottom are:

  • ESCO (European Skills, Competences, Qualifications and Occupations) knowledge concepts
  • Higher Education Coding of Subjects (HECoS)
  • Learn Direct Classification of Subject Codes (LDCSC)
  • O*NET (knowledge, tools used and technology skills)
  • Stack Exchange (topic tags)
  • Wikipedia (article titles)

The processing steps show:

  • K1 ‘validate as Knowledge concepts’
  • K2 ‘cluster by meaning’
  • K3 ‘use AI to merge and deduplicate’ which has arrows pointing to 5 steps under K4
  • K4 ‘validate versus Ofqual’, ‘validate versus IfATE’, ‘validate versus tasks’, ‘validate versus job ads’, and ‘validate versus prototype’ which each have arrows to step K5
  • K5 ‘identify primary concepts’
  • K6 ‘Occupational Knowledge’

The Knowledge concept, subject and topic names were collected from the input libraries.

These were cleaned and filtered using AI tools to retain only concepts evidenced in a UK context. The concepts were then refined and grouped by meaning using embeddings and clustering methods. The validation steps involved mapping these knowledge concepts to external sources including Ofqual, IfATE, SOC SUGs, SSC Tasks, and vacancy data, ensuring relevance and common usage. Embedding matches were also used to link SSC Skills to Knowledge concepts and assess their importance. Primary concept types and related terms were identified, resulting in a final set of 4,926 SSC Knowledge concepts linked to SSC Tasks, Skills, and subject classifications.

Once these 3 main development stages were complete (SSC Occupational Tasks, SSC Occupational Skills, and SSC Knowledge concepts), AI tools were used to create mappings to other classifications and data sources, and to create different groupings of the skills, such as Science, Technology, Engineering, Mathematics, Medicine and Health (STEM-M&H) skills, Green skills and Digital skills.

Unexpected Issues

Data Inputs

The Specialist Tasks from the Australian Skills Classification (ASC) were planned as an input for the skills library, but the dataset was withdrawn in early 2024 by Jobs and Skills Australia (JSA) as part of a plan to replace the ASC with a National Skills Taxonomy. JSA cited issues with connectivity to education contexts, and given that the content was originally derived from the O*NET Detailed Work Activities framework, which was already included as a skills classification input, the decision was taken to exclude ASC Specialist Tasks from the development process.

Data was requested from LinkedIn to supplement other inputs (especially for the Knowledge concept library) but, unfortunately, they were unable to provide access to the level of data required. To improve coverage of newer subjects and Knowledge concepts (such as those related to AI) we included technologies and techniques identified from several other sources such as Innovate UK Workforce Foresighting Hub challenge cycles.

Output Validation

The initial design report included a step to validate the classification against a large CV library but licensing this, or an up-to-date equivalent, proved prohibitively complex and expensive. A pilot with the Department for Work and Pensions (DWP) to evaluate the use of the SSC to generate standardised skills profiles from a sample of service user CVs is, however, underway. Results from this study will be included in the final development report.

Another validation input was a structured list of National Occupational Standards (NOS) titles which was used to evaluate and refine skill coverage. A more in-depth analysis is planned based on access to a more complete NOS dataset, most likely via the new NOS API.

AI tool advice and guidance

The speed and scale of SSC content development and validation would not have been possible without the recent AI-driven advances in natural language processing and generation tools. There are, however, still limitations and nuances in using these tools, and the following guidance is therefore offered to help those attempting similar work.

Text-embeddings

These are generated by machine learning models that convert words, sentences, or documents into numerical vectors (embeddings) that capture their semantic meaning. These embeddings enable computers to identify, compare and cluster text based on underlying meaning rather than just exact word matches.
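As a minimal illustration, the cosine similarity of two phrases can be computed directly from their embedding vectors (assuming the OpenAI Python client; the phrases are taken from Table 3 below).

```python
# Sketch: compare two phrases by the cosine similarity of their embeddings.
import numpy as np
from openai import OpenAI

client = OpenAI()

phrases = [
    "Provide advice on trademarks",
    "Consult with clients about trademark issues",
]
response = client.embeddings.create(model="text-embedding-3-large", input=phrases)
a, b = (np.array(item.embedding) for item in response.data)

# Cosine similarity: closer to 1.0 means closer in meaning
similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(f"similarity = {similarity:.2f}")
```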

Embedding Models

There are several embedding models (both commercial and non-commercial), but the text-embedding-3-large model from OpenAI has so far been found to be the most useful and reliable. The fact that it has not been superseded since its launch in January 2024 indicates its relative maturity and accuracy.

Short Text-string Embeddings

Embedding comparison scores (typically cosine similarity) are generally less reliable for short phrases, especially those that are ambiguous (such as “Interpret communication using NLP”, where NLP could refer to “Natural Language Processing” or “Neuro-linguistic Programming”). Concatenating labels with hyphen-separated descriptions can be a cost-effective way of mitigating this issue, provided that the descriptions are accurate and unambiguous.

Data cleaning

Input data cleaning and standardisation matter, as statements that are inconsistently capitalised, punctuated and structured will not match as reliably. For example, Table 3 shows the embedding vector cosine similarity scores of 3 phrases against the skill label ‘Provide advice on trademarks’.

Table 3: Embedding similarity scores to phrase ‘Provide advice on trademarks’

| ID number | Difference | Comparison phrase | Embedding similarity score |
| --- | --- | --- | --- |
| ID 1 | Statement reworded | Consult with clients about trademark issues | 0.78 |
| ID 2 | Statement reformatted (capitalisation, grammar and trailing spaces) | consult with clients about trademark issues. | 0.65 |
| ID 3 | Statement with different meaning | Provide advice on trade controls | 0.67 |

Statement ID 1 is reworded but is syntactically correct and consistent and therefore has a fairly high similarity score of 0.78.

In contrast, statement ID 2 is formatted differently and includes character anomalies such as trailing spaces, which add noise to the match and result in a score of 0.65.

Statement ID 3 has a different meaning (trade controls being a distinct concept from trademarks) and has a match score of 0.67. This means that, without data cleaning, the original statement would be incorrectly evaluated as a closer match to statement ID 3 than to statement ID 2.
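A basic standardisation pass before embedding removes this kind of noise. The sketch below handles the anomalies shown in Table 3; the cleaning rules applied to the SSC inputs were more extensive.

```python
# Sketch: basic statement standardisation covering the Table 3 anomalies
# (trailing spaces, trailing punctuation, inconsistent capitalisation).
def standardise(statement: str) -> str:
    text = " ".join(statement.split())     # collapse and trim whitespace
    text = text.rstrip(".")                # drop trailing full stops
    if text:
        text = text[0].upper() + text[1:]  # consistent sentence case
    return text

print(standardise("consult with clients about trademark issues.   "))
# -> "Consult with clients about trademark issues"
```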

Large Language Model Prompts

These are structured data requests directed at Large Language Model (LLM) APIs to process large volumes of data in a consistent way.

Language model selection

Even over just the 18-month duration of this project, the performance improvements in LLMs have been remarkable. Progress has not, however, been entirely convergent, and different LLMs still perform significantly better at some tasks than others. For this classification work, OpenAI models have tended to perform best, especially at tasks requiring the assignment of match ‘scores’ (such as task-to-skill importance or relatedness).

Prompt design and validation

Despite increasing context windows (that is the amount of text or data that can be included in an LLM prompt) there is still a balance to be struck with the number and complexity of instructions included. This is because longer lists of rules or criteria seem to increase the chances of at least some being completely ignored. To find this balance, use a randomised sample to validate output coverage but also include specific examples to check known difficulties. Rerunning the prompt against the same dataset can also usefully reveal inconsistencies or ambiguities in both inputs and outputs.
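One way to apply this advice is to rerun the prompt over a random sample and flag items whose outputs differ between runs, as in the sketch below (`classify_statement` is a hypothetical wrapper around whichever prompt is being validated).

```python
# Sketch: rerun an LLM prompt on a random sample and flag unstable outputs.
import random

def check_consistency(statements, classify_statement, sample_size=50):
    sample = random.sample(statements, min(sample_size, len(statements)))
    unstable = []
    for statement in sample:
        first = classify_statement(statement)
        second = classify_statement(statement)
        if first != second:
            unstable.append((statement, first, second))
    # Inspect these for ambiguous inputs or prompt flaws
    return unstable
```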

Generative inconsistency and bias

LLMs tend to have a bias towards generating US English. This is less likely if you explicitly instruct the model to use UK English terms and spelling, and it is also useful to spell-check outputs using a UK English dictionary.

Statement or Label Categorisation

For content quality or format checks (such as ‘does this statement describe a skill?’), LLMs perform more consistently when asked to apply a pre-defined categorisation framework: for example, Code 3# (Too Generic: this statement is too generic and isn’t describing a specific skill) or Code 4# (Invalid: this statement does not describe a skill and is instead a tool, subject, attitude or outcome). For a full example of this, see the prompt given in section S1 of Appendix B.
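The sketch below illustrates the pattern. Codes 3# and 4# follow the examples above; codes 1# and 2# are invented here to complete the illustration, and the full framework is given in Appendix B.

```python
# Sketch: ask the LLM to apply a pre-defined categorisation framework
# rather than make an open-ended quality judgement.
from openai import OpenAI

client = OpenAI()

framework = """Classify the statement using exactly one code:
Code 1# (Valid - This statement describes a specific skill.)
Code 2# (Too Broad - This statement combines several distinct skills.)
Code 3# (Too Generic - This statement is too generic and isn't describing a specific skill.)
Code 4# (Invalid - This statement does not describe a skill and is instead a tool, subject, attitude or outcome.)
Reply with the code only."""  # codes 1# and 2# are invented for illustration

reply = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": framework},
        {"role": "user", "content": "Have a positive attitude"},
    ],
)
print(reply.choices[0].message.content)  # expected: Code 4#
```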

Concept tagging

For mappings (such as tasks to skills) LLMs still struggle to categorise or tag statements from long lists of options. A better approach is to compare a text-embedding of an input (such as a task statement) against text-embeddings for all classification concepts or tags (such as skill labels) to generate a longlist of potential matches. Then use an LLM prompt to iterate through and evaluate the potential matches one at a time. For an example of this see the prompt given in section S4 of Appendix B.
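The sketch below illustrates this two-stage pattern: an embedding shortlist followed by a per-candidate LLM evaluation. The helper names, shortlist size and prompt wording are illustrative; the prompt actually used is given in section S4 of Appendix B.

```python
# Sketch: two-stage concept tagging. Stage 1 shortlists candidate skills by
# embedding similarity; stage 2 asks the LLM to judge each candidate alone.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([item.embedding for item in response.data])

def shortlist(task, skill_labels, top_k=10):
    # In practice the skill embeddings would be computed once and cached
    task_vec, skill_vecs = embed([task])[0], embed(skill_labels)
    scores = skill_vecs @ task_vec / (
        np.linalg.norm(skill_vecs, axis=1) * np.linalg.norm(task_vec)
    )
    return [skill_labels[i] for i in np.argsort(scores)[::-1][:top_k]]

def is_match(task, skill):
    reply = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Does performing the task '{task}' require the skill "
                       f"'{skill}'? Answer Yes or No only.",
        }],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")

def tag_task(task, skill_labels):
    # Evaluate shortlisted candidates one at a time, as recommended above
    return [skill for skill in shortlist(task, skill_labels) if is_match(task, skill)]
```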

In summary, AI tools have proved invaluable in the development of the prototype classification, but they are not entirely reliable and so (both manual and AI) controls and checks are needed at every stage to ensure that outputs conform to requirements.