Appendix B: full methodology
Published 30 April 2026
Throughout the process of creating the UK Standard Skills Classification (SSC), we relied heavily on Artificial Intelligence (AI), particularly Large Language Models (LLMs). These were mostly OpenAI models (generally the best available model at the time), but during early development we also used open-source Llama models for some of the more data-intensive tasks that would have been too costly using OpenAI. At each stage, we manually reviewed exceptions and inspected the outputs from the AI models, sometimes checking large amounts of data. During the development of SSC Version 1.0, we investigated the performance of GPT-5.4. An evaluation of importance score differences between GPT-5.4 outputs and the GPT-4.1-derived prototype outputs for a sample of mappings showed noticeably more accurate performance by GPT-5.4. This led to the decision to regenerate all primary and secondary mappings using GPT-5.4 for SSC Version 1.0.
A central part of our AI approach was the use of text embeddings. These are numerical representations (vectors) of text that allow computers to capture the semantic meaning of a piece of text. Text embeddings are used to cluster text and to compare the meanings of text strings. To understand how related two text strings are to each other, we calculate the distance between their vectors using cosine similarity. The higher the score, the closer the two text strings are in semantic meaning.
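As a minimal illustration of this comparison, the sketch below uses toy three-dimensional vectors to stand in for real embeddings, which typically have thousands of dimensions:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: higher means
    the two texts are closer in semantic meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Toy vectors standing in for real text embeddings
v_install = [0.9, 0.1, 0.0]   # "Install boilers"
v_fit = [0.8, 0.2, 0.1]       # "Fit gas boilers"
v_sing = [0.0, 0.1, 0.9]      # "Sing opera"

# Semantically close statements score higher than unrelated ones
assert cosine_similarity(v_install, v_fit) > cosine_similarity(v_install, v_sing)
```

Many embedding APIs return unit-length vectors, in which case the dot product alone gives the cosine score.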
This method of comparing text strings was used extensively throughout the project. Early in the project, we experimented with different embedding models and settled mainly on OpenAI 3-Large (OAL3) embeddings. For the development of SSC Version 1.0, we re-checked the performance of OAL3 against newer embedding models (e.g. Qwen3), but it remained the best performing for our specific needs.
AI was also used in other ways in the project. We used AI prompts at many stages of the project, for example to quality assure skill and task statements and to detect inconsistencies and errors in mappings. Designing and refining prompts often involved several iterations as we learnt the best ways to interact with the AI to get the desired results.
The steps below outline the main stages in the creation of the SSC.
Tasks
Figure 8: Development of UK SSC Occupational Tasks
Figure 8 outlines the development process of the SSC Occupational Task library, detailing the main input libraries used, the data cleaning steps, and the validation against other information sources.
This is displayed as a series of processing steps from T1 to T6 in a row across the top of the diagram with each step shown below in a flow diagram. On the left-hand side are the 4 input libraries which feed into the first processing step: T1.
The ‘T’ prefix for each step ID (such as T1) relates to ‘Task’. For similar processes shared in later sections an ‘S’ prefix relates to ‘Skills’ and a ‘K’ prefix to ‘Knowledge’.
The input libraries from top to bottom are:
- the Graduate Futures Institute (GFI) responsibilities
- Skills England Occupational Standard duties
- the US Occupational Information Network (O*NET) tasks
- National Careers Service (NCS) day-to-day tasks
The processing steps show:
- T1 ‘validate as Task Statements’
- T2 ‘cluster by SOC SUG’
- T3 ‘use AI to sub-cluster by meaning’
- T4 ‘use AI to merge and deduplicate’ which has arrows pointing to 2 steps under T5
- T5 ‘validate via SOC SUG description’ and ‘validate against job ads’ which both have arrows to step T6
- T6 ‘Occupational Tasks’
T1: Process and validate inputs
Task statement libraries were obtained from GFI (responsibilities), Skills England Occupational Standards (duties), O*NET (tasks), and National Careers Service (day-to-day tasks). These libraries were then cleaned and standardised using AI tools. AI tools were used to quality assure the task statements and correct tasks that were too generic, too specific, too wordy, incorrectly structured, compound tasks, or not tasks at all. The quality assurance process also converted US spellings and phrasing to UK English.
T2 - T4: Refine, deduplicate and cluster
Text embeddings were generated using two models: OpenAI 3-Large and Bidirectional Encoder Representations from Transformers (BERT) MP-Net. A variety of clustering models were then tested and compared to remove duplicate and similar tasks; OpenAI 3-Large embeddings with a hierarchical clustering model produced the best results.
Clustered tasks were sorted by meaning (based on embeddings) to identify overlapping and close clusters, which were then merged through manual inspection. Orphan clusters (those containing only one task) were integrated with multi-task clusters using results from other clustering and embedding models.
The centroid task statement within each cluster was identified and became the task label. These cluster labels then became the initial version of the SSC Task library.
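The centroid selection step can be sketched as follows: the cluster label is the task whose embedding lies closest to the mean of the cluster’s embeddings. The function name and toy vectors below are illustrative, not the project code:

```python
def centroid_label(tasks, embeddings):
    """Return the task whose embedding is closest to the cluster mean."""
    dims = len(embeddings[0])
    mean = [sum(vec[d] for vec in embeddings) / len(embeddings) for d in range(dims)]
    # Squared Euclidean distance to the mean; the nearest task becomes the label
    best_task, _ = min(
        zip(tasks, embeddings),
        key=lambda pair: sum((x - m) ** 2 for x, m in zip(pair[1], mean)),
    )
    return best_task

# Toy two-dimensional vectors standing in for real embeddings
cluster = ["Install boilers", "Fit gas boilers", "Install heating boilers"]
vecs = [[1.0, 0.0], [0.8, 0.4], [0.9, 0.1]]
label = centroid_label(cluster, vecs)  # the task nearest the cluster mean
```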
T5: Validate against other sources
SOC SUGs
Tasks were extracted from all SOC SUG descriptions (except n.e.c. groups ending /99) using the Llama3 LLM and then embeddings were created to enable matching to the SSC Tasks. The similarity between the SUG description task embeddings and the SSC Task embeddings was calculated to provide a numerical score representing the degree of similarity. The best matching SSC Task for each SUG description task was identified so that SSC Tasks were assigned to all relevant SOC SUGs. Potential Task to SUG matches were also identified via an analysis of existing job profiles such as those within O*NET where associated task statements appear in clusters used to derive SSC Tasks.
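In outline, the best-match step assigns each SUG description task to the SSC Task with the highest cosine similarity. A sketch with toy vectors (the real pipeline used OpenAI 3-Large embeddings; the identifiers below are invented):

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def best_match(query_vec, ssc_task_vecs):
    """Return (task_id, similarity) for the highest-scoring SSC Task."""
    return max(
        ((task_id, cosine(query_vec, vec)) for task_id, vec in ssc_task_vecs.items()),
        key=lambda pair: pair[1],
    )

# Invented task IDs with toy embeddings
ssc_task_vecs = {"T.0001": [1.0, 0.0], "T.0002": [0.0, 1.0]}
match_id, score = best_match([0.9, 0.1], ssc_task_vecs)
```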
Further AI prompts were used to check the combined mappings and estimate the relatedness of these tasks to the associated SUGs. Significant discrepancies between the legacy mappings and the AI analysis (such as a task match with a high level of importance within an O*NET profile but rejected by the AI analysis) were manually checked and reconciled.
Vacancy data
The IER holds a large vacancy database which is coded to SOC SUGs. A sample of distinct vacancy descriptions was created with a maximum size of 200 vacancies per SUG. The sample was selected from vacancies with longer job descriptions and those that were well coded to each SUG.
Llama3 was used to extract tasks from this database of vacancy descriptions and then the tasks were quality assured, clustered, and embeddings created using a similar process to the creation of the task library (T2-T4). These embeddings were then compared to the SSC Task embeddings. Vacancy tasks that were quality assured as being ‘good’ tasks but had a low similarity score to an existing SSC Task were manually inspected to identify any tasks that should be added to the SSC Task library.
The database of vacancy tasks was also used to identify additional tasks for SUGs with no or low numbers of associated tasks and similarly for SSC Skills with no linked SSC Tasks.
T6: Final SSC Occupational Tasks
The final list of SSC Tasks consists of 22,583 tasks. This is based primarily on the SSC prototype version but, following analysis of tasks added to O*NET, Skills England Occupational Standards (i.e. duty statements) and GFI responsibilities since the original task library was created, an extra 692 tasks were added. This extended library was then mapped to SSC Skills and Knowledge concepts and to occupations.
Skills
Figure 9: Development of UK SSC Occupational Skills
Figure 9 shows the equivalent process for the construction of the hierarchical classification of SSC Occupational Skills together with a set of 13 Core Skills.
This is displayed as a series of processing steps from S1 to S7 in a row across the top of the diagram with each step shown below in a flow diagram. On the left-hand side are the 6 input libraries which feed into the first processing step: S1.
S1 to S7 refer to each processing step in the creation of the Occupational Skills library.
The input libraries from top to bottom are:
- European Skills, Competences, Qualifications and Occupations (ESCO) Level 4 skills
- the National Careers Service (NCS) skills
- O*NET Detailed Work Activities (DWAs)
- Skills England Occupational Standards skills
- GFI skills
- the Workforce Foresighting Hub, Innovate UK (WFH) skills
The processing steps show:
- S1 ‘validate as Skills’
- S2 ‘cluster by meaning’
- S3 ‘use AI to merge and deduplicate’
- S4 ‘map against SOC SUGs’ which has arrows pointing to 2 steps under S5
- S5 ‘validate against Tasks’ and ‘validate against job ads’ which both have arrows to step S6
- S6 ‘Occupational Skills’
- S7 ‘Core Skills’
S1: Process and validate inputs
Skills statement libraries were obtained from GFI (skills), ESCO (Level 4 skills), Skills England (skills), Innovate UK Workforce Foresighting Hub (skills), O*NET (Detailed Work Activities) and the National Careers Service (skills). These libraries were cleaned and standardised using AI tools. AI tools were again used to quality assure the skill statements and correct any that were too generic, incorrectly structured, compound, invalid, elementary, ambiguous, traversal, or too specific.
The text below shows an example of a prompt used to quality assure skill statements:
prompt_text="""
A good occupational skill label complies with all of the following criteria:
1. It describes a skill that requires significant training and practice to acquire.
2. It describes a skill and not an attitude or outcome. For example, ‘maintaining a positive outlook’ or ‘Ensuring customer satisfaction’ would therefore not qualify as occupational skills.
3. It describes a skill that is developed and not innate. For example, ‘a good sense of smell’ is not a skill although “Smelling foods and ingredients to evaluate quality” is.
4. It begins with an action-based verb followed by a specific noun (i.e. describes something being actively done to an object).
5. It is no more than nine words long (and ideally between three and six).
6. It is unambiguous (i.e. it describes a specific skill and couldn’t be misinterpreted as something else).
7. It describes a specialist skill and is therefore only relevant to a subset of jobs. For example, “supervise workers” is too broad.
8. It describes a skill that is broad enough to be relevant to or transferable between multiple jobs, but not overly generic.
Examples of good occupational skill labels include:
1. Install heat pumps
2. Administer standardised psychological tests
3. Manage software development projects
4. Read musical scores
5. Inspect aircraft to check airworthiness
6. Design relational database schemas
Quality Evaluation Category Codes, Category Names & Rewriting Guidance:
For evaluation and, where necessary, editing, occupational skill labels can be classified into one or more of the following categories:
- Good - This label meets all the criteria
- Compound - This describes multiple skills. It needs to be split into multiple skill labels, one per different skill.
- Too Generic - This is too generic and isn’t describing a specific skill.
- Invalid - This does not describe a skill and is instead a tool, subject, attitude or outcome. It needs to be removed.
- Too Complex – The vocabulary used to define the skill is verbose and unnecessarily difficult to read. It needs to be simplified.
- Disordered – This label does not follow the verb-noun sequential format. It needs to be rewritten to present the information in this order.
- Elementary - This is an unskilled or very low-skilled activity
- Ambiguous - This label could represent two totally different skills
- Traversal - This is a skill that is very broad and is required in a wide variety of unrelated job roles
- Too Specific - This is a skill that is too specialised and only relevant to a specific part of one job
Quality Evaluation Category Examples:
Examples of skill labels assigned to the various evaluation categories (some examples may belong to more than one category)
- Good – “Administer standardised psychological tests.”
- Compound – “Design, administer & interpret standardised psychological tests.”
- Too Generic – “Analyse data.”
- Invalid – “Stay positive.”
- Too Complex – “Apply research ethics and scientific integrity principles in research activities.”
- Disordered – “Safe working Practices: Meet legal, industry and organisational requirements.”
- Elementary - “Fill kettle with water” or “Pass dental instruments.”
- Ambiguous - “Conduct pipeline analysis” (this is ambiguous as it could refer to an oil or data pipeline)
- Traversal - “Think analytically”
- Too Specific - “Repair vehicles with fuel-injection problems”
With this context, please evaluate the occupational skill labels in the provided list of tuples (containing the statement_id and statement_text) and assign each one to one or more of the Evaluation Category codes.
Next step:
Rewrite each statement by applying the rewriting guidance for all of its category codes as well as using the original criteria for good occupational skill labels and examples of good skill labels provided.
For example, a code 2 (Compound) statement should be split into two distinct skill labels.
If the original statement does not contain enough information to apply the guidance properly then instead assign a label “Insufficient content to rewrite”.
Finally, return a json list of dictionaries (one dictionary per record) containing (in the following order):
- Statement_id:
- Statement_text:
- Evaluation_categories: A comma separated list of the Evaluation Category codes and their corresponding names
- Statement_refined: The rewritten statement or statements, or the label “Insufficient content to rewrite” (if there is more than one statement, these should be separated by the “#” character)
"""
Please note that this prompt was developed in May 2024 and used with the LLM model OpenAI GPT-4o. Current LLMs are significantly more capable and the prompt could be improved (quite possibly by an LLM) to produce better results. Use of this exact prompt is therefore not recommended.
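Whichever model is used, the JSON returned by a prompt of this kind still needs parsing and validating before it can feed later steps. A minimal sketch of handling the response format specified above (the field names follow the prompt; the sample response string is invented):

```python
import json

def parse_qa_response(response_text):
    """Parse the JSON list returned by the skill QA prompt, splitting
    multi-statement rewrites on the "#" separator as the prompt specifies."""
    records = json.loads(response_text)
    parsed = []
    for rec in records:
        refined = rec["Statement_refined"]
        statements = [s.strip() for s in refined.split("#")] if "#" in refined else [refined]
        parsed.append(
            {
                "id": rec["Statement_id"],
                "categories": [c.strip() for c in rec["Evaluation_categories"].split(",")],
                "statements": statements,
            }
        )
    return parsed

# An invented sample response in the format the prompt requests
sample = '''[{"Statement_id": 101,
  "Statement_text": "Design, administer & interpret standardised psychological tests",
  "Evaluation_categories": "Compound",
  "Statement_refined": "Design standardised psychological tests # Administer and interpret standardised psychological tests"}]'''

rows = parse_qa_response(sample)
```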
S2 - S3: Refine, deduplicate and cluster
OpenAI 3-Large embeddings were created and a hierarchical clustering model was used to deduplicate and refine the library of skills.
Skill clusters were then sorted by meaning to identify overlapping clusters and these were manually inspected for inclusion or deletion. AI prompts were used to analyse the consistency of the skill clusters and generate a new skill label to best describe the cluster of skills (rather than using the centroid skill as the label).
The verbs in the skill labels were standardised and became the SSC Skills.
AI tools were used to write a description of the SSC Skill label and then a further prompt identified any ambiguous skills labels and descriptions which were rewritten.
Create Skill Groups, Areas and Domains
The SSC Skills were clustered to create Skill Groups and parent or child overlaps were manually checked. An AI prompt was used to check the SSC Skills within each Skill Group and identify any overlapping Skill Groups.
The Skill Groups were then clustered to create Skill Areas and the language of the Skill Groups and Skill Areas was standardised. An AI prompt was used to check the skills in each Skill Area and return a skill relatedness score.
The Skill Areas were then mapped to Skill Domains and an AI prompt used to check SSC skills within Skill Domains.
S4: Map against SOC SUGs
The original prototype mapping from SOC SUGs to SSC Skills was based primarily on the occupational mappings in the input skill libraries.
The final Version 1.0 mapping was however entirely regenerated by first identifying potential matches from a text embedding comparison of the new Version 6 SOC SUG titles and descriptions against SSC skills and descriptions. The potential match lists were then extended by the addition of any of the top 30 skill matches from the original mapping not already included.
The original SUG to skill mapping contained only a single weighted importance score but, even with the latest LLMs (e.g. GPT-5.4), prompts to generate similar importance scores were inconsistent (i.e. running the same prompt against the same dataset would generate significantly different scores). This aligned with broader concerns about the accuracy of the prototype mapping scores. SUGs are quite broad occupational concepts, and a skill may be very important within some related roles yet irrelevant to others. For example, different application developer roles will involve using different programming languages and libraries which, in turn, will need different skills. A prompt that evaluated the ‘probability’ that a skill is required within an SUG, and then separately the percentage ‘importance to competence’, generated significantly more consistent scores. Moreover, these scores were significantly more correlated with a sample of independent importance evaluations of the existing SSC skill matches than the original mapping.
The augmented potential match lists were therefore evaluated using the AI prompt format below to generate these two distinct estimates of relatedness. Matches with a frequency score below 10 and an average weighted score of below 25 were typically excluded, although some were retained to improve overall coverage (see “The UK Standard Skills Classification” for details).
Example prompt:
"""
You are a skills analyst and need to evaluate the importance of skills within a list to a specific UK occupation.
To do this you will be given a list object that contains:
1) An occupation_id
2) An occupation_title and description (hyphen separated e.g. ‘Chemical engineers - Chemical engineers design and develop large scale chemical and physical production processes.’)
3) A list of ; separated tuples containing a skill_id and a hyphen-separated skill_label and skill_description
For example:
[1132/02,’Sales directors - Sales directors are responsible for overseeing all sales operations for an organisation or business.’,(S.2978;Supervise sales staff - Set daily priorities, monitor calls and deals, coach staff on products, and review progress against targets.);(S.0106;Analyse sales data - Analyse sales figures to find trends by product, customer or region and spot issues affecting revenue.);(S.2862;Set sales targets - Set measurable sales targets based on past results and forecasts, such as revenue, units sold or new customers.)]
For each of these occupation_skill lists please evaluate each skill and then:
1) Assign a % probability (as an integer value) that in a UK context the skill would be required for roles within the occupation described (for example, an application developer role would be more likely to require Python programming as a skill rather than Scala or Rust). Remember this score as the skill_required_probability_percentage.
2) Assign a % score (as an integer value) to indicate the importance of competence in that skill to the overall competence of roles belonging to that occupation and requiring that skill (for example, an application developer role requiring Django). Remember this as the skill_importance_if_required.
3) Don’t include any rationale, return only a json list of dictionaries (one dictionary per occupation-skill pair) containing (in the following order):
a) occupation_id:
b) skill_id:
c) skill_required_probability_percentage:
d) skill_importance_if_required
"""
Weighted frequencies were then calculated to show how SUGs relate to SSC Skill Groups and SSC Skill Areas.
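The text does not give the exact formula for combining the two prompt outputs into a weighted score, so the sketch below is one plausible reading (probability-weighted importance), with the exclusion thresholds applied afterwards:

```python
def weighted_score(required_probability_pct, importance_if_required_pct):
    """ASSUMPTION: probability-weighted importance; the actual
    combination formula is not stated in the methodology."""
    return required_probability_pct * importance_if_required_pct / 100

def keep_match(frequency, avg_weighted, min_frequency=10, min_weighted=25):
    """Apply the typical exclusion thresholds described above
    (before any manual retention to improve coverage)."""
    return frequency >= min_frequency and avg_weighted >= min_weighted

# A skill required in 60% of roles, 50% important when required:
# weighted_score(60, 50) == 30.0, so it passes the weighted threshold
```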
S5: Validate against other sources
SSC Tasks
The SSC Skills embeddings were compared to SSC Tasks embeddings to identify links between them. This mapping was then checked using an AI prompt and a further prompt defined the importance score of the SSC Skill to the SSC Task.
Vacancy data
Following a similar process to the validation of tasks using vacancy data, skills were extracted from a sample of vacancy descriptions using Llama3. These were quality assured using AI; embeddings were then created and the vacancy skills clustered within each SUG and then across all SUGs. The centroid embedding within each cluster became the vacancy skill label. These labels were then compared to the SSC Skills embeddings to check coverage, and any vacancy skills quality assured as good but with a low similarity score to the SSC Skills were inspected for inclusion. This resulted in eight new concepts (e.g. S.1388 - Install EV charging points) being added to the prototype classification.
S6: Final SSC Occupational Skills
The set of SSC Skills consists of a hierarchy of 3,350 Occupational Skills, 607 Skill Groups, 106 Skill Areas and 22 Skill Domains. This is based primarily on the prototype classification but, following user feedback and evaluation of pilot outputs, 10 new occupational skills were added, 27 skill labels modified (e.g. ‘S.0271 - Build axed arches and haunch brickwork’ changed to ‘Build arches and angled brickwork’ to improve clarity) and two redundant concepts removed. All occupational skills descriptions were also revised using GPT-5.4 to improve consistency and readability. The datafile changelog contains full details.
S7: Core skills
The Skills Builder Partnership essential skill concepts were considered and then a list of 13 SSC Core Skills and definitions were drawn up.
AI prompts were used to help create definitions for each of the 5 skill levels of each SSC Core Skill and then to evaluate the level of Core Skill proficiency in each SSC Skill and each SOC SUG. Several AI models were used in this step to try to attain the best and most consistent results.
Knowledge
Figure 10: Development of UK SSC Knowledge concepts
Figure 10 illustrates the process to develop the SSC library of Knowledge concepts.
K1 to K6 refer to each processing step in the creation of the Occupational Knowledge library.
The main input libraries from top to bottom are:
- ESCO (European Skills, Competences, Qualifications and Occupations) Knowledge concepts
- Higher Education Coding of Subjects (HECoS)
- Learn Direct Classification of Subject Codes (LDCSC)
- O*NET (knowledge, tools used and technology skills)
- Stack Exchange (topic tags)
- Wikipedia (article titles)
The processing steps show:
- K1 ‘validate as Knowledge concepts’
- K2 ‘cluster by meaning’
- K3 ‘use AI to merge and deduplicate’ which has arrows pointing to 5 steps under K4
- K4 ‘validate versus Ofqual’, ‘validate versus Skills England’, ‘validate versus tasks’, ‘validate versus job ads’, and ‘validate versus prototype’ which each have arrows to step K5
- K5 ‘identify primary concepts’
- K6 ‘Occupational Knowledge’
The Knowledge concept, subject and topic names were collected from the input libraries.
K1: Process and validate inputs
Knowledge libraries were obtained from ESCO (Knowledge), HECoS (Higher Education Coding of Subjects), LDCSC (Learn Direct Classification of Subject Codes), O*NET (Knowledge, Tools Used and Technology Skills), Stack Exchange (Topic Tags) and Wikipedia (Article Titles). These were cleaned and standardised using AI tools. The list of Knowledge concepts was checked for any matching or equivalent terms and then filtered to only include concepts that were evident within a UK context.
K2 - K3: Refine, deduplicate and cluster
Knowledge concepts were clustered by meaning using embeddings and further deduplicated using clustering methods.
K4: Validate against other sources
Ofqual
Up to 50 potential matches per qualification were identified by comparing a text embedding vector of a concatenated text string of each qualification title and its associated qualification units against a text embedding for each SSC Knowledge concept label.
Text embedding vectors were generated using the OpenAI 3-Large Model with a cosine-similarity match threshold of 0.3 being applied. Matches above this threshold were then evaluated by prompting an LLM (GPT-5.4) with a simplified text string for each qualification (its simplified title and up to 5 example qualification units) to validate each match and also, where appropriate, assign a percentage probability that “a significant amount of knowledge in that area would be learnt by achieving that qualification”. Following a sample inspection, matches assigned a match probability score below 50% were rejected.
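The shortlisting step described above can be sketched as: rank all Knowledge concepts by cosine similarity to the qualification embedding, keep those at or above the 0.3 threshold, and cap the list at 50. Toy vectors and helper names below are illustrative:

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def shortlist(qual_vec, concepts, threshold=0.3, k=50):
    """Top-k Knowledge concepts at or above the similarity threshold.

    concepts -- dict mapping concept_id to its embedding vector
    """
    scored = [(cid, cosine(qual_vec, vec)) for cid, vec in concepts.items()]
    scored = [pair for pair in scored if pair[1] >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Invented concept IDs with toy embeddings
concepts = {"K.0001": [1.0, 0.0], "K.0002": [0.7, 0.7], "K.0003": [0.0, 1.0]}
top = shortlist([1.0, 0.1], concepts)  # K.0003 falls below the 0.3 threshold
```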
The closest Sector Subject Areas were identified using embedding matches and validated using an LLM prompt and manual inspection.
IfATE and Skills England
Up to 10 potential matches per Occupational Standard Knowledge statement were identified by comparing a text embedding vector of a concatenated text string of each statement and its associated occupational standard against a text embedding for each SSC Knowledge concept label. Text embedding vectors were generated using the OpenAI 3-Large Model with a cosine-similarity match threshold of 0.3 being applied. Matches above this threshold were then evaluated by prompting an LLM (GPT-5.4) to validate each match and, where appropriate, assign a percentage importance of the knowledge to that statement. Following a sample inspection, matches assigned a probability score below 50% were rejected.
SSC Tasks
Embeddings matches were also used to assign SSC Knowledge concepts to SSC Tasks and then an AI prompt checked whether the Knowledge concepts had been correctly assigned to Tasks. An AI prompt was then used to define the importance score of the Knowledge to the Task.
Vacancy data
The sample of vacancy descriptions was searched for the SSC Knowledge concepts to check that they were all terms in common usage.
K5: Primary concepts
The primary concept type and potentially related concepts were identified using embeddings matches and checked using LLM prompts.
K6: Final SSC Occupational Knowledge concepts
The final set of SSC Knowledge concepts consists of 5,056 concepts linked to SSC Tasks, SSC Skills and subjects. This is based primarily on the prototype classification but, following user feedback, evaluation of pilot outputs and a re-analysis of previously excluded terms, 145 new concepts were added, 10 concept labels were modified to improve clarity (e.g. ‘K.0663 – Casting’ changed to ‘Casting (Manufacturing)’) and 15 redundant concepts were removed. All concept descriptions were also revised using GPT-5.4 to improve consistency and readability. The datafile changelog contains full details.
Secondary mappings
Secondary mappings to existing classifications of Skills, Tasks and Knowledge concepts were created using embeddings matches. The full list of secondary mappings available can be found in Appendix A.
Skill categorisations
1. Numeracy skills and Digital skills
These classifications were created using the SSC Skills that were rated as requiring an expert level of proficiency in the SSC Core Skills of Numeracy and Digital Literacy.
2. Green skills
An AI prompt was used to score each SSC Skill on how related (directly or indirectly) it is to the UK’s net zero emissions target and other environmental goals. Using previous work to define the Green SOC (see Warhurst, C., Harris, J., Cardenas-Rubio, J. C., and Anderson, P. (2025). A just transition?: Green jobs, good jobs and labour market inclusivity in Scotland. European Journal of Workplace Innovation, 9(1-2), 63-79), the skills mapped to green SUGs were also identified. A manual inspection of the skills with high AI green scores and those mapped to green SUGs was then carried out to identify a list of Green and Green-enabling skills.
3. STEM-M&H (Science, Technology and Engineering, Mathematics, Medicine and Health) skills
The definitions of STEM-M&H used to define SUGs as STEM-M&H in previous work for The Royal Society were used in an AI prompt to score the SSC skills against each of the four categories. After a manual inspection and comparison to the STEM-M&H SUGs linked to each skill, a threshold score was applied to define the STEM-M&H category.
4. Artificial Intelligence (AI) skills
A model was developed to define 4 different categories of AI skills, as listed in Table 5 below.
Table 5: AI Skills Categories within the UK SSC
| AI Skill Category Name | AI Skill Category Description |
|---|---|
| AI Development Skills | Technical skills that help develop, implement and maintain Artificial Intelligence (AI) tools and capabilities. |
| AI Operation Skills | Skills that directly relate to the use of Artificial Intelligence (AI) tools and capabilities. |
| AI-Augmented Skills | Skills that can be performed without AI tools and capabilities but can be materially simplified, accelerated, improved or scaled through their use. |
| AI Oversight Skills | Skills that help plan, govern, monitor, audit, assure, validate, regulate, approve, or oversee the safe, lawful, ethical, effective, or responsible use of Artificial Intelligence (AI) tools and capabilities. |
An AI prompt was used to assign a percentage score for each of the categories for each SSC Skill. For scores above 50% a rationale was also generated for validation purposes.