AI Writing Assistant (Social Care)
An AI service to suggest wording for sections of a children's home inspection report based on the written evidence collected during its full inspection.
1. Summary
1 - Name
AI Writing Assistant (Social Care)
2 - Description
The project aims to drive efficiency in the inspection process by deploying a self-serve assistant for inspectors that uses a Large Language Model (LLM) to create suggested wording for 3 sections of a children’s home full inspection report, based on the written inspection evidence. The text output will be edited, reviewed, and quality assured by inspectors. It will form the first step in the inspection report drafting process.
3 - Website URL
N/A
4 - Contact email
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
The Office for Standards in Education, Children’s Services and Skills
1.2 - Team
Data and Insight Team
1.3 - Senior responsible owner
Head of Regulatory Business Change, Regulation and Social Care Policy
1.4 - Third party involvement
No
1.4.1 - Third party
N/A
1.4.2 - Companies House Number
N/A
1.4.3 - Third party role
N/A
1.4.4 - Procurement procedure type
N/A
1.4.5 - Third party data access terms
N/A
Tier 2 - Description and Rationale
2.1 - Detailed description
Ofsted inspectors of children’s homes assess the quality of care, safety, and outcomes for children living in residential settings. They write reports based on inspection evidence to ensure transparency, hold providers accountable, and support improvements that promote the welfare and development of children in care.
Through a web interface, these inspectors will provide a judgement alongside the inspection evidence they gather in OneNote, exported as a PDF. The web interface will connect to an API in which Python acts as an intermediary, passing their text data to a Large Language Model (LLM) in OpenAI’s ChatGPT family for processing. The LLM will work through a series of prompts for report drafting and house styling, generating sections of a draft report from the text data, which is held temporarily in CPU memory (RAM).
The output does not contain the requirements section of the final report, as this requires judgement and decision-making that must remain entirely with the inspector. The suggested wording for 3 sections of the report is output as a Word document that can be uploaded to our report management system, Cygnum. At the top of the output is wording that identifies the generated text as a suggestion and sets out the responsibilities of the inspector. The suggested wording will be written into a draft, then reviewed and edited by the lead inspector as needed in line with new guidance, before continuing through quality assurance (QA).
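To make the data flow described above concrete, the sketch below shows one way the API intermediary could be structured. It is illustrative only and assumes the fastapi and python-multipart packages; the endpoint path, field names, and both helper functions are hypothetical placeholders, not the production implementation.

```python
# Minimal sketch of the API intermediary (not production code). The endpoint path,
# field names, and both helper functions are hypothetical placeholders.
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()


def extract_text(pdf_bytes: bytes) -> str:
    """Placeholder for extracting text from the exported OneNote evidence PDF."""
    return pdf_bytes.decode("latin-1", errors="ignore")


def generate_report_sections(judgement: str, evidence: str) -> dict[str, str]:
    """Placeholder for the LLM prompt chain that drafts each suggested section."""
    return {"section_1": "", "section_2": "", "section_3": ""}


@app.post("/suggest-wording")
async def suggest_wording(
    judgement: str = Form(...),            # inspector's judgement, entered before generation
    evidence_pdf: UploadFile = File(...),  # OneNote inspection evidence exported as a PDF
):
    pdf_bytes = await evidence_pdf.read()  # held in memory (RAM) only; nothing written to disk
    sections = generate_report_sections(judgement, extract_text(pdf_bytes))
    return {"sections": sections}
```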
2.2 - Benefits
Time: AI can instantly generate a structured first draft based on inspection evidence, eliminating the need to start from scratch and reducing time spent on formatting. Less time spent writing reports means more time for the evaluative process or more reports produced in the same amount of time.
Cost: Inspectors’ time savings can be expressed in financial terms but are non-cash-releasing.
Quality and consistency: A report’s first draft will be of higher quality because consistent styling and consideration of what makes a good draft report are applied universally. Improved quality and consistency will in turn increase clarity, ease of quality assurance, and trust.
2.3 - Previous process
The pre-existing process was for inspectors to manually write the first report draft. This first draft would then be extensively reviewed, quality assured, and edited by inspectors. Following the introduction of the report drafting tool, the LLM supports the writing of the first draft only after the inspection and decision-making have taken place, and the subsequent quality assurance requires closer comparison with the inspection evidence than may have occurred otherwise.
2.4 - Alternatives considered
Whilst different LLMs have been considered based upon their costs and capabilities, the choice would have minimal impact on the end product. Our current approach is for a web interface to connect via an API to Python scripts that manage LLM calls.
Another option considered was to use Microsoft’s Power Apps with its integration with Azure AI Services. This offers low-code development benefits; however, in the discovery phase of development we found Power Apps’ limited customisation and compatibility challenging. Expertise within the development team is focussed on Python, lending further weight to the preferred solution.
Tier 2 - Deployment Context
3.1 - Integration into broader operational process
Ofsted inspects children’s homes to ensure that young people in care receive safe, high-quality support. Inspections may be routine or triggered by specific concerns. Inspectors gather observations and documentation during their visits to inform reports that assess the effectiveness of the home and support continuous improvement. These reports highlight strengths, identify areas for improvement, and guide action by providers and regulators.
The proposed LLM service would be integrated as the first step in the report drafting process by providing only suggested wording for select sections of the first draft using inspectors’ evidence base. Following this, the existing writing and QA processes continue, where iterative moderation, editing, and quality assurance will develop the draft into a suitable report setting out the inspection findings.
3.2 - Human review
Inspectors will be part of the testing process before deployment of this tool. Once the tool is in place, the lead inspector and staff involved in QA will remain the same as now, authoring, quality assuring, and editing the final inspection report.
The text suggested by AI will be retained separately through naming conventions and version controls in internal systems, creating a clear distinction between it and the inspector’s output for accountability and auditability.
3.3 - Frequency and scale of usage
Currently the tool is in development and has a limited number of test users (for evaluation purposes). No citizens interact with the tool.
If the service were live, it is anticipated it would run several hundred times per week.
3.4 - Required training
Training and written guidance produced alongside Policy, Operations, and Legal teams will be provided to children’s homes inspectors on how to use the tool effectively and appropriately.
The training and written guidance will cover:
- How to operate the tool
- Decision ownership
- Approved scope
- Prohibited use
- How the tool functions
- Limitations of the tool
- Quality assurance and accountability
- Ensuring fairness and limiting bias
- Further available training and support
3.5 - Appeals and review
The tool does not make judgements about the quality or effectiveness of a children’s home. The tool has no influence on inspection functions; it will assist inspectors with reporting functions alone. Its outputs are designed to support inspectors with suggested wording for select sections of an inspection report based on their evidence, helping them produce informed, evidence-based findings. Final decisions and evaluations remain the responsibility of the inspector, following established multi-layered quality assurance processes. Ofsted inspection reports for children’s homes can be challenged by providers through a formal complaints process.
More information is available here: https://www.gov.uk/government/publications/complain-about-ofsted
Tier 2 - Tool Specification
4.1.1 - System architecture
At present, the Python pipeline feeds a locally stored PDF document of text data into ChatGPT 4.1 Mini (an LLM by OpenAI) with some data cleaning to replace common acronyms with their full form. This runs through a series of prompts, each of which creates an output for a section of the report. These outputs are then collated in a Word document with a heading for each section. This document is then passed through a series of house styling prompts, applying rules to each section.
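The sketch below illustrates this pipeline shape under stated assumptions: it uses the openai, pypdf, and python-docx packages against an internal Azure OpenAI deployment, and the deployment name, endpoint, prompt wording, section headings, and acronym map are placeholders rather than the real ones.

```python
# Illustrative sketch only, not the production pipeline. Assumes the openai, pypdf and
# python-docx packages and an internal Azure OpenAI deployment; the deployment name,
# endpoint, prompts, section headings and acronym map are hypothetical placeholders.
from docx import Document
from openai import AzureOpenAI
from pypdf import PdfReader

client = AzureOpenAI(
    azure_endpoint="https://example-internal-instance.openai.azure.com",  # placeholder
    api_key="<key-from-secure-store>",                                    # placeholder
    api_version="2024-06-01",
)

ACRONYMS = {"RI": "responsible individual", "RM": "registered manager"}  # example entries only

DRAFTING_PROMPTS = {  # one prompt per report section (illustrative wording)
    "Section 1": "Draft the first section of the inspection report from this evidence.",
    "Section 2": "Draft the second section of the inspection report from this evidence.",
    "Section 3": "Draft the third section of the inspection report from this evidence.",
}
HOUSE_STYLE_PROMPT = "Rewrite the following draft text in house style."  # illustrative wording


def call_llm(system_prompt: str, text: str) -> str:
    """Single LLM call with temperature minimised for consistent output."""
    response = client.chat.completions.create(
        model="gpt-4-1-mini-internal",  # placeholder deployment name
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content


def run_pipeline(pdf_path: str, output_path: str) -> None:
    # 1. Read the locally stored PDF of inspection evidence into text.
    evidence = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)

    # 2. Data cleaning: replace common acronyms with their full form.
    for acronym, full_form in ACRONYMS.items():
        evidence = evidence.replace(acronym, full_form)

    # 3. Run the report drafting prompts (one output per section), then apply the
    #    house styling prompt to each drafted section.
    document = Document()
    for heading, prompt in DRAFTING_PROMPTS.items():
        draft = call_llm(prompt, evidence)
        styled = call_llm(HOUSE_STYLE_PROMPT, draft)
        document.add_heading(heading, level=1)
        document.add_paragraph(styled)

    # 4. Collate the sections into a single Word document.
    document.save(output_path)
```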
4.1.2 - System-level input
In internal testing, synthetic and historic children’s homes inspection evidence will be used. At the private beta stage, the tool would be used on selected live inspections, but the output would not be published. If this service were to go live, the input would be current children’s homes inspection evidence.
4.1.3 - System-level output
The service’s output is a Word document containing suggested wording for select sections of a children’s home full inspection report.
4.1.4 - Maintenance
Our intention is to update the LLM architecture as better models become available and feasible, updating the prompts in turn. We would seek customer satisfaction data from inspectors on their usage of the service and its perceived value to inform follow-up user testing and decisions on maintenance. We also plan to use real and synthetic data to measure the tool’s performance over time once deployed.
4.1.5 - Models
A report drafting ChatGPT 4.1 Mini model and a house styling ChatGPT 4.1 Mini model (Internal Azure Instances).
Tier 2 - Model Specification
4.2.1. - Model name
ChatGPT 4.1 Mini (Internal Azure Instance)
4.2.2 - Model version
4.1
4.2.3 - Model task
The drafting of children’s homes full inspection reports based on the evidence collected during an inspection, freeing up inspector resource for other tasks.
4.2.4 - Model input
Inspection evidence recorded by Ofsted staff (children’s homes inspectors).
4.2.5 - Model output
A first draft Ofsted report with house styling applied.
4.2.6 - Model architecture
A Large Language Model (LLM) with a series of 3 report drafting prompts and further house styling prompts. Temperature is minimised. Output is in JSON format.
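As a rough illustration of what "temperature minimised" and "output in JSON format" look like in practice, the call below sketches a single drafting prompt against an Azure OpenAI deployment; the endpoint, deployment name, and prompt wording are placeholder assumptions.

```python
# Illustrative single drafting call (not the production prompts); endpoint, key and
# deployment name are placeholders.
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example-internal-instance.openai.azure.com",  # placeholder
    api_key="<key-from-secure-store>",                                    # placeholder
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4-1-mini-internal",            # placeholder deployment name
    temperature=0,                            # temperature minimised for repeatable output
    response_format={"type": "json_object"},  # ask the model to return JSON
    messages=[
        {"role": "system", "content": "Draft one report section. Return JSON with a 'wording' key."},
        {"role": "user", "content": "<cleaned inspection evidence goes here>"},
    ],
)
section_wording = json.loads(response.choices[0].message.content)["wording"]
```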
4.2.7 - Model performance
Prior report drafts can be compared in quality to those produced with AI. Time taken to run to completion is recorded as a key metric.
Better understanding the model’s performance is a key reason for ongoing testing. We would seek customer satisfaction data from inspectors on their usage of the tool and perceived value to inform follow-up user testing and decisions on how best to improve performance. We also plan to use real and synthetic data to measure the quality of the service over time once deployed.
4.2.8 - Datasets and their purposes
Synthetic inspection evidence produced by subject matter experts.
Historic inspection evidence for published reports dating back to April 2024 where approval has been granted by a responsible body.
Tier 2 - Development Data Specification
4.3.1 - Development data description
In addition to synthetic data, the development and testing of this tool has been supported by historic written evidence approved for piloting by responsible bodies and local authorities.
4.3.2 - Data modality
Text
4.3.3 - Data quantities
14 responsible bodies and selected local authorities have been invited to join a pilot using historical inspection evidence from children’s homes reports published since April 2024. This will be used for testing, with 20 cases evaluated.
4.3.4 - Sensitive attributes
Inspection evidence will contain sensitive personal details of those managing and running the home, as well as the life experiences of children living in the home. This includes but is not limited to matters of abuse, medical history, protected characteristics, and criminal records.
4.3.5 - Data completeness and representativeness
Inspection evidence can vary in detail and style; however, it will be captured in a set template, making it easier to process with AI.
4.3.6 - Data cleaning
We have a find and replace function within the Python script to identify acronyms in the inspection evidence and replace them with their full meaning before being processed by the LLM.
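A minimal sketch of this kind of find-and-replace step is shown below; the acronym map contains example entries only and is not the list used in the real script.

```python
# Illustrative acronym expansion step; the ACRONYMS map holds example entries only.
import re

ACRONYMS = {
    "RI": "responsible individual",
    "RM": "registered manager",
    "EHCP": "education, health and care plan",
}


def expand_acronyms(evidence_text: str) -> str:
    """Replace known acronyms with their full meaning before the text reaches the LLM."""
    for acronym, full_form in ACRONYMS.items():
        # Word boundaries stop partial matches inside longer words.
        evidence_text = re.sub(rf"\b{re.escape(acronym)}\b", full_form, evidence_text)
    return evidence_text


print(expand_acronyms("The RM discussed the EHCP with the RI."))
# -> "The registered manager discussed the education, health and care plan with the responsible individual."
```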
4.3.7 - Data collection
The evidence written by the inspector originates with anyone involved in an inspection – responsible individuals, registered managers, staff at settings, and children. They may speak to inspectors directly or complete surveys, they may be observed by inspectors, and inspectors may look at documents that contain information about them.
4.3.8 - Data access and storage
Procedures determining the storage of, and access permissions to, raw inspection evidence (input data) are already in place; the input data will not be stored by the tool. Input data is not used to train the model and will not be retained by the model.
The raw data will be used by the service to form draft inspection reports, but no input data will be retained by the service. The data newly created by this service (the drafted inspection reports) will be saved in Ofsted’s operational systems.
For the piloting phases of this work, no information will be entered onto publishing systems.
4.3.9 - Data sharing agreements
N/A
Tier 2 - Operational Data Specification
4.4.1 - Data sources
Children’s homes inspection evidence recorded by Ofsted staff (inspectors).
4.4.2 - Sensitive attributes
Children’s homes inspection evidence includes sensitive attributes such as special category personal data (e.g. health, ethnicity, sexual orientation) and potential proxies for protected characteristics. These attributes are processed by the AI tool solely for generating draft reports, with safeguards in place including role-based access, strict version control, and mandatory inspector review to identify and remove any inappropriate or identifying content before finalisation.
The secure storage of inspection evidence will be unchanged by the introduction of this tool. The LLM itself does not retain any input data and does not use any input data to train the model. Only the generated report is stored by the development team for monitoring and audit purposes. Report drafts generated during testing will be logged and retained internally for 6 months.
4.4.3 - Data processing methods
We have a find and replace function within the Python script to identify acronyms in the inspection evidence and replace them with their full meaning before being processed by the LLM.
4.4.4 - Data access and storage
Procedures determining the storage of, and access permissions to, raw inspection evidence (input data) are already in place; the input data will not be stored by the tool. The raw data will be used by the service to form suggested wording for sections of an inspection report, but no input data will be retained by the service.
The data newly created by this service will be in plain text (not the final report format) in a Word document that must be saved in Ofsted’s operational systems (Cygnum) unedited as version 1 (V1). V2 will be produced by the inspector in the report format and contain changes made by the inspector to the suggested wording of the AI, in addition to the sections produced entirely by the inspector without AI assistance.
4.4.5 - Data sharing agreements
Reports can be shared with commissioning bodies for children in care.
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessments
Data Protection Impact Assessment (DPIA) for the testing phases completed on 06/08/2025. The DPIA identifies that testing the capability of AI to suggest wording for select sections of children’s homes inspection reports from synthetic and historic inspection data introduces risks that can be suitably mitigated. It concludes that while testing is legally permissible, deployment in live inspections will require robust safeguards, clear accountability, and demonstrable necessity to meet GDPR requirements. Another DPIA will be produced to assess the needs of a live product.
Equality Impact Assessment (EIA) completed on 23/06/2025. The EIA concludes that the AI assistant is not expected to negatively impact individuals with protected characteristics, with no equality impacts identified across all nine categories. It recommends continued monitoring through human-led quality assurance and stakeholder engagement as the tool progresses, ensuring compliance with the public sector equality duty.
We are committed to conducting periodic reviews of these impact assessments and this ATRS submission at each key stage of development and testing, to ensure continued relevance, compliance, and effectiveness.
5.2 - Risks and mitigations
- Automated decision-making
Risk: AI-generated wording could influence regulatory findings, breaching GDPR Article 22.
Mitigation: The inspector will draft the report with the support of plain text (not in the required report format) suggested by the AI assistant for select sections of the report. We will maintain a human in the loop once the inspector has drafted their report, through the existing detailed quality assurance process before publication. Guidance and training for inspectors is a key mitigation here, instructing inspectors how to use the tool appropriately so that they remain the decision-maker. In addition, the tool requires inspectors to enter their final judgement before they can generate a report, so that the suggested wording does not shape that outcome.
- Legal basis
Risk: Insufficient justification for processing sensitive data with AI.
Mitigation: Develop a business case demonstrating necessity and interest, validate improvements through testing, and ensure inspectors remain the decision-maker through guidance and training.
- Data lifecycle
Risk: Unclear retention and deletion of AI outputs could expose sensitive information.
Mitigation: Implement retention/deletion policy, restrict access, and avoid inclusion in core systems during testing.
- Accountability
Risk: Unclear responsibility for AI outputs and risk of function creep.
Mitigation: Define roles and governance to limit use and avoid scope changes.
- Accuracy and bias
Risk: AI may produce inaccurate or biased text, leading to unfair outcomes.
Mitigation: Rigorous testing, mandatory QA, and guidance to counter automation bias.
- Security
Risk: Processing outside organisational domain introduces data security risks.
Mitigation: Conduct security assessment; ensure encryption and secure routing.
- Transparency
Risk: Stakeholders unaware of AI use in processing their data.
Mitigation: Provide clear information and opt-out during testing; maintain transparency for live use.
- Explainability
Risk: Inspectors may struggle to justify AI-generated wording.
Mitigation: Ensure fidelity to evidence and require explicit validation of each section. Through training and guidance, ensure that the wording is treated as a starting point and a suggestion, not the final wording.
- Societal and unintended impacts
Risk: Reduced trust, over-reliance on AI, and environmental impact.
Mitigation: Publish clear explanations, provide training, monitor usage, and adopt energy-efficient practices.