HMRC: VAT Return Analysis Tool
This tool detects anomalous values within a trader's VAT Return history
1. Summary
1 - Name
VAT Return Analysis Tool
2 - Description
The VAT Return Analysis Tool presents VAT data to a VAT officer. This VAT data includes Trader, Ledger, and Return information over a trader’s VAT History. It is used to support VAT compliance checks.
3 - Website URL
N/A
4 - Contact email
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
HM Revenue and Customs
1.2 - Team
Data Science Analytics
1.3 - Senior responsible owner
Principal Data Scientist
1.4 - Third party involvement
No
1.4.1 - Third party
N/A
1.4.2 - Companies House Number
N/A
1.4.3 - Third party role
N/A
1.4.4 - Procurement procedure type
N/A
1.4.5 - Third party data access terms
N/A
Tier 2 - Description and Rationale
2.1 - Detailed description
The VAT Return Analysis Tool displays VAT Head of Duty system data to VAT officers. This includes information relating to the trader’s VAT entity, data on charges and payments recorded on the VAT account, and data relating to the VAT returns. Officers use this information to support ongoing compliance checks. The scope is limited to the most recent seven years of VAT data.
2.2 - Benefits
The tool reduces the time VAT officers need to gather VAT data by displaying it in one place, rather than requiring officers to collate it from multiple systems.
2.3 - Previous process
This tool replaces an earlier VAT Return Analysis Tool, which produced Excel reports from similarly gathered data. The new tool adds interactive elements and is designed for current Head of Duty system data (rather than legacy data).
2.4 - Alternatives considered
N/A
Tier 2 - Deployment Context
3.1 - Integration into broader operational process
VAT officers use the tool to inform compliance checks: it displays the relevant VAT data for a selected business. It does not automate decisions, but it helps officers scrutinise and analyse VAT information relating to a particular business.
3.2 - Human review
The output supplements wider investigations into VAT registered businesses and is reviewed by officers on an ongoing basis during the course of any compliance activity. Outputs will regularly be compared to Head of Duty systems to validate quality. No decisions are made by the tool, but outputs support officers in making decisions as part of compliance checks.
3.3 - Frequency and scale of usage
Around 5,500 officers have a licence to use the tool as part of their VAT compliance work. The tool is used around 1,500 times a day.
3.4 - Required training
VAT officers learn to use the tool during compliance training, and there are additional resources provided on how to use the tool and interpret the outputs.
3.5 - Appeals and review
In respect of compliance work, the technology only helps identify VAT returns that may warrant further scrutiny, but the decision ultimately sits with the investigating VAT officer as to what compliance checks are carried out. Any decisions that impact the taxpayer, such as an assessment for tax due, will be made by the VAT officer and the normal appeals and reviews processes will apply (https://www.gov.uk/tax-appeals/review-of-a-tax-or-penalty-decision).
Tier 2 - Tool Specification
4.1.1 - System architecture
Users access the tool through an interactive web application hosted internally on a Posit Connect instance. The web application uses a client-server model: the user interface is delivered through the web browser and computation occurs server-side on the Connect instance. Users select VAT traders through the interface; the selected trader’s data is validated and processed in R using Shiny server logic to return outputs. Returns data is refreshed daily and each session is user-specific.
4.1.2 - System-level input
A VAT officer enters a VAT Registration Number to retrieve tabular VAT data for that VAT trader.
4.1.3 - System-level output
A number of interactive tables and graphs are displayed, along with optional functionality to download static data.
4.1.4 - Maintenance
VAT data is refreshed and monitored daily. Computations are processed independently during each session or are triggered by user input.
4.1.5 - Models
This tool uses Seasonal-Trend decomposition using Loess (STL), a classical statistical method, combined with interquartile range (IQR) thresholds to identify outliers in VAT Return data.
Tier 2 - Model Specification
4.2.1 - Model name
Anomaly Detection.
4.2.2 - Model version
2.3.1
4.2.3 - Model task
Identify anomalous values in VAT return data.
4.2.4 - Model input
Tabular VAT return data.
4.2.5 - Model output
Scatterplot output of expected values, and observed (including anomalous) values for each return period.
4.2.6 - Model architecture
The Anomaly Detection model uses the timetk package in R (https://github.com/business-science/timetk) to identify anomalies in return data using seasonal-trend decomposition via Loess (STL). Its primary purpose is to detect anomalous patterns in time series of returns, which may highlight irregularities or shifts in return behaviour that could require further investigation.
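As an illustration only, the general idea of comparing each return with an expected seasonal value and flagging remainders outside IQR-based limits can be sketched as below. This is not the tool's code and not timetk's implementation. It is written in Python rather than R, and it swaps the Loess-based STL fit for a much simpler per-position seasonal median with no trend component. The function name `flag_anomalies`, the period of 4 (quarterly returns), the fence multiplier of 3 and the minimum of two full cycles are all assumptions made for the example.

```python
from statistics import median, quantiles


def flag_anomalies(values, period=4, multiplier=3.0):
    """Return (observed, expected, is_anomaly) for each return period.

    Simplified stand-in for STL (assumption): the expected value is the
    median of all observations at the same position in the cycle, and no
    trend is fitted. Real STL estimates trend and seasonality with Loess.
    """
    if len(values) < 2 * period:
        # Assumed minimum: two full cycles, so each position has a median.
        raise ValueError("not enough data points to identify anomalies")

    # Expected value for each position in the cycle (e.g. each quarter).
    seasonal = [median(values[pos::period]) for pos in range(period)]
    expected = [seasonal[i % period] for i in range(len(values))]
    remainder = [v - e for v, e in zip(values, expected)]

    # IQR fences on the remainder: values outside them are flagged.
    q1, _, q3 = quantiles(remainder, n=4)
    spread = multiplier * (q3 - q1)
    lower, upper = q1 - spread, q3 + spread

    return [
        (v, e, not (lower <= r <= upper))
        for v, e, r in zip(values, expected, remainder)
    ]
```

Calling `flag_anomalies` on a list of net VAT figures, one per return period, yields the observed value, the expected value and an anomaly flag for each period, which is the kind of information a scatterplot of expected and observed values would need.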
4.2.7 - Model performance
This model was trained, tested, and validated by the developers of the timetk package. Its anomaly detection capabilities were found to be suitable for the purposes of the Returns Analysis Tool through User Acceptance Testing (UAT) and comparison with other models to confirm performance.
4.2.8 - Datasets and their purposes
The anomaly detection functionality in R’s timetk package was trained, tested, and validated by its developers. Anomaly detection of this sort does not require a model trained on bespoke data.
Tier 2 - Development Data Specification
4.3.1 - Development data description
N/A
4.3.2 - Data modality
N/A
4.3.3 - Data quantities
N/A
4.3.4 - Sensitive attributes
N/A
4.3.5 - Data completeness and representativeness
N/A
4.3.6 - Data cleaning
N/A
4.3.7 - Data collection
N/A
4.3.8 - Data access and storage
N/A
4.3.9 - Data sharing agreements
N/A
Tier 2 - Operational Data Specification
4.4.1 - Data sources
User input determines which VAT trader’s data is retrieved. The VAT data is then pulled from a data warehouse holding VAT Head of Duty system data.
4.4.2 - Sensitive attributes
There are no protected characteristics in the datasets. Trader data contains the VAT Registration Number, organisation name, and principal place of business address for the VAT trader. However, these relate to the business entity rather than an individual. Where the VAT data relates to a sole trader, it may contain personally identifiable data; however, the data is only used for lawful processing and is accessible only to those with the business need and security clearance to view it.
4.4.3 - Data processing methods
The tool does not alter Head of Duty data before presenting it; rather, there is an agreement that data quality issues will be resolved in the Head of Duty system. The data is refreshed daily, ensuring these resolutions become available quickly in the Returns Analysis Tool.
Return data combines legacy and current VAT return data to provide a single view of return history. The current Head of Duty system data takes priority, so that VAT periods and returns are not duplicated across the two systems. For anomaly detection, return data is validated to ensure there are sufficient data points to identify anomalies; if there are not, the tool does not process the data or produce outputs.
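The two checks described above, combining legacy and current returns without duplicating periods and confirming there are enough data points, could look roughly like the following. This is a minimal sketch in Python and not the tool's R code. The record layout (a dict with a `period` key in `YYYY-MM` form), the function names and the threshold of 8 data points are hypothetical and chosen only for illustration.

```python
MIN_POINTS = 8  # hypothetical threshold; the real minimum is not stated


def combine_returns(legacy, current):
    """Merge legacy and current return records into one history.

    Records are keyed by their 'period' value (assumed format 'YYYY-MM').
    Where both systems hold a return for the same period, the record
    from the current system replaces the legacy one.
    """
    by_period = {record["period"]: record for record in legacy}
    # Current-system records overwrite legacy records for the same period.
    by_period.update({record["period"]: record for record in current})
    # 'YYYY-MM' strings sort chronologically.
    return [by_period[period] for period in sorted(by_period)]


def enough_data(history, min_points=MIN_POINTS):
    """Return True when the history is long enough for anomaly detection."""
    return len(history) >= min_points
```

In a flow like this, anomaly detection would only be attempted when `enough_data` returns True; otherwise no anomaly output would be produced for that trader.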
4.4.4 - Data access and storage
User interaction logs are kept to understand which VAT Traders were selected for processing and what data was downloaded by users. These logs are only accessible through strict access controls.
4.4.5 - Data sharing agreements
N/A
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessments
A Data Protection Impact Assessment (DPIA) was published on 18/02/2021 and is reviewed annually. The DPIA was approved by internal Security and Information Business Partners, who found the processing of VAT data to be lawful.
Load testing (23/10/2024) found the tool suitable for scaling across internal HMRC teams, after running multiple scenarios of simultaneous sessions and monitoring the host's CPU and memory availability.
5.2 - Risks and mitigations
- Risk of unauthorised access to the tool is mitigated by internally controlled service requests to the Posit Connect host, and further access control to the web application. Risk is low impact and low probability.
- Risk of incorrect representation of VAT data is considered low impact and low probability, as the data refresh process is monitored daily and the tool either displays untransformed Head of Duty system data or uses established classical statistical methods that are well understood.
- Risk of false positives/negatives in the anomaly detection model is mitigated by an output escalation process, which requires extensive user review before any decision influenced by the tool can be made.
- Risk of insufficient data across the tool is mitigated by validation checks that confirm a method is only applied when there are enough data points (minimum viable data points).