© Crown copyright 2018
This publication is licensed under the terms of the Open Government Licence v3.0 except where otherwise stated. To view this licence, visit nationalarchives.gov.uk/doc/open-government-licence/version/3 or write to the Information Policy Team, The National Archives, Kew, London TW9 4DU, or email: firstname.lastname@example.org.
Where we have identified any third party copyright information you will need to obtain permission from the copyright holders concerned.
This publication is available at https://www.gov.uk/government/publications/quality-assurance-of-administrative-data-in-the-uk-house-price-index/regulated-mortgage-survey-data-provided-by-the-council-of-mortgage-lenders
This is the quality assurance of administrative data (QAAD) assessment of the Regulated Mortgage Survey data used in the production of the UK House Price Index (UK HPI).
The UK House Price Index (UK HPI) measures the change in the price paid to purchase residential property in the United Kingdom. A number of administrative datasets are used to produce the monthly UK HPI, using a technique known as hedonic regression. In simple terms, hedonic regression accounts for the changing quality of the properties transacted each period so that only pure price change is measured, and the change in price is not distorted by differences in the composition of the properties sold (for example, you cannot directly compare the price of a one-bedroom property sold in one period with that of a three-bedroom property sold in another).
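One common way to write a hedonic regression is the textbook time-dummy model below. It is shown only to make the idea concrete and is not necessarily the specification used in the UK HPI.

```latex
\ln p_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \sum_{t=1}^{T} \delta_t D_{it} + \varepsilon_i,
\qquad
I_t = 100 \exp(\delta_t)
```

Here $p_i$ is the price of transaction $i$, $x_{ik}$ are its characteristics (for example, number of bedrooms or neighbourhood type), $D_{it}$ equals 1 if the sale took place in period $t$ and 0 otherwise (the base period $t = 0$ is omitted), and $I_t$ is the quality-adjusted index relative to the base period. Because the $\beta_k x_{ik}$ terms absorb differences in the characteristics of what was sold, $\delta_t$ captures the price change that remains once composition effects are removed.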
The hedonic regression approach requires detailed information on the characteristics of the properties sold, covering both their physical attributes (such as size and floor space) and their location (for example, the type of neighbourhood and where in the country the property is). For the production of the UK HPI, these data are obtained from a variety of administrative sources covering the price paid for transacted property (such as the Price Paid Dataset collected by HM Land Registry for England and Wales), the attributes of a property (such as the Council Tax Valuation List maintained by the Valuation Office Agency) and characteristics related to the location of the property (such as the type of neighbourhood, defined by the ACORN classification from Consolidated Analysis Centers, Inc. (CACI)).
This document focuses on data from the Regulated Mortgage Survey (RMS), provided to us through the Council of Mortgage Lenders. These data are not used as a price-determining characteristic within the monthly UK HPI model. Instead, they are used in constructing the weights, so that the index can be split between first-time buyers and former owner-occupiers.
2. Summary of process
The Regulated Mortgage Survey data are used in constructing the UK HPI weights, which are updated annually. The RMS data are matched at address level to other data sources (such as HM Land Registry data), which allows the buyer status of each matched record to be identified. The buyer status of unmatched mortgage records is imputed using nearest neighbour imputation. All cash purchases are assumed to be purchases by former owner-occupiers.
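A minimal sketch of the buyer-status assignment described above. It assumes each transaction carries a price, a region and a cash flag, and that matched records already hold a status taken from the RMS. The field names, the use of region as a stratum and log price as the distance are hypothetical choices for illustration; the donor variables actually used are not specified in this document.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    # Hypothetical fields chosen for illustration only.
    price: float
    region: str
    is_cash: bool
    buyer_status: Optional[str] = None  # "FTB" or "FOO" when known from the RMS match


def impute_buyer_status(transactions: list[Transaction]) -> list[Transaction]:
    """Assign a buyer status to every transaction, in place.

    Cash purchases are treated as former owner-occupier ("FOO") purchases.
    Each unmatched mortgage purchase takes the status of the matched mortgage
    purchase in the same region with the closest log price (1-nearest neighbour),
    falling back to all matched records if the region has none.
    """
    donors = [t for t in transactions if not t.is_cash and t.buyer_status is not None]
    for t in transactions:
        if t.is_cash:
            t.buyer_status = "FOO"
        elif t.buyer_status is None:
            candidates = [d for d in donors if d.region == t.region] or donors
            nearest = min(candidates, key=lambda d: abs(math.log(d.price) - math.log(t.price)))
            t.buyer_status = nearest.buyer_status
    return transactions


# Hypothetical records for illustration only.
txns = [
    Transaction(150_000, "North West", False, "FTB"),
    Transaction(320_000, "North West", False, "FOO"),
    Transaction(160_000, "North West", False),  # unmatched mortgage purchase
    Transaction(400_000, "London", True),        # cash purchase
]
impute_buyer_status(txns)
print([t.buyer_status for t in txns])  # ['FTB', 'FOO', 'FTB', 'FOO']
```

Restricting donors to the same region keeps the imputed status local; a fuller implementation would likely use more characteristics in the distance and would need a rule for when no matched donors exist at all (here `min` would raise `ValueError`).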
3. Assessment of the Regulated Mortgage Survey data using the Administrative Data Quality Assurance Toolkit
The production and publication of house price data can be considered as medium profile, in that there is wider user and media interest in the results that are published, with moderate economic or political sensitivity.
Data from the Regulated Mortgage Survey, provided to ONS by the Council of Mortgage Lenders, are not used as a price-determining characteristic within the monthly UK HPI model. Instead, they are used in constructing the weights, so that the index can be split between first-time buyers and former owner-occupiers. There is a good understanding of the context in which the data are collected, and the quality standards applied to the data meet our statistical requirements. As such, the risk of quality concerns is classified as low. Combined with the medium public interest profile, this means a basic to enhanced level of assurance is required for this data source.
3.1 Practice area 1: operational context and administrative data collection
The Regulated Mortgage Survey (RMS) is the Council of Mortgage Lenders' version of the Mortgage Product Sales Data (PSD) that all regulated lenders report to the Financial Conduct Authority (FCA). It is detailed, transaction-level data on mortgage completions. Running since April 2005, the RMS now contains over 12 million individual mortgage sale records.
The data are collected electronically, with all reporting firms submitting according to FCA-defined data definitions and in a standard XML format, ensuring a consistent data structure. Coverage is estimated to be around 95% of all regulated mortgages currently advanced. However, some lenders do not permit their data to be transferred to third parties such as ONS, so the share of the market covered by the data ONS receives is about 70%. The impact of this is estimated to be minimal, as the distribution of the data received is similar to that of all mortgage-funded purchases registered by HM Land Registry.
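The claim that the received data are distributed like all mortgage-funded purchases can be checked by comparing category shares between the two sources. The sketch below uses total variation distance over a single categorical variable (for example, region). The choice of metric, the variable and the counts are assumptions for illustration, not the comparison ONS actually performed.

```python
from collections import Counter


def shares(categories):
    """Return each category's share of the total count."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation(sample_a, sample_b):
    """Half the sum of absolute share differences: 0 means identical, 1 means disjoint."""
    a, b = shares(sample_a), shares(sample_b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


# Hypothetical counts by region for illustration only.
received = ["North"] * 7 + ["South"] * 3    # mortgage records received by ONS
registered = ["North"] * 6 + ["South"] * 4  # all mortgage purchases registered
print(round(total_variation(received, registered), 3))  # 0.1
```

A small distance on each variable of interest would support the statement that the 70% subset is representative; which variables and what tolerance count as "similar" is a judgement the document does not specify.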
Reporting fields include purchase price, completion date of the property sale, type of borrower (first-time buyer (FTB) or home mover), whether the property is new or second-hand, and the type and size of dwelling. The RMS is the only comprehensive source of data available to us on the type of borrower, and it provides the data needed to produce the new HPI separately for first-time buyers and existing owners.
As noted, ONS currently receives the Regulated Mortgage Survey data through the Council of Mortgage Lenders. ONS will actively seek direct access to the same data from the Financial Conduct Authority (FCA) using powers under the Digital Economy Act. This may increase the coverage of the data received.
3.2 Practice area 2: communication with data supply partners
A contract is in place between ONS and the Council of Mortgage Lenders for the provision of a monthly feed of transaction-level Regulated Mortgage Survey data. It covers data requirements, the data transfer process, data protection and a delivery schedule. The contract runs until the end of 2018, with the option of extending it by two years.
Ad hoc meetings are held when specific issues need addressing. Email correspondence and telephone discussions are used for other purposes, such as changes to the format of the data received.
3.3 Practice area 3: quality assurance principles, standards and checks applied to data supplies
The Regulated Mortgage Survey is an exact copy of the FCA's Product Sales Data requirements. Several layers of validation are applied to the input data. Data format checks are applied at the point of entry, and files not conforming to this pre-set format are rejected.
A second level of validation is applied once the data have been uploaded to the system. For example, entries for the variable 'Borrower type' must be one of:
- “F” = First time buyer
- “M” = Home mover (2nd or subsequent buyers)
- “R” = Remortgagers
- “C” = Council/registered social landlord tenant exercising their right to buy
- “O” = Other
Incorrect fields are either automatically corrected, manually corrected or set to null. High-low checks are also applied to variables such as property purchase price and the individual's income. Each month, an error report summarising the number, nature and severity of errors is produced for each firm, and these errors are raised directly with the firm in question.
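The second-level validation and per-firm error reporting described above can be sketched as follows. The permitted 'Borrower type' codes come from the list above; the whitespace/case auto-correction rule, the high-low bounds and all field names are hypothetical, since the actual FCA thresholds and correction rules are set out in the FCA's data reference guides rather than here. Severity grading is not modelled.

```python
from collections import Counter, defaultdict

# Permitted 'Borrower type' codes, as listed above.
BORROWER_TYPES = {"F", "M", "R", "C", "O"}

# Hypothetical high-low bounds; the actual FCA thresholds are not given here.
BOUNDS = {"purchase_price": (10_000, 10_000_000), "income": (1_000, 5_000_000)}


def validate_record(record):
    """Return (cleaned_record, errors) for one mortgage sale record.

    Borrower type: whitespace and case are fixed automatically (a hypothetical
    rule); any other invalid or missing code is set to null (None).
    High-low checks flag values outside BOUNDS without changing them.
    """
    errors = []
    cleaned = dict(record)
    raw = record.get("borrower_type")
    code = raw.strip().upper() if isinstance(raw, str) else None
    if code in BORROWER_TYPES:
        if code != raw:
            errors.append(("borrower_type", "auto-corrected"))
        cleaned["borrower_type"] = code
    else:
        errors.append(("borrower_type", "set to null"))
        cleaned["borrower_type"] = None
    for field, (low, high) in BOUNDS.items():
        value = record.get(field)
        if value is not None and not low <= value <= high:
            errors.append((field, "outside high-low range"))
    return cleaned, errors


def error_report(records):
    """Count errors per firm, keyed by (field, nature of error)."""
    report = defaultdict(Counter)
    for record in records:
        _, errors = validate_record(record)
        for error in errors:
            report[record["firm"]][error] += 1
    return report


# Hypothetical records for illustration only.
records = [
    {"firm": "A", "borrower_type": "F", "purchase_price": 250_000, "income": 45_000},
    {"firm": "A", "borrower_type": " m", "purchase_price": 500, "income": 45_000},
    {"firm": "B", "borrower_type": "X", "purchase_price": 300_000, "income": 60_000},
]
for firm, counts in sorted(error_report(records).items()):
    print(firm, dict(counts))
```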
Further information on the validation applied can be found in the Financial Conduct Authority (FCA)’s data reference guides.
3.4 Practice area 4: producer's quality assurance investigations and documentation
Validation is applied to each monthly dataset received from the Council of Mortgage Lenders. Various procedures are in place to ensure that errors in the monthly data received are minimised:
- validation checks based on extreme values are conducted to highlight potentially unusual prices
- data cleaning is carried out to remove cases with missing or erroneous data
- the minimum and maximum values for house price, mortgage advance and total income are investigated and if they are deemed to be suspect, they are removed from the dataset
- the top and bottom 10 house price outliers are validated against external sources
- ratio analysis (comparing house price to total income and mortgage advance to total income) is then carried out to check for consistency
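The producer-side checks listed above could be sketched as follows. The field names, the decision to treat non-positive values as erroneous and the ratio ceilings are hypothetical; the thresholds ONS applies are not stated in this document.

```python
FIELDS = ("price", "advance", "income")


def drop_incomplete(records):
    """Remove records with a missing or non-positive price, advance or income."""
    return [r for r in records if all(r.get(f) is not None and r[f] > 0 for f in FIELDS)]


def price_extremes(records, n=10):
    """Return the n lowest- and n highest-priced records (n >= 1) for external validation."""
    ordered = sorted(records, key=lambda r: r["price"])
    return ordered[:n], ordered[-n:]


def ratio_flags(records, max_price_to_income=15.0, max_advance_to_income=10.0):
    """Flag records whose price-to-income or advance-to-income ratio exceeds a hypothetical ceiling."""
    return [
        r for r in records
        if r["price"] / r["income"] > max_price_to_income
        or r["advance"] / r["income"] > max_advance_to_income
    ]


# Hypothetical monthly records for illustration only.
month = [
    {"id": 1, "price": 200_000, "advance": 160_000, "income": 50_000},
    {"id": 2, "price": 250_000, "advance": None, "income": 60_000},
    {"id": 3, "price": 900_000, "advance": 700_000, "income": 30_000},
]
clean = drop_incomplete(month)
lowest, highest = price_extremes(clean, n=1)
print([r["id"] for r in clean])               # [1, 3]
print(lowest[0]["id"], highest[0]["id"])      # 1 3
print([r["id"] for r in ratio_flags(clean)])  # [3]
```

Flagged records would then be investigated rather than dropped automatically, in line with the list above.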
Cleaned monthly files are then combined to construct an annual dataset which is then matched to other data sources by postcode and associated address information.
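Combining the cleaned monthly files and matching them to HM Land Registry records might look like the sketch below. The match key (normalised postcode, house number or name, and street) and every field name are hypothetical; the actual matching rules belong to the UK HPI methodology rather than this document. Where several RMS records share an address within the year, this sketch simply keeps the first.

```python
import re


def normalise(text):
    """Upper-case and collapse internal whitespace."""
    return re.sub(r"\s+", " ", text).strip().upper()


def address_key(record):
    """Hypothetical match key: postcode without spaces, house number or name, and street."""
    return (
        normalise(record["postcode"]).replace(" ", ""),
        normalise(record["house"]),
        normalise(record["street"]),
    )


def build_annual(monthly_files):
    """Concatenate cleaned monthly record lists into one annual list."""
    return [record for month in monthly_files for record in month]


def match_buyer_type(rms_records, land_registry_records):
    """Attach the RMS borrower type to Land Registry records with the same key (None if unmatched)."""
    lookup = {}
    for r in rms_records:
        lookup.setdefault(address_key(r), r["borrower_type"])
    return [{**t, "borrower_type": lookup.get(address_key(t))} for t in land_registry_records]


# Hypothetical records for illustration only.
annual = build_annual([
    [{"postcode": "ab1 2cd", "house": "10", "street": "High  Street", "borrower_type": "F"}],
    [{"postcode": "EF3 4GH", "house": "3", "street": "Mill Lane", "borrower_type": "M"}],
])
land_registry = [
    {"postcode": "AB1 2CD", "house": "10", "street": "high street"},
    {"postcode": "JK5 6LM", "house": "7", "street": "Park Row"},
]
matched = match_buyer_type(annual, land_registry)
print([t["borrower_type"] for t in matched])  # ['F', None]
```

Records left with `None` are the unmatched ones whose buyer status is then imputed, or set to former owner-occupier for cash purchases.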
Further details on the UK HPI production methodology are published separately.
4. Strengths and limitations of data
The RMS is the only comprehensive source of data available for the type of borrower and provides the necessary data to allow the new HPI to be produced according to whether the buyer is a first time buyer or an existing owner.
However, there are some acknowledged limitations in the data:
This data source is based on mortgage data only; as such, cash sales are excluded. Due to data-sharing restrictions, the share of the mortgage data received by ONS is around 70%. As noted earlier in this document, ONS will actively seek direct access to the same data from the Financial Conduct Authority (FCA) using powers under the Digital Economy Act. This may increase the coverage of the data received.
Overall, we consider the Regulated Mortgage Survey data a good data source for the purpose it is being used.