Welsh Government: Mapping Ancient Woodland Image Segmentation Model

Uses an image segmentation model to predict areas of woodland on OS Landmark maps using a sample of manually labelled data.

1 - Name

Mapping Ancient Woodland Image Segmentation Model

2 - Description

This tool uses computer vision, an AI technology that works with images, to find areas of historic woodland on old maps. By comparing these maps with today’s data, we can find places where woods have been cut down in the last 150 years. These areas are ideal for re-planting because the soil supports tree growth due to an established seed bank and ecosystem. The data will also be published on DataMapWales for others to use.

3 - Website URL

https://digitalanddata.blog.gov.wales/2024/09/09/can-ai-help-find-lost-woodland/

4 - Contact email

customerhelp@gov.wales / cymorth@llyw.cymru

Tier 2 - Owner and Responsibility

1.1 - Organisation or department

Welsh Government

1.2 - Team

Data Science Unit

1.3 - Senior responsible owner

Head of Data and Geography

1.4 - Third party involvement

No

1.4.1 - Third party

N/A

1.4.2 - Companies House Number

N/A

1.4.3 - Third party role

N/A

1.4.4 - Procurement procedure type

N/A

1.4.5 - Third party data access terms

N/A

Tier 2 - Description and Rationale

2.1 - Detailed description

This tool takes OS Historic Landmark 1:2500 data (typically dated late 1800s), saved in a .geotif file format, and produces geometric objects which represent the locations of areas of woodland on each map, based on the symbols used to depict different land classifications such as “Wood”, “Fir”, “Orchard” and “Bush”.

The first step is to take the 13 county Landmark datasets and process the files into 200m x 200m tiles. A representative sample of 1,000 RGB .png images, drawn from each county and stratified by other features such as Rural/Urban land classification, distance to coast and current Local Authority boundary, is manually labelled using QGIS software: polygons are drawn around areas of each tile that contain wood symbols, as defined in the key, and labelled “Woodland”. An instance segmentation model (using a Mask R-CNN ResNet architecture) is trained on an 80% sample of this labelled dataset, with the remaining 20% used to validate the model performance using the Mean Average Precision metric. The model is trained using a GPU in an Azure Machine Learning Workspace and deployed to a private endpoint to facilitate the classification of the full tile set.

The model performance is further assessed by calculating the Intersection over Union score for the predicted polygons and “ground truth” polygons on each tile. A “correct” prediction is defined where a predicted polygon and “ground truth” polygon’s Intersection over Union score is greater than 0.5 (a true positive). Using this method, the precision, recall and F1 scores are calculated to assess and quantify the model quality.

The remaining geotif files are split into 200m x 200m RGB png files. These data are passed to the model, which predicts the relative locations of “Woodland” areas within each tile. The relative positions are converted back into geospatial coordinates and saved as geometry objects in a shapefile; each county geotif has a corresponding shapefile containing the polygons the model predicts to contain different types of “Woodland”.

2.2 - Benefits

The risks associated with the decline in quality and size of woodland include a decrease in the biodiversity of wildlife and native organisms. Ancient Woodland can help address global warming as “long established” areas of wood are estimated to hold more carbon than all other types of tree cover.

Current data sources that map the location of Ancient Woodland are limited as they only identify areas of Woodland that still exist on modern day maps. This misses locations which are no longer identified as “Woodland” but could be used for re-planting, as there is space to plant new trees. Replanting trees in areas where woods previously existed is beneficial because:

  • the existing soil is already well suited to tree growth, containing nutrients and microorganisms
  • the original ecosystem is maintained which supports the growth of native tree species
  • previous root networks improve the soil structure and can facilitate water absorption

This tool can be used to fill in the gaps where Woodland was depicted on OS Historic Landmark maps but no longer exists in modern maps. This will enable researchers and conservationists to measure the change in Woodland coverage over time in different areas to target measures to prevent woodland loss in the future, as well as supporting re-planting schemes and maximising the chance of new areas of trees becoming “long established” woodland.

To manually create this dataset would be time consuming and susceptible to human error and inconsistency. This tool is a faster, more robust method to map Ancient Woodland and can be used as a proof of concept to use deep learning for other land classification types using Historic Landmark data.

2.3 - Previous process

Previous efforts to map land use in the Welsh public sector have been limited to manual classification. This has been time consuming and requires expert domain knowledge and quality assurance to ensure the labels, and how the areas are drawn, are consistent.

Manual labelling does not support iterative updates or versioning, therefore the data is limited to certain use cases and definitions.

This tool instead uses a machine learning approach, which is faster and can be fine-tuned for specific use cases, making it preferable to manual labelling.

2.4 - Alternatives considered

The types of computer vision approach considered included:

  • Object detection
  • Semantic Segmentation
  • Image classification
  • Instance Segmentation

Previous work outside Wales had looked into using object detection to identify individual stamps on maps and calculating the coverage of woodland by counting the number of predicted objects. However, this approach was not accurate at identifying areas of woodland, possibly because the density of symbols used to represent woodland areas differs between counties.

Similarly, due to the different types of land that can be contained in a single 200m x 200m tile, image classification would not give an accurate area coverage and would risk missing smaller areas of woodland.

Semantic segmentation would predict the area of the tile containing “Woodland”; however, the approach would not be able to distinguish between the different types of trees that make up a single wood, which could affect how the model predicts areas of “Mixed Woodland” depicted with different stamps.

Therefore, instance segmentation, which provides both the area of the image covered by each object and its predicted label, was the preferred model for this task. This follows similar work commissioned by the Forestry Commission, which allows us to compare the performance of the tool with other outputs in different contexts but using the same approach.

Tier 2 - Deployment Context

3.1 - Integration into broader operational process

The tool will not be implemented in a decision making process, but the outputs may be used to support decision making by domain experts around replanting strategies in Wales. Any decisions will be taken by the domain experts, who may choose to consider evidence from this tool’s outputs.

3.2 - Human review

A validation data set, containing a range of images showing different landscapes, has been labelled and used to quantify the error of the model. The outputs are then reviewed by a group of internal testers before the dataset is released publicly. Guidance in the supporting metadata highlights the uncertainty in the model predictions, discouraging the use of the dataset for statistical purposes. As an open dataset, any user can download the output polygons and review them in standard geospatial software; we welcome feedback from others and incorporate it to help improve future versions of the model.

3.3 - Frequency and scale of usage

The tool will not be run repeatedly or persist as a process. Once the model has been optimised, historic map tiles for Wales will be classified, polygon objects identified for those areas predicted to have contained ancient woodlands, and the data set made available to the customer and published.

3.4 - Required training

Users of the shapefile outputs will need to be comfortable using shapefiles with geospatial software; these users may need basic training in geospatial software such as ArcGIS or QGIS. Alternatively, the shapefiles are available to view as a layer on DataMapWales, the Welsh Government’s geospatial portal.

3.5 - Appeals and review

N/A

Tier 2 - Tool Specification

4.1.1 - System architecture

The tool is currently run across 3 different environments:

  • A virtual machine where the original Historic Landmark maps geotif files are split into 200m x 200m tiles in a Python environment. The main packages used in this process are rasterio and geopandas. Metadata about each formatted tile is saved to a SQL table within the same platform.

  • A separate virtual machine, containing the open source QGIS software for labelling and displaying the geotif files, where the training data is prepared and the human validation stages are run.

  • An Azure Machine Learning Workspace platform, hosted within a pre-production VM (separate from the previous 2 virtual machines), where model training and deployment are managed. Within this platform the following components are saved:

    • The training and validation data, saved to a private Azure blob storage

    • The trained model, registered as a private artifact

    • Once trained, the model is registered and deployed to a private endpoint and used to predict on unseen data, which is also saved to the private blob storage

4.1.2 - System-level input

The input is a set of images in png format, derived from OS Historic County Landmark maps.

4.1.3 - System-level output

The output is a geospatial file containing polygon geometries depicting where on the OS Historic County Landmark map woodland existed, with a value associated with each polygon, between 0-1, representing the model’s confidence in woodland existing in that location.

4.1.4 - Maintenance

The training and model predictions are managed through Azure Experiments.

Any future iterations (for example, where additional data has been used to train the model) will be saved as a new model. Existing models will be saved locally and can be re-registered and used in the Azure Machine Learning Workspace if needed for comparison.

This will ensure the process to deploy and use the model for predictions remains consistent, regardless of which version of the model is being used.

Similarly, original versions of the training and testing data can be retrieved using the metadata for each tile saved in SQL.

There are three forms of validation which feed into the maintenance and development of the model:

  • Mean Average Precision - this metric is calculated during the training of the model within the Azure platform. For more information, please refer to 2.4.2 Model sheet.

  • Confusion Matrix - these visualisations show the intersection between predicted woodland objects and manually labelled objects across the validation set. It indicates the proportion of manually labelled objects that correctly intersect with a predicted object with the same label, and if not, what other types of woodland are predicted and intersect with that object instead.

  • Manual Inspection - this is done in QGIS by geographers with expert domain knowledge in the area, to critically inspect the underlying data and predicted woodland, to help flag repeated patterns of mispredictions or suggest improvements to ground truth labels to improve the training data.

However, as this tool is intended for single use, once deployed a regular maintenance schedule will not be required.

4.1.5 - Models

The following models are used in the tool:

  • Rules based filter, to remove images which do not have any objects to segment. These include “background” images (e.g. images around the perimeter of the geotif which do not contain any land) and images of large empty spaces on a map (e.g. fields) with no detail or symbols.

These images are removed by identifying the dominant pixel value in each png. If the coverage of the dominant pixel is greater than 99%, the image is removed, as this suggests it is just a white or black background with no distinguishable features to predict on (see the sketch after this list).

  • Mask R-CNN image segmentation model - this is a convolutional neural network which is trained on a series of feature maps of labelled images. The model is then used to predict a class label, bounding box and coordinates of the object (with confidence estimate) on unlabelled images. For more information, please see the 2.4.2 Model sheet.

  • Douglas-Peucker algorithm, used to simplify and smooth the final polygon objects and reduce the size of the output file for usability.

Tier 2 - Model Specification

4.2.1. - Model name

Mask R-CNN Resnet

4.2.2 - Model version

maskrcnn_resnet50_fpn

4.2.3 - Model task

To return the following predictions:

  • A class label
  • A bounding box offset - represented as four coordinate values (topX, topY, bottomX, bottomY) between 0-1 which bound the polygon object prediction
  • The predicted polygon object - several pairs of coordinates between 0-1
  • A confidence score of the class label assigned being correct

4.2.4 - Model input

An RGB png image with dimensions of 940 by 940 pixels

4.2.5 - Model output

A JSON file, with an entry for each input image keyed by the filename saved in the input JSON, containing:

  • A list of bounding boxes with coordinates assigned to “topX, topY, bottomX, bottomY” labels
  • A list of polygons saved as a list of coordinates between 0-1
  • A confidence score for each polygon, saved as a value between 0-1

4.2.6 - Model architecture

The Mask R-CNN ResNet model is an improved version of region-based convolutional neural networks (R-CNN) that allows multiple objects (labels) to be predicted in a single image.

A feature map is extracted from a region of interest (partition) and a prediction is made on the likelihood of an object being contained in that partition. Each region of interest is evaluated independently.

The exact coordinate of the predicted object is returned along with the region of interest coordinates (bounding box) and confidence score.

The model was trained over 20 epochs with all other parameters set to their defaults, which are listed in the supporting documentation: https://pytorch.org/vision/stable/models/generated/torchvision.models.detection.maskrcnn_resnet50_fpn.html

4.2.7 - Model performance

Mean Average Precision was used as the primary metric to assess the performance of the model.

The primary metric used to validate the model during training combines two metrics:

  • Precision - the percentage of correct positive predictions out of all predictions
  • Recall - the percentage of correct positive predictions out of all positively labelled data

The Intersection over Union (IoU) score represents how much two objects (a predicted area and a ground truth area) overlap. We can decide how much the objects need to overlap (an IoU threshold value) for a prediction to be labelled as “correct”. Therefore, we can calculate the precision and recall values over a range of thresholds and plot these as a precision-recall curve.

The average precision is the area under the precision-recall curve, and the mean average precision is the mean of the average precision across all labels.

The mean average precision of the best model was 0.5. The best performing woodland category was Conifer, with an average precision of 0.73. The worst performing woodland category was Orchard, with an average precision of 0.32.

4.2.8 - Datasets and their purposes

Please see the 2.4.3 Data sheet for a full explanation of the data. The initial 1,096 training tiles are created from different historic counties using the Epoch 1 OS Historic Landmark maps. They are split (80:20) into a training and validation set, as sketched below. After these images are used to train the model, the images covering the rest of Wales (482,907 tiles) are created and sent to the model for prediction.

Tier 2 - Development Data Specification

4.3.1 - Development data description

OS Historic Landmark series 1:2500, Epoch 1 (1843 to 1893). https://www.landmark.co.uk/products/historical-maps/

4.3.2 - Data modality

Geospatial data

4.3.3 - Data quantities

Each OS historic county is saved as a separate geotif file. This contains geospatial information relating to the location of each map (using the EPSG:27700 coordinate reference system) and map details, with symbols explained in the supporting documentation and key.

4.3.4 - Sensitive attributes

N/A

4.3.5 - Data completeness and representativeness

All 13 Welsh county files (including counties on the modern border between England and Wales) are available.

Data is not available in upland areas which were not surveyed at this time, for example some areas of Bannau Brycheiniog are missing.

4.3.6 - Data cleaning

The input data is a collection of map tiles. Images are not cleaned, but some filters are applied to minimise the number of tiles passed forward for prediction (therefore minimising costs).

4.3.7 - Data collection

Data is available for public sector analysis via the National Library of Scotland and was already accessed by Welsh Government.

4.3.8 - Data access and storage

Data is saved in a restricted shared drive area, with access limited to specific users and managed by internal IT services.

Metadata for each individual 1km by 1km tile (used for labelling) and 200m x 200m tile (used for training and predicting) is saved to an internal SQL database, with access limited and managed by internal IT services. When specific files are needed, the tile is extracted from the main geotif file, saved locally or to another restricted shared drive, and deleted once use has finished.

Tiles used for training and testing are saved in the Azure blob storage within the private Machine Learning Workspace. Old versions of data are archived and deleted from this area when they are no longer needed.

4.3.9 - Data sharing agreements

The data was accessed and is used under the OS Public Sector Licence.

Tier 2 - Operational Data Specification

4.4.1 - Data sources

All data used for training and predictions comes from OS Historic Landmark Maps (Epoch 1).

4.4.2 - Sensitive attributes

No sensitive attributes

4.4.3 - Data processing methods

The original maps are large geotif files. These files are converted into smaller png images (200m x 200m), which is a format the model can read. Any images which contain more than 99% white pixels are removed from the training and predicting datasets, as this suggests they contain barren landscapes, county boundaries or coastal areas without any features the model can predict on.

4.4.4 - Data access and storage

The model is trained with an Azure Job pipeline, using the MLflow platform. The logs from these training runs are saved to the same private Azure storage account within the workspace and can be deleted at any time. Logs from jobs that “fail” for testing purposes are deleted. A copy of the logs is exported and saved to an on-network shared drive, with limited access managed by IT, as a backup.

4.4.5 - Data sharing agreements

The original OS Landmark Historic County maps are used in accordance with a Public Sector Geospatial Agreement. A Presumption to publish criteria and notification form was completed upon publication to confirm the outputs do not constitute a Competing or Commercial Activity.

Tier 2 - Risks, Mitigations and Impact Assessments

5.1 - Impact assessments

An ethics assessment of this project has been completed using the UK Government’s Data Ethics Framework. The main ethical considerations raised through this process included the risk of poor model predictions due to imbalanced or poorly labelled ground truth labels. We minimised this by ensuring the data to be labelled contained a representative sample of different areas and terrains (e.g. rural and urban landscapes), using stratified sampling when creating our training set.

5.2 - Risks and mitigations

The following risks have been identified:

  • Poor quality input data due to issues with scanning and rendering of the original image. This was mitigated by calculating the pixel coverage of black and white pixels for each tile and flagging counties or groups of tiles which had lower than expected black pixel coverage.

  • An ill-fitting model (through over- or under-fitting) due to poor coverage of the initial training data. This was mitigated by using stratified sampling when selecting the training data tiles, to ensure the distribution of features such as Local Authority, rural/urban classification and distance to coast was the same in the training sample as in the whole population (see the sketch after this list).

  • The model predicting single trees as areas of woodland. This was mitigated by implementing guidance when manually labelling the training set, so that a consistent number of trees was required to be considered “woodland” and individual trees were not labelled as such.

Updates to this page

Published 4 July 2025