Capturing engagement numbers - strand 2 summary report
Published 13 March 2026
This report was authored by Jack Medlock, Hannah M. P. Stock, Andrew Knight, Donna Phillips, Adam L. Ozer, and Joseph Stordy at Verian, Dr Michael Sinclair, Dr Craig Macdonald, and Prof Iadh Ounis at The University of Glasgow, and Faculty.
This research was supported by the R&D Science and Analysis Programme at the Department for Culture, Media & Sport (DCMS). It was developed and produced according to the research team’s hypotheses and methods between October 2023 and June 2025. Any primary research, subsequent findings or recommendations do not represent UK Government views or policy.
Executive Summary
With more accurate estimates of attendance at cultural or sporting events and locations, DCMS will be best positioned to conduct effective event planning, as well as robust evaluation and value for money assessments of hosting or facilitating these events.
While ticketing data can give a good understanding of engagement with ticketed events, measuring engagement at un-ticketed events is challenging. To date, insight has come from surveys; although survey data is of good quality, provides valuable demographic insight and offers a replicable approach, surveys come with limitations. For example, they are limited in their ability to comprehensively measure local participation at specific events or spaces. Given these limitations, and because even ticket sales or traditional crowd-counting methods may not accurately reflect attendance, DCMS wants to explore new data-driven methods for capturing engagement, using novel techniques.
These methods have been developed, experimented with and evaluated on real-world examples during this project. A selection of case studies covering events and locations with a diverse range of characteristics has been produced to demonstrate the relative strengths, weaknesses and appropriate use cases of the methods developed in this study. While a single holistic methodology uniting several novel approaches was not found to be feasible, significant positive results were achieved by applying a combination of general and tailored methods to a range of event types. Using these research findings, and an accompanying knowledge transfer toolkit, DCMS can begin integrating these approaches into bespoke solutions for capturing engagement numbers as part of sporting and cultural event planning and evaluation.
A key outcome of this research was to help analysts to evaluate the suitability of data sources for predicting attendance at events with varying characteristics. The table below summarises the strengths and weaknesses of the four key data categories explored in this research against two main groups of metrics:
1. Analytical scope: Metrics measuring the extent to which the data type is suitable for predicting attendance at events of different event sizes, durations, types and locations.
| | Social media data | Mobile app data | Activity data | Aerial data |
|---|---|---|---|---|
| Event size | Suitable for larger events only – minimum c.1000 attendees | Suitable for events of all sizes | Suitable for events of all sizes | Suitable for events of all sizes |
| Event duration | Suitable for events of all durations | Suitable only for long running events over multiple days | Suitable only for short events running over several hours | Suitable only for short events running over several hours |
| Event type | Suitable for sporting and cultural events | Suitable for sporting and cultural events | Suitable for sporting events only | Suitable for sporting and cultural events |
| Event location | Suitable for rural and urban events at indoor and outdoor locations | Suitable for rural and urban events at indoor and outdoor locations | Suitable for rural and urban events at indoor and outdoor locations | Background features at some rural, urban, indoor and outdoor locations limit suitability |
2. Operational constraints: Metrics measuring the budgetary, ethical and accessibility considerations associated with using each data source to predict attendance.
| | Social media data | Mobile app data | Activity data | Aerial data |
|---|---|---|---|---|
| Budget | Data subscription in low thousands per location | Data subscription in low thousands per location | Data subscription freely available for research purposes | Data accessible freely online or in agreement with event |
| Ethics | Manageable risk of personally identifiable information | Manageable risk of personally identifiable information | Anonymised data reduces risk of personally identifiable information | Manageable risk of personally identifiable information |
| Access | Access requires interaction with platform APIs or working through a managed service like Pulsar | Access requires participants to have network-connected mobile devices | Access requires working through the Strava Metro platform. Data download times are slow | High-vantage aerial footage of crowds that lasts full event duration is hard to access |
Introduction
DCMS needs an accurate measurement of participation to improve understanding of engagement and support audience development across its sectors. To date, measurement has mostly been provided by audience figures at ticketed events, as well as surveys, principally the DCMS Participation Survey and Sport England’s Active Lives Surveys. Each of these methods brings benefits, but they also have limitations.
While ticketing data can give a good understanding of engagement with ticketed events, measuring engagement at un-ticketed events is more challenging. In the case of surveys, although the data provided is of good quality, replicable and can provide demographic insight, surveys can contain recall bias and struggle to measure local participation at specific events or spaces. To overcome limitations such as these, DCMS wants to explore novel data-driven approaches to measurement.
This project has employed academic research and data analysis expertise to build on new and existing evidence to develop these methods. A comprehensive review of available data sources was undertaken, followed by a process of evaluation and prioritisation for further research; this was ‘Strand 1’ of the project. Methods were then developed by applying different modelling techniques to the prioritised data sources. Finally, real events covering different event characteristics were identified as appropriate case studies to evaluate the suitability and accuracy of each method for estimating attendance. This completed ‘Strand 2’.
While a single holistic methodology uniting several novel approaches was not found to be feasible, significant positive results were achieved by applying a combination of general and tailored methods to a range of event types. Using these research findings, and an accompanying knowledge transfer toolkit, DCMS can begin integrating these approaches into bespoke solutions for capturing engagement numbers as part of sporting and cultural event planning and evaluation.
1. Methods
Four methods were developed to test against the target events.
In the first phase of the programme, a comprehensive review of potential data sources was undertaken via an evidence synthesis and interviews with subject matter experts. Over 60 potential data sources were identified and sorted into four categories, with a prioritisation matrix applied to highlight promising data sources for experimentation and to test their robustness at predicting attendance at the five target events. The boxes below describe the priority data source used for modelling within each category.
Activity Data (Strava)
Popular activity and mobility tracking apps like Strava can be used to gauge attendance at social events, especially with a sporting focus.
Strava collects data through GPS tracking, recorded via users’ smartphones or wearable devices while they participate in activities like running, cycling, or walking. This GPS data includes information like location, speed, distance, and elevation and is aggregated anonymously and processed into broader trends, presented on a dashboard service called Strava Metro.
Activity recorded on the Strava app that is publicly shared can be used in conjunction with linear and ensemble models and a scaling factor to capture engagement at sporting events.
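As an illustration of this modelling approach, the sketch below fits a simple linear model and combines it with a scaling-factor estimate. All figures, including the assumed 8% share of attendees who publicly log the event on Strava, are hypothetical and are not drawn from the project’s data.

```python
import numpy as np

# Hypothetical training data: publicly shared Strava activity counts
# alongside known attendance figures for past races (illustrative values).
strava_counts = np.array([120, 450, 900, 2100, 5200])
attendance = np.array([1500, 5600, 11000, 27000, 60000])

# Linear model: fit attendance = a * strava_count + b.
a, b = np.polyfit(strava_counts, attendance, 1)

# Scaling-factor model: attendance ~ strava_count / share of attendees
# who publicly log the event on Strava (an assumed 8% here).
STRAVA_SHARE = 0.08

def predict_linear(count):
    return a * count + b

def predict_scaled(count):
    return count / STRAVA_SHARE

# A minimal stand-in for the project's ensemble: average both predictions.
def predict_ensemble(count):
    return 0.5 * (predict_linear(count) + predict_scaled(count))
```

In practice the project’s ensemble model combined learners more sophisticated than a simple average, but the structure — several estimators whose outputs are pooled into one attendance figure — is the same.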
Mobile App Data (Huq)
Many applications collect real-time location data on mobile phones or GPS-enabled devices. These data points are routinely anonymised and sold to third parties (as detailed in the terms and conditions agreed when users sign up to an app), which collate and aggregate this information to provide population-level estimates of people’s locations.
Data providers (such as Huq) create dashboards, based on a raw feed of real-time data. These data dashboards allow users to visualise data at different points of interest, defined by user-generated boundaries.
Strand 1 testing showed mobile data can be used in conjunction with site-specific features and a linear regression model to capture engagement at a range of location and event types.
Social media Data (Pulsar)
User-generated content posted on social media can contain information about people’s locations, such as status posts, photos and any public information in their profile about location. This can be used in conjunction with an LLM and scaling model to capture engagement numbers at a range of events and locations.
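A minimal sketch of this pipeline is shown below. The keyword heuristic is only a stand-in for the LLM classification step used in the project, and the 0.5% posting rate is an illustrative assumption, not a figure from this research.

```python
# Hypothetical posts; the real pipeline classifies these with an LLM.
posts = [
    "Amazing atmosphere at the stadium today!",
    "Watching the final from my sofa",
    "Queueing to get into the fan zone now",
    "Can't believe that goal - great TV coverage",
]

# Stand-in for the LLM step: a crude keyword heuristic. (Assumption:
# the project prompted an LLM with descriptions and examples instead.)
IN_PERSON_CUES = ("at the stadium", "fan zone", "queueing")

def indicates_attendance(post: str) -> bool:
    text = post.lower()
    return any(cue in text for cue in IN_PERSON_CUES)

attendance_posts = sum(indicates_attendance(p) for p in posts)

# Scaling model: divide by the assumed share of attendees who post
# publicly about being there (illustrative 0.5% rate).
POSTING_RATE = 0.005
estimated_attendance = attendance_posts / POSTING_RATE
```

The second step is where the trained scaling model matters most: the share of attendees who post varies by event type, which is why the report notes the approach performs best where many attendees share their attendance online.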
Many social media platforms offer paid access to user posts through APIs. However, these vary in complexity and functionality and are often prohibitively expensive to use. Instead, social media platforms have commercial agreements with social media monitoring companies, such as Pulsar, which provide a service allowing users to search for and collect posts in full compliance with the platform’s terms and conditions.
Aerial Photography Data
Running Machine Learning (ML) models on high-vantage aerial footage of crowds, typically captured by CCTV cameras or drones, can be used to estimate crowd sizes.
Object detection and crowd density estimation models can analyse this footage to identify and count individuals within the captured area. These models are trained on large datasets of annotated images, enabling them to perform well even in dense and complex crowd scenarios. They do not use facial recognition.
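The counting step can be illustrated with crowd density estimation, in which the model outputs a per-pixel density map whose values sum to the estimated head count in the frame. The sketch below fabricates a tiny density map directly; in a real system the map would be produced by a trained model (the specific architectures used in this project are not named here, so none is assumed).

```python
import numpy as np

# A fabricated 4x4 density map (illustrative only). Each cell holds the
# expected fraction of a person occupying that region of the frame.
density_map = np.zeros((4, 4))
density_map[1, 1] = 0.9   # most of one person
density_map[1, 2] = 0.1   # tail of the same person
density_map[3, 0] = 1.0   # a second person
density_map[2, 3] = 2.0   # a dense cluster of two people

# The crowd estimate for the frame is simply the sum of the map.
estimated_count = float(density_map.sum())  # roughly 4 people
```

Summing a density map rather than drawing a box per person is what lets these models cope with dense crowds, where individual heads overlap too much for object detection alone.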
2. Summary of Target Events
The final outputs of this research and development project included a set of case studies. Each case study refers to an event, selected to represent a particular set of diverse demographic, geographic, and logistical characteristics. This was to ensure a robust evaluation of the methods applied to them, identifying the strengths and limitations of each approach. The targeted events included:
Giant’s Causeway and the British Museum, which provide daily attendance data at cultural landmarks with contrasting locations and visitor patterns.
The Great North Run, which offers insights into a large, one-day sporting event with high participation levels.
Public screenings of the Women’s Euros football final, which offer the opportunity to analyse irregular, crowd-dense gatherings.
Bradford City of Culture, which spans multiple activities over time, representing a city-wide cultural initiative.
3. Application of Method by Target Event
Methods developed during Strand 1 using the different data categories were assessed for their potential for predicting attendance at the targeted events. Data was not available for every event for each data source, and some approaches were not applicable to certain event types (e.g. aerial photography for indoor events). Detailed write-ups of each methodology are provided in the five case study packs that accompany this report. The table below provides an overview of which modelling approaches were tested against each event.
| | Great North Run | City of Culture | Giant’s Causeway | British Museum | Women’s Euros |
|---|---|---|---|---|---|
| Mobile App Data | ✓ | ✓ | ✓ | ✓ | ✓ |
| Social Media Data | ✓ | ✓ | ✓ | ✓ | ✓ |
| Aerial Photo Data | ✕ | ✓ | ✓ | ✕ | ✕ |
| Activity Data | ✓ | ✕ | ✕ | ✕ | ✕ |
4. Evaluation of Methods
Each method was evaluated against the following 8 criteria, with scores from 1 to 5. Technical teams independently evaluated the results for each data source before comparing assessments and agreeing the final scores by consensus, ensuring a robust assessment process.
Accuracy: Measure of how close the predictions are to the actual known baseline attendance numbers, reflecting the correctness of the model’s predictions.
Bias: Evaluation of whether the model’s predictions systematically over-represents or under-represents certain groups or outcomes, indicating potential unfairness or imbalance in results.
Ethics: Assessment of the extent to which research ethics are adhered to by using this model, including respect for privacy, informed consent, and avoidance of harm.
Deliverability: Consideration of how feasible it is to deploy and maintain the model in real-world scenarios, considering factors like data access, technical complexity, scalability and resource requirements.
Cost: Examination of the financial investment required to build, deploy, and maintain the model, including hardware, software, and human resources.
Demographics: Evaluation of the model’s ability to incorporate data on different demographic groups, ensuring that predictions reflect a diverse audience and demographic information is captured for analysis where appropriate.
Generalisability: Measure of the ability of a model developed to predict attendance at one specific event to predict attendance at other events.
Accessibility: Consideration of the challenges associated with securing access to data in a format that is usable and compliant with relevant platform level and wider governance considerations.
Evaluation of Activity Data Methodology
Accessed via the Strava Metro Dashboard
Figure 4.1: Activity Data Radar Plot. Each criterion is indicatively visualised with a score out of 5, where 5 describes ‘Strong performance with no issues’ and 0 describes ‘Poor performance with insurmountable challenges’.
Accuracy - 3 : Two models were developed with activity data to predict attendance at our key target sporting event, the Great North Run. The linear model was most accurate for this specific event (8% error range). However, the approach generalised poorly to other running events. The stronger model overall was the Ensemble model, which accurately predicted a wider range of races, even though its prediction for the Great North Run was worse – 26% off the mark. Training data from more large race events would improve performance, with these results also showing that different types of models should be used for different types of events.
Bias - 2 : Research shows that activity data can skew towards younger and more active demographics. Consequently, this model will be biased towards these groups and should be expected to be most performant in predicting events attended heavily by such groups. This bias in the data means the model should be expected to produce less accurate predictions for target race events with older demographics or other characteristics that fall outside Strava’s typical user base, although future work could identify approaches to control for these biases.
Ethics - 5 : Strava data is heavily anonymised and aggregated making it extremely difficult to identify any individuals in the data. Specifically, individual athlete data is not returned at all – all counts are aggregated to the nearest 5 athletes and race times are aggregated to the hour. All data available in the dashboard is publicly shared by users who log activity on the Strava app.
Deliverability - 4 : This method is highly replicable and will, by definition, improve with each additional race event added to the training data. In this way, the approach should improve over time as it is delivered across more race events. All modelling and data processing can also be run locally without the need for large GPUs. However, the process for downloading data from the Strava Metro dashboard is highly manual and can be inefficient, as explained in detail in the methodology toolkit.
Cost - 5 : Subscriptions to Strava Metro are made freely available to research organisations, urban planners and government authorities. HMG already holds a subscription to Strava Metro for all UK sites, meaning costs should not increase with the downloading of additional sites. Additional data sources, such as weather, were accessed for less than £100 in total.
Demographics - 2 : Because Strava Metro data is deliberately aggregated into groups of 5 runners, it is not possible to identify the demographic of participating athletes. For this reason, Strava data cannot in isolation produce insight into the demographics of attendees at races, only their overall numbers.
Generalisability - 1 : Strava data produced strong performance predicting attendance at a range of running events around the UK, including our target sporting event: the Great North Run. However, given the nature of its input training data, it does not generalise to predict attendance at non-sporting events, including each of the four other target events in scope of this research.
Accessibility - 4 : Strava data is made available to users through a managed dashboard service, from which specific datasets can be downloaded via simple API calls. The only complicating factor is the requirement to define the boundaries of the event before downloading data, which requires consideration in advance. The subsequent download times can be long (1-2 days), but the process itself is very simple.
Evaluation of Mobile App Data Methodology
Accessed via Huq
Figure 4.2: Mobile App Data Radar Plot. Each criterion is indicatively visualised with a score out of 5, where 5 describes ‘Strong performance with no issues’ and 0 describes ‘Poor performance with insurmountable challenges’.
Accuracy - 4 : Mobile app data produced the most accurate predictions at sites where events take place continuously. Accuracy dropped for short-lived events (e.g. several hours long) due to limited training data and challenges in weighting and scaling this data towards a visitation estimate for time periods of less than one day. Sample weighting predictions produced more accurate results for short-lived events like the Great North Run and the Women’s Euros screening. By contrast, linear modelling produced more accurate results at long-running locations like the Giant’s Causeway and the British Museum, where predictions were made for annual figures.
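The sample weighting idea can be sketched as a simple scaling calculation; the device count and panel penetration rate below are illustrative assumptions, not Huq figures.

```python
# Sample weighting: scale the observed mobile-app devices at a site up
# to a visitation estimate using the panel's assumed penetration rate
# in the local population (all figures illustrative).
observed_devices = 240      # unique devices seen inside the event boundary
panel_penetration = 0.012   # assumed share of the population in the panel

estimated_visitors = observed_devices / panel_penetration  # about 20,000
```

The difficulty noted above for short-lived events sits in the denominator: penetration rates are typically estimated against daily or longer baselines, so weighting a few hours of observations requires further assumptions about when panel members are active.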
Bias - 4 : A key strength of mobile app data is the ability to uncover bias within the sample. Additionally, users can be assigned socio-demographic characteristics based on their estimated home location within this methodology, allowing for deeper analysis of representativeness and potential biases in the sample. Strand 1 of this project contains full details of how this is possible and the testing done to date by Sinclair et al. Mobile data provides good population coverage at broad spatial and temporal scales, but smaller samples reduce user representation, increasing the risk of bias.
Ethics - 2 : User consent exists with mobile app data, because this data type can only be collected with user agreement within the mobile phone apps from which location data is sourced. However, classifying this type of data collection as ‘active consent’ is debatable, given that mobile users often accept app terms and conditions without fully reading them, and collection of location data is often buried within those terms. While all data is de-identified by Huq, estimating users’ home areas also introduces ethical considerations of its own, even though it helps reduce biases in outputs.
Deliverability - 4 : There are technical challenges to working with this data, which depend on whether raw or processed data is used, and the volume being analysed (e.g., a single site vs. an entire city). Larger datasets require more computational resources and expertise to process effectively. While the data is generally feasible to deploy, some challenges exist, including resource demands and the need for big data and geospatial technical expertise. Overall, this methodology is replicable and relatively straightforward to deliver, but does require technical skills and resources, especially with increased scale.
Cost - 4 : Costs associated with using location data via Huq were low for this project in terms of computational resources. However, procuring mobile app data can be expensive depending on the project, provider and use case, especially with increased scale. However, given the data’s potential value relative to its cost, it can be considered cost-effective, especially in comparison to other data sources.
Demographics - 4 : While mobile app data may contain biases, a key strength of the methodology developed is its ability to uncover and analyse sample coverage at small geographic scales, by making comparisons to known population distributions or user samples for specific sites or events (assuming that data is available). This capability helps identify and potentially reduce bias in ways that would otherwise be challenging. However, this does generally depend upon the demographics of the available data being similar to the target event being tested.
Generalisability - 4 : Location data from Huq (among others) is available UK-wide, and many methods are transferable to different types of spaces, making the approach highly generalisable. The sample weighting method can be applied to any space making it very generalisable, and while the machine learning linear models focus on tourist spaces with annual data, they could be further developed for other environments and with baseline data at smaller temporal scales.
Accessibility - 4 : Access to mobile phone app location data requires collaborating with a third-party provider, such as Huq. However, once this relationship is established and the relevant subscription is in place, mobile app data is generally straightforward with vendors usually providing a managed service that is easy to use and extract data from, meaning this data source and methodology scores well for accessibility.
Evaluation of Social Media Data Methodology
Accessed via the Pulsar Platform
Figure 4.3: Social Media Data Radar Plot. Each criterion is indicatively visualised with a score out of 5, where 5 describes ‘Strong performance with no issues’ and 0 describes ‘Poor performance with insurmountable challenges’.
Accuracy - 3 : Modelling approaches with social media data show promise but, given the diversity and noisiness of social media data, models trained on this data would benefit from a much larger training dataset than was available in this research, where social media posts for 24 events were collected, numbering hundreds of thousands of posts. A training dataset around ten times larger would be needed for optimal performance. The model performs best at medium-large sized events where the following are true: (i) a considerable number of attendees share their attendance online; (ii) there is no large online/TV audience posting about the event without attending, skewing results.
Bias - 3 : Social media data presents a risk of bias at all stages of this approach. In particular, the nature of the event will affect who and how many people post publicly on social media, with some events having limited social media presence. The variable relevance of the data returned from Pulsar queries also affects results. The performance of the LLM used to classify whether a post indicates in-person attendance is sensitive to the description and examples it is fed. These challenges require careful management to reduce bias.
Ethics - 3 : All social media data used for this methodology is publicly available data, as the Pulsar platform cannot retrieve any posts from private groups or chats. However, user information will still be returned in searches (i.e. profile names and pictures) and these should be anonymised. This approach does carry some risk of negative public perceptions associated with government monitoring of public social media data, especially where the necessary steps regarding anonymisation of personally identifiable information are not taken.
Deliverability - 3 : Delivering this methodology to its full potential will require retraining the scaling model portion of the approach on significantly more data, which may allow for the development of a unified model applicable to a much wider range of events. However, there remains a risk of failure due to the noisy and inconsistent nature of social media, and changing platform policies with increased privacy walls mean this approach may not be extendable into the future. Full details of these changes are set out in the Strand 1 report for this project.
Cost - 5 : Costs for social media aggregator subscriptions can be low (subject to the desired or required level of subscription), with costs in the low thousands (slightly more than c.£1k per month) for this project. However, this methodology does require an expensive GPU server in order to run the LLM used to categorise posts. There are computational resource costs associated with this approach and larger and more powerful LLMs needed to improve performance will be more costly.
Demographics - 2 : Very limited demographic information can be deciphered from social media posts. Only user-submitted gender categories appear in the data at all. In some cases (again depending on user input and permissions) the location from which a post was uploaded is also present. This is a potential avenue for further exploration, but its incompleteness and uneven distribution meant it was not explored beyond Strand 1 testing.
Generalisability - 3 : The current model developed using social media data, given its current training, shows only weak evidence of generalisation to previously unseen events. This modelling approach will only allow generalisation to events of the same type as those seen in training data. Significantly larger training datasets, including social media posts from a far wider range of events, will be required to improve on this.
Accessibility - 2 : Access to social media data is restricted by social media platforms, with full detail set out in the Strand 1 exploration. While it is possible to navigate platform-level terms of service independently, collaborating with third-party providers generally makes data much more easily accessible and affordable. Once this relationship is established and the relevant subscription is in place, vendors often provide a managed service that is easy to use and extract data from. However, platform-level terms and conditions still apply, including limited access to live data from some providers and data held in private groups remaining inaccessible.
Evaluation of Aerial Photography Methodology
Accessed via drone footage
Figure 4.4: Aerial Photography Radar Plot. Each criterion is indicatively visualised with a score out of 5, where 5 describes ‘Strong performance with no issues’ and 0 describes ‘Poor performance with insurmountable challenges’.
Accuracy - 3 : Object detection and crowd density models will accurately count crowds from video footage, where it is available at a suitable quality and with sufficient coverage of the overall event. However, the following are also needed for accurate estimates: (i) a figure for the crowd churn rate between frames, to eliminate double-counting of attendees; (ii) background landscapes that do not interfere with the count. Accuracy is also compromised where parts of the frame are heavily shaded.
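The churn adjustment in (i) can be sketched as follows; the frame counts and the 30% churn rate are illustrative assumptions, not figures from the case studies.

```python
# Adjusting per-frame counts for crowd churn: if only part of the crowd
# turns over between sampled frames, summing raw frame counts would
# double-count attendees. Illustrative figures only.
frame_counts = [1000, 1050, 980, 1020]   # people counted in each frame
churn_rate = 0.30   # assumed share of new attendees in each later frame

# Count the first frame in full; later frames contribute only their
# estimated new arrivals.
total_attendance = frame_counts[0] + sum(
    count * churn_rate for count in frame_counts[1:]
)
```

Without a churn estimate the only defensible outputs are a peak count (the maximum frame) or an unbounded upper limit (the sum of frames), which is why footage lacking timing information, as at the Giant’s Causeway, undermined this approach.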
Bias - 4 : The YOLO object detection model used in this research is not biased against any protected characteristics and so performs well on bias measures. However, in some specific applications (e.g. to the Giant’s Causeway footage collected separately in this project) it struggled to differentiate between humans and the hexagonal rocks, leading to false positives and skewing results.
Ethics - 3 : As set out in Strand 1, using video footage of humans always carries an innate risk of personal identification of faces, though this is unlikely given no facial recognition software was used in this project. Further, while it is possible to collect video footage in compliance with legislation, this data type is not typically collected with the informed consent of those in the footage.
Deliverability - 3 : The approach developed in this project is highly replicable and relatively low intensity from a resource perspective. The object detection and crowd density models used are open-source, applicable to a wide range of datasets without retraining and configurable to trade-offs between speed and accuracy. However, considerable data science expertise is needed to implement the approach, which will make it less deliverable for some organisations.
Cost - 4 : The majority of costs associated with using video footage of events to predict attendance are likely to result directly from the procurement of the data itself (potentially including the cost of buying a drone to capture footage). Acquiring data where video footage needs to be purchased from another provider may be costly. However, compute costs are minimal once the data has been paid for, or where it is freely available.
Demographics - 1 : Object detection and crowd density models do not in themselves reveal any additional information about the demographic make up of the crowd.
Generalisability - 3 : Due to a lack of quality aerial footage, this approach has not been applied to many other events as part of this research project. The model developed performed reasonably effectively at predicting attendance at the Les Giraffes procession in Bradford, but it generalised poorly to the Giant’s Causeway. This is because: (i) there was no time period attached to the footage, making churn estimates impossible, and (ii) the model failed to differentiate people from some rock formations, leading to false positives.
Accessibility - 1 : Accessibility is one of the main challenges associated with this approach. Where quality video footage can be provided, object detection and crowd density models can be very effective at crowd counting (where background landscapes do not bias results). However, this data type is not routinely collected and can be difficult to access. In many cases, organisations that collect high-quality aerial footage of events, such as policing and security organisations, will be unlikely to share this data for data governance reasons.
Cross-Cutting Methodology Limitations
Identifying limitations common to all models developed in this research is challenging. This is because it can be difficult to distinguish between limitations of the model from limitations of the data, particularly given the small datasets these models are trained on. However, there are some machine learning limitations common to each methodology that analysts should be aware of:
- Overfitting: The models developed in this project have been trained on a relatively small volume of training data, such that they can measure attendance at the specific target events identified in our research. This means they may be ‘overly-specialised’ to the training data they have been shown. For this reason, analysts should expect that every model will benefit from additional training data if they want to apply the models to events other than those targeted in this initial research.
- Comprehension Challenge: Machine learning models do not always provide explanations of their outputs, which can make interpreting and improving findings challenging. This is sometimes referred to as the ‘black box’ problem, and it worsens as machine learning models become larger and more complex. During this project, for example, the activity data methodology identified the age demographic of people near the location as among the best features for predicting attendance, whereas weather had little impact. This result is not intuitive to human analysts and yet was instructive in predicting attendance.
-
Baseline Data Challenge: Machine learning models are only as good as the baseline data they are trained on. All methodologies in this project struggled with sourcing a sufficiently large and high-quality baseline dataset on which to train machine learning models. Estimates generated using mobile app location data, for example, would be best suited to high-quality daily visitation estimates, of which very few are publicly available.
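The overfitting risk described above can be detected with a standard held-out evaluation. The sketch below is illustrative only: it uses a deliberately memorisation-prone toy model (1-nearest-neighbour) and hypothetical event features and attendance figures, not the project's models or data. A large gap between training error and held-out error is the signal that a model has overfitted.

```python
def predict_nn(features, train_set):
    """1-nearest-neighbour: predict the attendance of the most similar
    training event. Deliberately memorisation-prone, to show overfitting."""
    best = min(train_set,
               key=lambda ev: sum((a - b) ** 2 for a, b in zip(ev["x"], features)))
    return best["attendance"]

def mape(events, train_set):
    """Mean absolute percentage error of the model over a set of events."""
    errors = [abs(predict_nn(ev["x"], train_set) - ev["attendance"]) / ev["attendance"]
              for ev in events]
    return 100 * sum(errors) / len(errors)

# Hypothetical events: x = (duration_hours, social_posts_thousands)
train = [{"x": (3, 12), "attendance": 40_000},
         {"x": (8, 30), "attendance": 120_000},
         {"x": (2, 5),  "attendance": 15_000}]
held_out = [{"x": (4, 14), "attendance": 55_000},
            {"x": (7, 25), "attendance": 90_000}]

train_err = mape(train, train)       # 0% by construction: the model memorises
holdout_err = mape(held_out, train)  # much larger: the generalisation gap
```

The training error of 0% is meaningless on its own; only the held-out error says anything about how the model will perform on new events.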
5. Comparison of Methods
5.1 Percentage Error Comparison
Table 5.1: Percentage error comparison for each methodology. Note that the mobile app methodology produces figures for a 24-hour period, which affects ‘accuracy’ for short-lived events
| Case Study | Mobile App Data | Social Media Data | Aerial Photography | Activity Data |
|---|---|---|---|---|
| Bradford City of Culture | 85% | 56.1% | 23.7% | Not Used |
| Giant’s Causeway | 9.5%* | 235% | Not Viable | Not Used |
| Great North Run | 158% | 16% | Not Used | 26% |
| British Museum | 6.4%* | 6.8% | Not Used | Not Used |
| Women’s Euros Final | 168% | 799% | Not Used | Not Used |
* Selected Mobile App methodology figures are median percentage error as opposed to percentage error owing to the need to account for the impact of the Covid-19 pandemic on location data.
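For reference, the error figures above can be computed as in the sketch below. It assumes the standard definition of percentage error (absolute difference relative to the baseline figure) and uses hypothetical daily estimates; the asterisked methodologies report the median across days, which is robust to outlier days such as those distorted by the pandemic.

```python
import statistics

def pct_error(estimate, actual):
    """Absolute percentage error of an attendance estimate against a baseline."""
    return 100 * abs(estimate - actual) / actual

# Hypothetical (estimate, baseline) pairs for four days at a recurring site.
daily = [(10_500, 11_000), (9_800, 9_000), (14_200, 12_000), (8_900, 9_100)]

daily_errors = [pct_error(est, act) for est, act in daily]
median_error = statistics.median(daily_errors)  # robust to outlier days
```

A single-figure percentage error suits one-off events; for recurring sites the median daily error avoids a few anomalous days dominating the headline figure.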
Relative success can be conceptualised by comparing the likely future ‘return on investment’ from further development work:

- High Future Return on Investment (VALUES TBC): Promising methodologies with results that can be considered ‘excellent’ in the context of an R&D Programme. With further development, benefits are likely to be high. Includes the use of mobile app data with long-term events, and social media data in higher-volume in-person attendance scenarios.
- Moderate Future Return on Investment (VALUES TBC): Results that can be considered ‘good’ in the context of an R&D Programme. These methodologies are worth pursuing, with investment in further training and fine-tuning likely to yield positive results. Includes activity data used for sporting events and aerial photography in a suitable context.
- Unclear Future Return on Investment (VALUES TBC): Methodologies with potential, but where the value of continued investment and development is unclear at this stage. Includes the use of mobile app data with short-term one-off events and social media data for events with lower levels of online footprint.
- Negative Future Return on Investment (VALUES TBC): In some scenarios it is clear that the method is not viable and does not warrant future development. Includes the aerial photography methodology in atypical locations/events and social media data where there are exceptionally high levels of ‘noise’ or virtual-only participation.
5.2 Strength and Weakness Comparison
Strand 2 experimentation highlights the relative strengths and weaknesses of each methodology in context.
Figure 5.2: Radar diagram comparing the assessment scores of each modelling approach
Activity Data (Strava):
Better For:

- Leverages millions of publicly shared, aggregated runs to predict attendance at sporting events, with strong results for running events.
- Low set-up cost, such that the addition of new training data will improve generalisability over time to more sporting events, including cycling and walking events.
- Data access is straightforward.

Weaker For:

- Optimisation for sporting events means performance is non-existent for cultural events.
- Significant time delays are experienced when downloading training data from Strava Metro.
- Over-representation of young and active populations in the training data means the model does not generalise well to events where these groups are not represented.
- No demographic insight about attendees.
Mobile App Data (Huq)
Better For:
- Accurate for long-running recurring events that repeat over weeks, months or years (e.g. museums or tourist sites).
- The method is low-cost and, with the right technical expertise, relatively straightforward to deliver.
- Demographic insights can be derived by comparing against known population distributions for specific sites or events, helping to uncover demographic information and reduce bias.

Weaker For:

- Mobile app data is generally less suitable for short-lived events (e.g. a few hours in duration). It is better suited to recurring events and locations with fixed boundaries.
- Mobile app data is subject to some ethical concerns around its use. While this data is legal to collect and for providers like Huq to license, there are concerns about whether individuals can be said to have given ‘informed consent’ for their data to be collected.
Social Media Data (Pulsar)
Better For:
- Aggregation platforms allow a low-cost and straightforward way of leveraging social media data to predict attendance. They reduce the friction of collecting data and managing platform-level terms-of-service obligations across different social media platforms.
- Most accurate results for events where a significant number of posts confirming in-person attendance are made online.

Weaker For:

- Accuracy is reduced where events have few online posts or a limited social media presence.
- Limited demographic data available.
- Social media data is noisy (e.g. posts about events from people who did not attend in person), so significant further training of the model will be needed to improve generalisability.
- While compliant with platform-level terms of service, there is no clear ‘informed consent’ for this use case from an ethical point of view.
- Changing privacy laws and company regulations mean the approach may not be viable in the future.
Aerial Photography Data (Drone Footage)
Better For:

- Accurate results where quality footage is available and approximate figures for churn between frames can be established to reduce double-counting.
- The approach is highly replicable and low cost.
- The risk of personal identification in images is low, given that no facial recognition software was used.

Weaker For:

- High-resolution, high-vantage aerial photography is hard to access, normally requiring bespoke deployment of drones or cameras to events and collaboration with event organisers.
- No additional information is provided about the demographic make-up of crowds.
- The model struggles to detect people against complex, shadowy or unusual backgrounds.
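The churn adjustment used with aerial footage can be sketched as follows. The formula here is a hypothetical illustration (the report states only that churn figures are used to reduce double-counting): the first frame's count is taken in full, and each subsequent frame contributes only its assumed fraction of newly arrived people.

```python
def churn_adjusted_total(frame_counts, churn_rate):
    """Estimate unique attendees from sequential per-frame crowd counts.

    churn_rate is the assumed fraction of people in each frame who are new
    since the previous frame. This is a hypothetical adjustment, not the
    project's exact method.
    """
    if not frame_counts:
        return 0.0
    total = frame_counts[0]           # everyone in the first frame is new
    for count in frame_counts[1:]:
        total += count * churn_rate   # only the churned fraction is new
    return total

frames = [1200, 1150, 1300]           # hypothetical per-frame counts
naive = sum(frames)                   # 3650: double-counts people who stay
adjusted = churn_adjusted_total(frames, churn_rate=0.2)  # 1690.0
```

Summing raw frame counts overstates attendance whenever people linger across frames; the churn-adjusted total is always lower unless everyone turns over completely between frames.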
5.3 Assessment Comparison
Figure 5.3: Bar chart comparing and contrasting the assessment of each method. Each criterion is colour-coded with a score out of 5, where 5 describes ‘Strong performance with no issues’ and 0 describes ‘Poor performance with insurmountable challenges.’
6. Data Source Selection Decision Tree
The following decision tree is designed to help analysts determine which data source is most suitable for different event types. It does not outline all possible considerations, but it can guide analysts on which aspects of an event make each modelling approach suitable, or not, for estimating attendance at that event.
Figure 6.1: Data source selection decision tree
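As an illustration of how such routing logic might be encoded, the sketch below derives its branches from the strengths reported in Section 5.2. The field names are hypothetical, and the published decision tree may differ in detail.

```python
def suggest_data_source(event):
    """Illustrative routing based on the strengths reported in Section 5.2.

    event is a dict of hypothetical fields: 'type' ('sporting'/'cultural'),
    and booleans 'recurring', 'fixed_boundary',
    'high_social_media_footprint', 'aerial_footage_available'.
    """
    if event.get("type") == "sporting":
        return "Activity Data (Strava)"
    if event.get("recurring") and event.get("fixed_boundary"):
        return "Mobile App Data (Huq)"
    if event.get("high_social_media_footprint"):
        return "Social Media Data (Pulsar)"
    if event.get("aerial_footage_available"):
        return "Aerial Photography Data (Drone Footage)"
    return "No single method suitable; consider combining approaches"

suggest_data_source({"type": "sporting"})                      # activity data
suggest_data_source({"type": "cultural", "recurring": True,
                     "fixed_boundary": True})                  # mobile app data
```

The branch order encodes a preference: event type is checked first because the activity data model only performs on sporting events, while the remaining branches fall back to progressively harder-to-source data.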
In 2026, Glasgow will again host the Commonwealth Games. The decision tree flows below provide an illustrative example of where methodologies developed in this project could be deployed by analysts to measure participation in events associated with the Games.
Figure 6.2: Glasgow 2026 - Data Source Selection Decision Tree
7. Delivery Method Options
There are three main ways the models developed to predict attendance could be delivered across government and its partner organisations. Outlined below are considerations for selecting a delivery model for each AI model developed in this project.
7.1 Community Driven Open Source
Models are developed collaboratively by government and external partners and made publicly available for decentralised use and open-source innovation.
- Mobile App Data: Location data is typically aggregated by commercial providers and is rarely available open source due to privacy and proprietary constraints. Open-source development is therefore difficult and could risk exposing sensitive data handling methods, so the community approach is less suitable unless strict safeguards are in place.
- Social Media Data: Social media data is accessed in various ways online, making the methodology theoretically shareable. However, ethical risks are significant if privacy safeguards are not strictly enforced, and open-source releases may raise concerns over misuse. Any open-source community must implement robust anonymisation and data filtering measures to avoid harm.
- Aerial Photo Data: Aerial imagery is often available through open sources or public datasets, making it a promising candidate for open-source model training. However, there are ethical concerns about surveillance and misuse where individuals are identifiable. Community developers must embed strict ethical guidelines and data anonymisation techniques.
- Activity Data: Strava data is proprietary and not typically available as open source, limiting the viability of an open-source model for this approach. While the methodology might be shared, the training data itself is restricted, and the community must be cautious of exposing proprietary information.
Advantages

- Open-source models encourage use and iteration by a community of developers, researchers, and industry experts. This can drive innovation and keep models up to date.
- External contribution to development reduces the financial burden on the public sector.
- ‘Building in the open’ promotes transparency in AI development and fosters public trust.

Limitations

- The government has limited ability to regulate and prevent unethical use of the models.
- Without structured oversight, open-source contributions may vary in quality, and security vulnerabilities could emerge.
- Event organisers or government users may struggle with implementation unless a dedicated support structure is created.
7.2 Centralised government ownership
A dedicated government organisation develops, maintains, and manages the AI models and data, using them internally to predict event attendance.
- Mobile App Data: Centralised ownership provides a framework for managing the privacy risks associated with location data, ensuring compliance with regulations (e.g. GDPR), and applying standardised collection methods. This approach offers the best control over data quality and ethical use. Data will only have to be bought once, but infrastructure costs may be high.
- Social Media Data: A central model allows tight ethical oversight and control over social media data sourcing and privacy compliance. Government can negotiate data access agreements with platforms to ensure that usage aligns with terms of service and ethical standards, though this might limit flexibility and external innovation.
- Aerial Photo Data: Central government ownership can ensure that aerial data is sourced ethically and processed under tight regulatory oversight. This model benefits from controlled training data and robust validation against independent baselines, although initial baseline counts may need further verification from organisers. This is a weakness compared with the service-enabled partnership model.
- Activity Data: Central ownership can secure the necessary licensing and manage a controlled dataset, ensuring the model is trained appropriately. However, the limited training data may require further investment in obtaining a more representative dataset to scale predictions to large events. In this delivery model, this additional work would be undertaken by government, meaning higher costs for greater control.
Advantages

- Government controls model development, usage, and updates, ensuring compliance with ethical guidelines and data governance.
- Government-run quality assurance means models remain well-calibrated and updated with reliable datasets.
- Having a single responsible entity ensures accountability.

Limitations

- Development and maintenance require sustained public funding, slowing development compared with industry-led innovation.
- A centrally controlled model may struggle to adapt to the diverse needs of event organisers across different regions and event types.
7.3 Service-enabled partnership
Government develops and owns models in collaboration with industry, then offers these models as a managed service to event organisers across its sectors.
- Mobile App Data: A service-enabled partnership can work closely with location data providers and local event organisers to secure anonymised location data. By aligning incentives with organisers, the model can improve data accuracy and reduce overcounting, though it will require clear privacy protocols and data-sharing agreements.
- Social Media Data: Collaboration with event organisers can encourage the sharing of verified geolocations, which could be used as a reference against social media data and potentially improve accuracy. This approach supports a tailored service that can help mitigate ethical risks through contractual controls and targeted data filtering, but it requires ongoing engagement to ensure data quality and representativeness.
- Aerial Photo Data: Collaborating with event organisers can provide access to high-quality, event-specific aerial footage, which is otherwise hard to obtain for this data type. A service partnership could make it easier to integrate multiple camera angles and dynamic processing to adjust for issues like churn, while contractual arrangements help mitigate ethical risks related to surveillance and data misuse.
- Activity Data: Partnering with event organisers, and possibly Strava directly, can facilitate data sharing under agreed terms. Partnerships with events may free up event-specific activity data for model training. This model benefits from real-world collaboration to refine scaling factors and improve accuracy, though it hinges on the willingness of stakeholders to share proprietary activity data.
Advantages

- Working directly with event organisers and private-sector partners ties model development to practical, real-world applications.
- Event organisers are more likely to share data (e.g. ticket sales, registration data) if they directly benefit from the model’s outputs.
- Allows for training and other forms of support to help organisers use the models effectively.

Limitations

- Not all event organisers have the technical expertise to implement AI models effectively, which could lead to inconsistent adoption.
- Since this model relies on partnerships rather than broad public access, widespread adoption may be slower than under an open-source approach or government-driven programmes.
7.4 Recommendations
Summarised below are some recommendations for how government can progress the modelling approaches developed in this research.
- Continuously monitor emerging data sources: Government should maintain a dynamic data sourcing strategy, regularly reviewing new platforms and sensor technologies, as well as updated APIs from existing providers. For example, advances in wearable technologies and the increasing ubiquity of IoT devices may soon offer highly granular, real-time data that could further refine attendance predictions.
- Embrace advancements in AI and machine learning: The field of AI is rapidly evolving. Over the next year, developments in transformer models and domain-specific LLMs may significantly improve the accuracy of text-based classification tasks, such as distinguishing actual attendees from online commentators. In this research project it is likely that a larger, more performant LLM would have yielded better results, though there was not time to test this hypothesis. Additionally, integrating cutting-edge computer vision techniques can enhance object detection and density estimation in aerial photography, reducing error margins and improving baseline validations.
- Foster collaborative innovation ecosystems: Consider establishing public-private partnerships to co-develop these models with data providers. This collaborative approach could align well with a community-driven open-source delivery model, facilitate access to proprietary data sources (such as Strava activity data and mobile app location information), and promote the adoption of best practices across industry and government. Open innovation challenges or hackathons could further stimulate creative solutions from academia and the private sector.
- Implement rigorous ethical and privacy frameworks: As data sources expand and become more granular, ethical and privacy considerations will become even more critical. Government should develop robust, government-wide ethical guidelines for event data collection and invest in technologies that enable data anonymisation and secure data sharing. This will not only ensure compliance with regulations such as GDPR but also maintain public trust in government data practices.
- Pilot and validate model performance: Implement pilot runs to evaluate model performance before scaling, using a structured framework that assesses predictive accuracy, data reliability, scalability, ethical considerations, and ease of use. These pilots should compare how well models trained on past events generalise to new ones.