Capturing engagement numbers - strand 2 - Bradford 2025 UK City of Culture case study
Published 13 March 2026
This report was authored by Jack Medlock, Hannah M. P. Stock, Andrew Knight, Donna Phillips, Adam L. Ozer, and Joseph Stordy at Verian, Dr Michael Sinclair, Dr Craig Macdonald, and Prof Iadh Ounis at The University of Glasgow, and Faculty.
This research was supported by the R&D Science and Analysis Programme at the Department for Culture, Media & Sport (DCMS). It was developed and produced according to the research team’s hypotheses and methods between October 2023 and June 2025. Any primary research, subsequent findings or recommendations do not represent UK Government views or policy.
Executive Summary
While ticketing data gives a good understanding of engagement with ticketed events, measuring engagement at un-ticketed events is difficult and often relies on surveys. Although survey data is of good quality, replicable and can provide demographic insight, it comes with its own challenges and limitations. Surveys are limited, for example, in their ability to comprehensively measure local participation at specific events or spaces. Given that even ticket sales or traditional crowd-counting methods may not accurately reflect attendance numbers, DCMS wants to explore new data-driven methods using novel techniques.
This case study explains and compares 3 such methods, each based around a specific data source, for predicting attendance at the Bradford City of Culture:
- Aerial data
- Mobile app data (Huq)
- Social media data (Pulsar)
Each methodology is summarised and compared against 8 categories: Accuracy, Bias, Ethics, Deliverability, Cost, Demographics, Generalisability and Accessibility.
Case studies have been developed for each event in scope of the research. The Bradford City of Culture was one of 5 events selected.
1. Event Overview
Location: Bradford, UK
Date: 24 August 2024
Summary: Procession of seven larger-than-life model giraffes through Bradford by a street theatre company.
The Les Giraffes procession during Bradford City of Culture was selected as a research candidate for the counting engagement project because it exhibited the following characteristics:[footnote 1]
Drone footage was taken of the entirety of the event and shared with the project team by the Bradford 2025 team. Moving crowds from a range of different angles and against a variety of different backgrounds posed an interesting challenge as to whether object detection and crowd density models could predict attendance. Extrapolation from counts of attendance in specific frames to overall event attendance was a novel problem to solve.
2. Methods
Three methodologies were developed to predict attendance at the Bradford City of Culture.
Aerial Photography
Various sensing technologies can be deployed directly to an event to capture data from the site. These include fixed cameras such as CCTV, drones, wearables and radio-frequency identification (RFID) tags.
Utilising these data sources normally requires bespoke deployment to each event, including collaboration with the event organisers. This data is particularly useful for large set-piece events, especially where the crowd is relatively static or captured from a wide range of angles.
Video footage of the Les Giraffes event during the Bradford City of Culture celebration was taken from drones and this project experimented with a range of machine learning models to count crowds attending the location.
Mobile App Model
Many applications collect real-time location data from mobile phones or GPS-enabled devices. These data points are routinely anonymised and sold to third parties (a practice detailed in the terms and conditions agreed when signing up to the app), which collate and aggregate this information to provide population-level estimates of people’s locations.
Data providers (such as Huq) create dashboards, based on a raw feed of real-time data. These data dashboards allow users to visualise data at different points of interest, defined by user-generated boundaries.
Aggregated underlying data from Huq was used to analyse the outputs for specific boundaries of the Giraffes procession during the Bradford City of Culture and predict event attendance.
Social Media Data
User-generated content posted on social media platforms can contain information about people’s locations. These digital traces – such as status posts, photos and any public location information in a user’s profile – can be used as a proxy for physical attendance.
Social media platforms have commercial agreements with social media monitoring companies, such as Pulsar, who provide a service allowing users to search for and collect posts fully in compliance with the platform’s terms and conditions.
Social media data from Pulsar was anonymised and analysed, and a classifier model was then used to predict overall attendance from posts that could be attributed to genuine event attendees.
3. Aerial photography methodology
There are three key steps (covered in Sections 3.1 - 3.3) in the process of measuring site and event visitation using aerial photography:
- Downloading & Extracting Footage
- Applying Modelling Approaches
- Frame Aggregation & Feature Tracking
3.1 Downloading & Extracting Footage
Figure 1: 14 individual scenes extracted from the event
14 individual scenes were extracted from the event. These were taken from two minutes of drone footage shared by the Bradford 2025 team. From each of these scenes, individual frames were extracted – roughly 4 to 5 frames per second from the original 50 frames per second. This process transformed the video into 14 groups of still images, making it more suitable for detailed analysis in conjunction with a machine learning model. By selectively reducing the frame rate, redundant frames were filtered out while preserving those that best captured movement, changes, and key moments within each scene.
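The downsampling step above can be sketched in a few lines. This is a minimal illustration (function and variable names are ours, not from the project codebase) of how frame indices would be chosen when reducing 50 frames per second to roughly 5:

```python
def sample_frame_indices(total_frames: int, source_fps: float, target_fps: float) -> list[int]:
    """Return the indices of frames to keep when downsampling a video."""
    step = source_fps / target_fps  # e.g. 50 / 5 = keep every 10th frame
    return [round(i * step) for i in range(int(total_frames / step))]

# A 2-minute, 50 fps clip has 6,000 frames; at 5 fps we keep 600 of them.
kept = sample_frame_indices(total_frames=6000, source_fps=50, target_fps=5)
```

In practice the frames themselves would be read with a video library (e.g. OpenCV’s `VideoCapture`), but the index arithmetic is the core of the reduction.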
3.2 Applying Modelling Approaches
Figure 2: Yolocrowd model output. An individual is counted in a green square.
Figure 3: Dm-count model output
After extracting the individual frames for analysis, two different crowd-counting models were applied to the resulting images. Yolocrowd (You Only Look Once) is a small-scale object detection model that has been fine-tuned for detecting people in crowds. This is shown in Figure 2, where each detected individual is counted in a green square. DM-Count, a density-based model which measures the density of the crowd rather than detecting individual points, was also used and is shown in Figure 3.
A visual inspection conducted by a human assessor, who manually counted attendees to create a baseline figure, showed that the YOLO model generally performed better. Overall, both models encountered similar challenges and struggled with areas of darkness, shadows or obscurities such as smoke or confetti.
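The key difference between the two model families is how a count is produced. Detection models output one bounding box per person; density models output a per-pixel density map whose total mass approximates the head count. The toy sketch below (not the actual DM-Count model, which is a trained neural network) shows the density-map idea:

```python
def count_from_density_map(density_map: list[list[float]]) -> float:
    """Estimate crowd size by integrating (summing) a density map.

    Density models spread each person's "mass" over nearby pixels,
    so the grand total approximates the number of people in frame.
    """
    return sum(sum(row) for row in density_map)

# A toy 3x3 density map whose mass sums to roughly 4 people.
toy_map = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
]
estimate = count_from_density_map(toy_map)
```

This is why density models degrade gracefully in dense crowds (overlapping people still contribute mass) but, as noted above, both approaches struggle when darkness or smoke suppresses the signal entirely.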
3.3 Frame Aggregation & Feature Tracking
Figure 4: Polygons created around previously identified individuals.
Once ‘still counts’ were established for each of the individual frames, aggregate counts for each scene were required in order to arrive at an overall prediction for the event. To do this, a SIFT (scale-invariant feature transform) model was applied to the 14 scenes, which identified consistent features (i.e. human attendees) across frames, even when the perspective in a frame changed.
Using these features, polygons were created around previously identified individuals, preventing double counting as the analysis progressed through the frames. This approach ensured an accurate total count for each of the 14 scenes by tracking individuals (without identifying any person’s actual identity) across all frames rather than relying on isolated snapshots. The analysis also accounted for ‘churn’, or the movement of people into and out of the event area. However, we did not have access to accurate data about crowd behaviour, so we assumed an idealised model of churn; specifically, it was assumed that the number of people leaving and entering a scene would be roughly equal, making any impact on the final count negligible.
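The deduplication logic reduces to a set-union over tracked identities. The sketch below (an illustration under our own naming, not project code) shows why per-frame counts cannot simply be summed: the same tracked IDs reappear across frames, and only the union gives the scene total:

```python
def aggregate_scene_count(frame_detections: list[set[int]]) -> int:
    """Count unique tracked individuals across a scene's frames.

    Each frame contributes a set of track IDs (produced upstream by
    feature matching); taking the union prevents double counting people
    who appear in many consecutive frames. Following the report's
    idealised churn assumption, no correction term is applied for
    people entering or leaving the scene.
    """
    seen: set[int] = set()
    for ids in frame_detections:
        seen |= ids
    return len(seen)

# Three frames: IDs 2, 3 and 4 reappear, so only 6 unique people are present.
frames = [{1, 2, 3}, {2, 3, 4}, {4, 5, 6}]
total = aggregate_scene_count(frames)
```

Summing the raw per-frame counts here would give 9; the union correctly gives 6.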
3.4 Results of aerial photography data methodology
Absolute Error: 948 : The average difference between the actual attendance and the attendance our model predicted, measured in the same units as the data (in this case, number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions. It should be noted that the baseline figure of 4,000 is itself a rough estimate, so our model may be more accurate than this comparison suggests.
Percentage Error: 23.7% : The average difference between the actual and predicted attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.
Predicted Attendance: 3052 (Baseline 4000) : The number of people the model thinks attended the event.
These figures assume a crowd churn rate of 50% between shots. This brings the predicted figure within a 25% percentage error of the 4,000 baseline figure from the Bradford 2025 team. Note that the baseline used here is itself an estimate based on footfall counts, so the results presented here may be more or less accurate than they appear when compared against it.
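The two error metrics reported throughout this document follow directly from the predicted and baseline figures, as this short sketch shows:

```python
def attendance_errors(predicted: float, baseline: float) -> tuple[float, float]:
    """Absolute error (in people) and percentage error relative to the baseline."""
    abs_err = abs(baseline - predicted)
    pct_err = 100 * abs_err / baseline
    return abs_err, pct_err

# Aerial photography result: predicted 3,052 against a baseline of 4,000.
abs_err, pct_err = attendance_errors(predicted=3052, baseline=4000)
```

For the aerial model this yields an absolute error of 948 people and a percentage error of 23.7%, matching the figures above.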
4. Mobile App Data Methodology
There are three key steps (covered in Sections 4.1 - 4.3) in the process of measuring site and event visitation using mobile phone data:
- Defining Boundaries
- Extracting Mobile Phone Data
- Scaling and Estimating Visits using Sampling Weighting
4.1 Defining Boundaries
Summary: Establishing the geographic boundaries of the site or event for data collection, and ensuring accurate delineation to minimise data spillover from surrounding areas.
Figure 5: Established geographic boundary for Les Giraffes
Boundary Available: Where the site boundary is available through Open Street Maps, it is obtained by querying the Application Programming Interface.
Boundary Unavailable: Where the site boundary is not available from Open Street Maps, it is drawn manually using Geographic Information Systems software based on the Open Street Maps base layer.
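Once a boundary polygon exists, attributing a GPS ping to the site is a point-in-polygon test. A minimal ray-casting sketch is below; in practice a geospatial library (e.g. Shapely) would be used, and the coordinates here are purely illustrative, not the actual Les Giraffes boundary:

```python
def point_in_polygon(lon: float, lat: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test: is a GPS point inside a (lon, lat) boundary polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

# A toy rectangular boundary around central Bradford (coordinates illustrative).
boundary = [(-1.76, 53.79), (-1.74, 53.79), (-1.74, 53.80), (-1.76, 53.80)]
```

Every device ping would be filtered through a test like this, which is why imprecise boundaries directly translate into over- or under-counting (see Section 4.5).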
4.2 Extracting Mobile Phone Data
Summary: Extracted geographic boundaries serve as the foundation for collecting geospatial mobile phone data, capturing GPS-recorded device locations within the defined area.
Figure 6: Capturing GPS-recorded device locations within the defined area
Collected Data for site of interest:
- Unique Mobile Users – Number of distinct devices detected at the site.
- Number of Mobile Visitor Days – Total device visits over a given period.
- Spatial Patterns of Use – Sensitivity of GPS data within the site.
- Visitor Catchment Areas – Geographic origins of visitors to the site.
- Estimated Geo-Demographics – Socio-economic characteristics inferred from visitors’ home area.
- National Mobile User Panel – Broader panel dataset used to weight and scale mobile visitation to reflect the total population.
4.3 Scaling and Estimating Visits using Sampling Weighting
Summary: The mobile phone population represents only a subset of total visitors to a site, so it is necessary to weight and scale the data to account for under- or over-representation across areas and produce a population-level estimate of visitation.
Figure 7: Graphical depiction of sample weighting visitation estimates
Each mobile user is assigned a home area monthly. To estimate total visitation to the site, visitors are weighted by their home County to correct for over- or under-representation of mobile users across the UK. Visits are then scaled to a population-level estimate using the ratio of mobile users to the adult population in each County. This process is applied daily across all Counties with recorded visitors, and the final estimate is the aggregation across Counties.
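The per-county weighting and scaling described above can be sketched as follows. All figures in the example are invented for illustration, not real panel sizes or populations:

```python
def scale_visits(visits_by_county: dict[str, int],
                 mobile_users: dict[str, int],
                 adult_population: dict[str, int]) -> float:
    """Scale observed mobile visits up to a population-level estimate.

    Each county's observed visits are multiplied by the ratio of its
    adult population to its mobile-user panel size (correcting for over-
    or under-representation of panellists), then summed across counties.
    """
    total = 0.0
    for county, visits in visits_by_county.items():
        total += visits * adult_population[county] / mobile_users[county]
    return total

# Toy figures: 40 observed devices with a West Yorkshire home area, 10 from Lancashire.
estimate = scale_visits(
    visits_by_county={"West Yorkshire": 40, "Lancashire": 10},
    mobile_users={"West Yorkshire": 20_000, "Lancashire": 10_000},
    adult_population={"West Yorkshire": 1_800_000, "Lancashire": 1_000_000},
)
```

In the actual methodology this calculation is run daily and the daily results aggregated, which is why estimates are only available at whole-day resolution (a limitation discussed in Section 4.4).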
4.4 Results of Mobile App Data Methodology
Absolute Error: 3410 : The average difference between the actual attendance and the attendance our model predicted, measured in the same units as the data (in this case, number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions. It should be noted that the baseline figure of 4,000 is itself a rough estimate, so our model may be more accurate than this comparison suggests.
Percentage Error: 85% : The average difference between the actual and predicted attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.
Predicted Attendance: 7410 : The number of people the model thinks attended the event.
Estimates of visitation are for the entire day, not the specific event time which was around 1-2 hours, resulting in over-estimation of visitation. The weighting methodology developed is applied on a daily level, meaning caution should be applied when comparing against an event baseline of less than 24 hrs.
4.5 Limitations
There are a few key limitations with this approach of leveraging location data to predict overall attendance at events:
- Boundary Specification: Given the geospatial nature of the data, the process of defining the boundary of a site directly affects the extracted data and, consequently, the visitation estimates. Inaccurate or inconsistent boundary delineations may lead to underestimation or overestimation of visitor numbers, particularly for sites that lack clearly defined perimeters or those with complex spatial layouts.
- Accuracy and Impact of Surrounding Areas: Mobile phone data inherently contains positional inaccuracies due to a range of factors. This can result in data from adjacent areas being incorrectly attributed to a site, particularly when the site is surrounded by roads, transit hubs, or densely populated urban infrastructure. Such spillover effects may introduce systematic biases in visitor estimates.
- Time Period and Data Volume Constraints: The choice of the study period and the availability of data within specific time windows impact the reliability of visitor estimates. Shorter time periods may result in lower data volumes, leading to increased variability and reduced confidence in the estimates. This limitation is particularly pronounced in less frequently visited sites or during off-peak periods, where mobile data penetration may be lower. Additionally, fluctuations in data availability across different seasons, days of the week, or special event periods can introduce inconsistencies in trend analysis.
- User Demographics and Sample Bias: While some research has shown that mobile phone data provides a good fit to the general population in terms of geographic and socio-demographic coverage, it still represents only a small percentage of the total population. When analysing smaller spaces or shorter time periods, the subset of available data is reduced further, increasing the potential for sample bias in the output.
5. Social Media Methodology
There are 4 steps (covered in Sections 5.1 - 5.4) in the process of estimating attendance from Social Media Methodology:
- Data Collection
- Data Extraction
- Data Processing & Classification
- Scaling Model
5.1 Data Collection
Many social media companies have strict policies governing the use of their APIs. To ensure compliance with these policies the Pulsar platform was used to access social media posts at scale. Pulsar provides an API and managed wrap-around service that allows users to collect and analyse social media posts across multiple sources in compliance with the terms of service on the relevant platforms. This provided the primary data set for our social media analysis.
To protect user privacy, all data gathered across the platforms explored – Facebook, Instagram, X, Trip Advisor, Reddit - was anonymised by replacing usernames with randomly generated IDs.
To structure the data collection effectively, two main types of queries were employed on the Pulsar platform:
Live Queries: Designed for ongoing events (this was the type of query used for the British Museum), these queries collect data over a defined period. The primary data sources for these queries were Instagram (limited to live data only) and Facebook (limited to data from the past 30 days only, as set by Meta’s Terms and Conditions).
Historic Queries: Used for past, one-off events, such as the London Marathon, capturing data from one week before and after the event. The key data sources include Twitter, TripAdvisor, and Reddit. Having access to both live and historic data means modelling approaches using social media data are more flexible and can be used for a wider range of events.
5.2 Data Extraction
Searches for social media posts that can be used to predict attendance are constructed on the Pulsar platform using a Boolean search query defined by the user and then refined for each platform’s specific requirements.
Our methodology used large language models (LLMs) to generate queries, albeit with manual oversight. Additional adjustments were also made using event-specific details to improve accuracy and relevance. The Pulsar platform offers a Boolean generator tool to help users construct validated syntax.
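A query of this kind is simply an OR of event name variants and hashtags. The sketch below illustrates the structure with hypothetical terms for the Les Giraffes event (the actual queries used in this research were LLM-generated and then manually refined):

```python
def build_boolean_query(names: list[str], hashtags: list[str]) -> str:
    """Combine event name variants and hashtags into an OR'd Boolean query.

    Multi-word names are quoted so platforms treat them as exact phrases.
    """
    name_terms = [f'"{n}"' for n in names]
    return " OR ".join(name_terms + hashtags)

query = build_boolean_query(
    names=["Les Giraffes", "Bradford giraffes procession"],
    hashtags=["#Bradford2025", "#LesGiraffes"],
)
```

Real queries also need per-platform adaptation (e.g. Instagram only supports hashtag search), which is covered in Section 5.6.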
Once a search is running on the Pulsar platform, whether live or historic, it is assigned a unique search ID. This ID, along with specified start and end dates, is then used within code notebooks to download the social media data that has been pulled into the platform.
5.3 Data Processing & Classification
- Classification of Posts for Event Attendance: After downloading the data, each social media post must be classified to determine whether the author attended, plans to attend, or is likely to attend the event. This classification is performed using a Large Language Model, which analyses the content and context of each post. This project used a Llama model, which can be run locally on an inexpensive GPU.
- LLM Query Construction: To accurately classify posts, a structured query is created for the LLM. This query includes:
  - General Instruction: The LLM is prompted with a directive such as: “You are a helpful assistant for classifying posts about event attendance.”
  - Event Details: The event name and description are provided. If necessary, an external source can be used to obtain a more detailed event description (e.g. from the event website).
  - Example Posts: To improve accuracy, the LLM is supplied with sample posts that illustrate different classifications. These examples help it distinguish between: users who attended the event, users who intend to attend the event, and users who only engaged with the event remotely (e.g. watching on TV).
- LLM Output and Post-Classification: The LLM analyses each post based on the provided query and assigns a classification. It outputs “1” if the post indicates that the user attended the event, and “0” if the post suggests the user did not attend. This classification allows for structured data analysis, providing insights into event participation trends based on social media activity.
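The prompt assembly and output parsing described above can be sketched as follows. The wording, example posts and function names are illustrative (only the general instruction text comes from the report); the actual call to the locally hosted Llama model is omitted:

```python
def build_classification_prompt(event_name: str, event_description: str,
                                examples: list[tuple[str, int]], post: str) -> str:
    """Assemble the instruction, event details, few-shot examples and target post."""
    lines = [
        "You are a helpful assistant for classifying posts about event attendance.",
        f"Event: {event_name}. {event_description}",
        "Answer 1 if the author attended the event, 0 otherwise.",
    ]
    for example_post, label in examples:  # few-shot examples guide the model
        lines.append(f'Post: "{example_post}" -> {label}')
    lines.append(f'Post: "{post}" ->')
    return "\n".join(lines)

def parse_classification(llm_output: str) -> int:
    """Map the model's raw reply to 1 (attended) or 0 (did not attend)."""
    return 1 if llm_output.strip().startswith("1") else 0

prompt = build_classification_prompt(
    event_name="Les Giraffes",
    event_description="A procession of model giraffes through Bradford.",
    examples=[("Amazing day watching the giraffes in town!", 1),
              ("Caught the highlights on TV tonight.", 0)],
    post="Just got home from the giraffe procession!",
)
```

The prompt string would then be sent to the local model, and `parse_classification` applied to its reply to produce the 1/0 label used downstream.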
5.4 Scaling Model
Purpose: The primary objective of the scaling model is to estimate overall event attendance from the number of people posting on social media about being at the event. Since social media posts represent only a subset of actual attendees, once the model has classified posts into attendees and non-attendees, it must extrapolate from this figure to estimate total attendance at the event.
Figure 8: Flowchart depicting how visitation predictions are created
Process
- Input: The model takes as input the number of “positive posts” (i.e. posts indicating attendance at the event) classified by the LLM. “Positive posts” refer to social media posts made by individuals who are likely to have attended the event.
- Extrapolation: The scaling model extrapolates from the number of positive posts to estimate the total attendance.
- Trained Model and Preprocessing: First, the model standardises the data collected from different social media platforms so they can be processed within the same model, and removes outliers such as posts with limited content (e.g. posts consisting solely of emojis). The model then predicts attendance values based on a series of features, a sample of the most important being: Event type (categorical feature indicating event type), Log_Pulsar_attendance (logarithmically transformed attendance from the social media posts), Engagement rate (prevalence of likes + shares + comments), and Sentiment norm (sentiment calculated with Pulsar’s sentiment analysis tool).
- Output: The scaling model produces an estimate of event attendance. For historical events (e.g. concerts, rallies), the model provides a total attendance estimate based on data collected from one week before to one week after the event. For ongoing events (e.g. the British Museum), the model predicts weekly attendance for a two-week period.
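The feature preparation step can be sketched as below. The feature names follow those listed in the report; all numeric values and the exact definition of engagement rate (interactions per post) are illustrative assumptions:

```python
import math

def build_features(event_type: str, pulsar_attendance: int,
                   likes: int, shares: int, comments: int,
                   posts: int, sentiment: float) -> dict:
    """Derive the scaling model's inputs from raw post statistics."""
    return {
        "event_type": event_type,                           # categorical feature
        "log_pulsar_attendance": math.log(pulsar_attendance),  # damps huge events
        "engagement_rate": (likes + shares + comments) / posts,
        "sentiment_norm": sentiment,                        # from Pulsar's tool
    }

# Toy figures for a small procession-style event.
features = build_features("procession", pulsar_attendance=120,
                          likes=900, shares=150, comments=150,
                          posts=200, sentiment=0.6)
```

The log transform is the standard way to stop a handful of very large events dominating the regression; the structured feature dictionary is what a trained regressor would consume to predict attendance.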
Figure 9: Graph shows model predictions of attendance at a sample of test events used to train the model (y-axis) against the baseline attendance for those events (x-axis). The closer the blue dots are to the dashed red line, the more accurate the prediction.
5.5 Results of Social Media Data Methodology
- Absolute Error: 1.6M : The average difference between the actual attendance and the attendance our model predicted, measured in the same units as the data (in this case, number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions.
- Percentage Error: 235% : The average difference between the actual and predicted attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.
- Predicted Attendance: 2.2M (Actual 660K) : The number of people the model thinks attended the event.
5.6 Limitations
There are a few key limitations with this approach of leveraging social media data to predict overall attendance at events:
- Query Design: People posting about events on social media will use different language or hashtags to describe the same event, meaning that queries may not collect all relevant social media posts. An additional complication is that queries which are too broad will collect too much irrelevant data (e.g. a query containing just the word football will collect billions of posts), which a) cannot be processed and b) lowers the performance of the modelling. The main mitigation is using specific, standardised and logical query terms based on a search through social media for how most people are referring to the event. More detail on how to construct these is in the toolkit accompanying this report.
- Platform-Specific Search Methods: Different social media sources require different handling approaches.
  - Instagram: Searches are limited to hashtags, restricting the possible breadth of data collection.
  - Facebook: Keyword-based searches can be more restrictive than fully Boolean queries (e.g. lacking support for nested ANDs or ORs).
- LLM Prompting: The classification model used for this research project relies on a relatively small LLM hosted locally to ensure compliance with data governance regulations – in this case, not sharing personal data with a third party (i.e. the model provider). As a result, the locally hosted models used for this research are likely to perform worse than larger cloud-native LLMs, resulting in lower accuracy and less reliable estimates of attendance.
- User Demographics and Social Media Habits: The likelihood of event attendees posting varies by demographic and social media platform, meaning specific demographics may be over- or under-represented in the model estimates, resulting in both biased results and poorer model performance. Because neither full demographic data on social media users nor the demographics of people who attended this event are available, there is no way to correct for this bias within this approach.
- Model Drift: As social media habits change over time, the model will be prone to ‘drift’, where performance degrades, and re-training the model is necessary to ensure continued good performance.
- Scaling Model and Small Development Dataset: Due to the comparatively small dataset used in this research (tens of thousands of social media posts, compared to large datasets of millions of posts), the training of the scaling model was sensitive to outliers and risks overfitting (where the model learns the relationships within the training data but cannot generalise these learnings to other events). This affected the model’s accuracy and generalisability for some ‘outlying’ events, for example:
  - Events where ‘Virtual’ Attendance is Possible: If an event is available to view on TV or online, there will be more posts but a significantly smaller proportion of posters who actually attended in person.
  - Events with Unusual Post Sentiment and Engagement: For example, the Women’s Euro 2022 Final screenings will have had significantly more posts, with more positive sentiment, since England won.
  - Estimates of Ground Truth: The baseline attendance figures we used for training and evaluating our models were frequently estimates themselves, meaning the final assessment of our results may not be completely fair or accurate.
Data Source considerations and restrictions: Due to the platforms T&C’s and behaviour of users on social media platforms):
-
Instagram: Only live data can be collected, restricting access to historical content and is confined to public content.
-
Facebook: Data is limited to the past 30 days and is confined to public content, meaning no private groups can be captured.
-
X: Changes in user behavior have shifted the platform’s use in recent years. Posts indicated a focus on news and commentary, rather than event-related posts, based on a review of collected posts.
-
Reddit: Reddit was found to be a less reliable source for event-related posts, as users are were consistently less likely to share event details on the platform based on our assessment of posts used in the model training.
-
6. Comparison of Methods
Below is an assessment of the performance of each methodology in its specific application to the Bradford City of Culture.
6.1 Aerial photography model (Recommended): 24% error compared to baseline
Explanation
- Both aerial models produced the best results for counting attendance at the event. The 24% error rate is likely exaggerated, as the baseline count from the event team (4,000) is an estimate itself, meaning our model could be more accurate than indicated.
- Some of the error is also due to churn rate assumptions. The model assumed 50% churn, but attendees may have stayed for the whole event. Applying a lower churn rate would potentially improve accuracy, but this project lacked a corroborating input on crowd behaviour to justify this assumption.
Precaution for use
- Object detection is well-suited to large, static crowds, such as those present at the Bradford procession, where a dense, slow-moving or stationary crowd was in attendance.
- While these crowds lend themselves to aerial image-based models, many events with faster-moving crowds would be more difficult to measure.
- This methodology depends on high-quality aerial footage that covers an entire event for its entire duration. In our experience, it is rare that such data is collected and available for analysis.
Stronger for:
- Accurate results where quality footage is available and approximate figures for churn between frames can be established to reduce double counting.
- The approach is highly replicable and low cost.
- The innate risk of personal identification in images is low where no facial recognition software is used, as in this project.
Weaker for:
- High-resolution, high-vantage aerial photography data is hard to access, normally requiring bespoke deployment of drones or cameras to events or close collaboration with event organisers.
- No additional information is provided about the demographic make-up of crowds.
- The model struggles to detect people against complex, shadowy or unusual backgrounds.
6.2 Mobile App Model (Not Recommended): 85% error compared to baseline
Explanation
This approach performed poorly for this target event for two reasons:
- The minimum data collection period for location data required by this methodology is 24 hours, but the procession only lasted a few hours. As a result, the model likely counted people who were in the area before and after the event but were not actually attending the procession.
- Given the event was at the weekend in a busy city centre, it is likely that some people passing through the location during the procession were not participating in the event itself, leading to over-counting.
Precaution for use
- Location data is best suited for events spanning longer time periods, such as counting attendance at a location over a year. For short events, alternative data should be explored, as mobile app location data can lack granularity in this regard.
- Separately to the model’s performance, there are some ethical concerns associated with the fact that users of these mobile apps do not actively opt in to their data being used to track their attendance.
Stronger for:
- Accurate for long-running or recurring events that repeat over the course of weeks, months or years, e.g. exhibits or parks.
- The method is low-cost and straightforward to deliver, and can be compared to known population distributions for specific sites or events, helping to uncover demographic information and reduce bias.
Weaker for:
- Huq data is generally less suitable for short-lived events (e.g. a few hours in duration). It is better suited to recurring events and locations with fixed boundaries.
- Mobile phone location data is subject to some ethical concerns around use. While this data is legal to collect and for providers like Huq to license, there are some concerns about whether individuals can be said to have given ‘informed consent’ for their data to be collected.
6.3 Social Media Data (Not Recommended): 56% error compared to baseline
Explanation
This approach performed poorly for the Bradford City of Culture.
- The most likely explanation is that the procession simply did not meet the minimum threshold of social media footprint for the model to make an accurate prediction on the basis of social media posts.
- There were far fewer online posts about the Les Giraffes procession compared to other events in this research, and the model was insufficiently sensitive to make a prediction using this data.
Precaution for use
- The model is likely to need a significantly larger training dataset to accurately label attendance at events such as the Bradford City of Culture Les Giraffes procession, which has unusual characteristics given its unorthodox giraffe theming.
- Further, in cases like a small procession through a city centre over just 90 minutes, there may be some hard limits to this approach, where there are simply too few social media posts to provide accurate counts of in-person attendance.
Stronger for:
- Provides a low-cost and straightforward way of leveraging social media data to predict attendance. Pulsar’s managed-API service is intuitive and reduces the friction of managing platform-level terms of service obligations.
- Most accurate for events with in-person attendance and significant numbers of posts confirming in-person attendance.
Weaker for:
- Accuracy is reduced where events have few online posts overall, as with this procession.
- Incorporation of traditional survey methods into the modelling approach would improve results.
- While compliant with platform-level terms of service, there is not clearly ‘informed consent’ for this use case from an ethical point of view.
- Risks from increasing privacy walls mean the approach may not be extendable in the future.
Researchers at the University of Bradford, from the School of Archaeological and Forensic Sciences and Department of Computer Science, have conducted similar research. They show how combining different perspectives, whether from the same type of sensor or different ones, can improve accuracy. Their team has used tools like video, timelapse photography, 360° cameras, and fixed sensors to track movement, count footfall, and record sound. This approach also helps to understand the emotional engagement with, over and above attendance at, the event or location. ↩