Research and analysis

Capturing engagement numbers - strand 2 - Women's Euros final case study

Published 13 March 2026

This report was authored by Jack Medlock, Hannah M. P. Stock, Andrew Knight, Donna Phillips, Adam L. Ozer, and Joseph Stordy at Verian, Dr Michael Sinclair, Dr Craig Macdonald, and Prof Iadh Ounis at The University of Glasgow, and Faculty.

This research was supported by the R&D Science and Analysis Programme at the Department for Culture, Media & Sport (DCMS). It was developed and produced according to the research team’s hypotheses and methods between October 2023 and June 2025. Any primary research, subsequent findings or recommendations do not represent UK Government views or policy.

Executive Summary

While ticketing data gives a good understanding of engagement with ticketed events, measuring engagement at un-ticketed events is difficult and often relies on surveys. Although survey data is of good quality, replicable and can provide demographic insight, surveys come with their own challenges and limitations. They are limited, for example, in their ability to comprehensively measure local participation at specific events or spaces. Given that even ticket sales or traditional crowd counting methods may not accurately reflect attendance numbers, DCMS wants to explore new data-driven methods using novel techniques.

This case study explains and compares 2 such methods, each based around a specific data source, for predicting attendance at the Women’s Euros: mobile app data (Huq) and social media data (Pulsar).

Each methodology is evaluated and compared against 8 categories: Accuracy, Bias, Ethics, Deliverability, Cost, Demographics, Generalisability and Accessibility.

Case studies have been developed for each event in scope of the research. The Women’s Euros Final Screening was one of 5 events selected.

1. Event Overview

Location: Women’s Euros Screening, Piccadilly Gardens, Manchester, UK
Date: 31st July 2022
Summary: The final of the Women’s Euros was broadcast simultaneously at nine UEFA-organised sites across the UK.

The Manchester screening was selected as a research candidate for the counting engagement project because it exhibited the following characteristics:

As one of nine simultaneous events, predicting attendance at the Manchester site with social media data is an interesting challenge. This is exacerbated by the fact that the event had significant online viewership (50m in the UK alone), compared to the Manchester-specific screening, which was attended by 2,000 people. As a one-off, time-bound event in a busy city centre, attributing presence in the location to the event itself is difficult.

2. Methods

Two methodologies were developed to predict attendance at the Women’s Euros:

Mobile App Model
Many applications collect real-time location data on mobile phones or GPS-enabled devices. These data points are routinely anonymised and sold to third parties (as detailed in the terms and conditions agreed when signing up to the app), which collate and aggregate this information to provide population-level estimates of people’s locations.

Data providers (such as Huq) create dashboards, based on a raw feed of real-time data. These data dashboards allow users to visualise data at different points of interest, defined by user-generated boundaries.

Aggregated underlying data from Huq was accessed to analyse the outputs for specific boundaries of the Women’s Euros Screening event and predict event attendance.

Social Media Data
User-generated content posted on social media platforms can contain information about people’s locations. These digital traces – such as status posts and photos and any public information in their profile about location - can be used as a proxy for physical attendance.

Social media platforms have commercial agreements with social media monitoring companies, such as Pulsar (used for this work), which provide a service allowing users to search for and collect posts in full compliance with each platform’s terms and conditions.

Anonymised social data from Pulsar was analysed and used in a classifier model to predict overall attendance, based on posts that could be attributed to true attendees of the Women’s Euros Screening event.

3. Mobile App Data Methodology

There are three key steps (covered in Sections 3.1 - 3.3) in the process of measuring site and event visitation using mobile phone data:

  1. Defining Boundaries

  2. Extracting Mobile Phone Data

  3. Scaling and Estimating Visits using Sampling Weighting

3.1 Defining Boundaries

Summary: Establishing the geographic boundaries of the site or event for data collection. Ensuring accurate delineation to minimise data spillover from surrounding areas.

Figure 1: Established geographic boundary for Piccadilly Gardens Screening

Boundary Available: Where the site boundary is available through Open Street Maps, it is obtained by querying the Application Programming Interface.

Boundary Unavailable: Where the site boundary is not available from Open Street Maps, it is drawn manually using Geographic Information Systems software based on the Open Street Maps base layer.

3.2 Extracting Mobile Phone Data

Summary: Extracted geographic boundaries serve as the foundation for collecting geospatial mobile phone data, capturing GPS-recorded device locations within the defined area.

Figure 2: Capturing GPS-recorded device locations within the defined area

Collected Data for site of interest:

  • Unique Mobile Users – Number of distinct devices detected at the site.

  • Number of Mobile Visitor Days – Total device visits over a given period.

  • Spatial Patterns of Use – Sensitivity of GPS data within the site.

  • Visitor Catchment Areas – Geographic origins of visitors to the site.

  • Estimated Geo-Demographics – Socio-economic characteristics inferred from visitors’ home area.

  • National Mobile User Panel – Broader panel dataset used to weight and scale mobile visitation to reflect the total population.
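The extraction step above can be sketched in outline. This is a simplified illustration, not the Huq pipeline: it assigns raw GPS pings to a user-defined boundary with a ray-casting point-in-polygon test and counts unique devices. All names and data shapes are assumptions for illustration.

```python
# Illustrative sketch only: assigning GPS pings to an event boundary.
# The boundary is a polygon of (lon, lat) vertices.

def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def unique_devices_on_site(pings, boundary):
    """Count distinct device IDs with at least one ping inside the boundary.

    pings: iterable of (device_id, lon, lat) tuples.
    """
    return len({device_id for device_id, lon, lat in pings
                if point_in_polygon(lon, lat, boundary)})
```

In practice boundary coordinates come from the OpenStreetMap query or manual GIS delineation described in Section 3.1, and pings from the aggregated provider feed.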

3.3 Scaling and Estimating Visits using Sampling Weighting

Summary: The mobile phone population represents only a subset of total visitors to a site, so it is necessary to weight and scale the data to account for under- or over-representation across areas and produce a population-level estimate of visitation.

Figure 3: Graphical depiction of sample weighting visitation estimates

Each mobile user is assigned a home area monthly. To estimate total visitation to the site, visitors are weighted by their home County to correct for over- or under-representation of mobile users across the UK. Visits are then scaled to a population-level estimate using the ratio of mobile users to the adult population in each County. This process is applied daily across all Counties with recorded visitors, and the final estimate is the aggregation across Counties.
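The daily scale-up described above can be sketched as follows. This is a minimal illustration of the weighting principle only; all county names and figures below are made up, not Huq panel data.

```python
# Illustrative sketch of the daily scale-up: visits from each home County
# are scaled by the ratio of that County's adult population to its
# mobile-panel size. All figures used with this function are illustrative.

def estimate_daily_visits(observed_visits, panel_size, adult_population):
    """Scale observed panel visits to a population-level estimate.

    observed_visits:  {county: panel visits recorded at the site that day}
    panel_size:       {county: mobile users on the panel in that county}
    adult_population: {county: adult population of that county}
    """
    total = 0.0
    for county, visits in observed_visits.items():
        scale = adult_population[county] / panel_size[county]
        total += visits * scale
    return total
```

Daily estimates produced this way are then aggregated across all Counties with recorded visitors, as the text describes.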

3.4 Results of Mobile App Data Methodology

Absolute Error: 3,352 : The average difference between the actual attendance and the attendance our model predicted, measured in the same units as the data (in this case number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions.

Percentage Error: 168% : The average difference between the actual and predicted attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.

Predicted Attendance: 5,352 (Actual 2,000) : The number of people the model thinks attended the event.

Estimates of visitation are for the entire day, not the specific event time, which was approximately 3-4 hours, resulting in over-estimation of visitation. The weighting methodology developed is applied at a daily level.
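The two headline metrics used in the results sections are straightforward to compute from a predicted and an actual attendance figure:

```python
# The two headline error metrics reported in Sections 3.4 and 4.5.

def absolute_error(predicted, actual):
    """Difference between predicted and actual attendance, in people."""
    return abs(predicted - actual)

def percentage_error(predicted, actual):
    """Error relative to the actual attendance, as a percentage."""
    return 100.0 * abs(predicted - actual) / actual

# For this event, the mobile app model predicted 5,352 attendees
# against an actual attendance of 2,000.
```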

3.5 Limitations

There are a few key limitations with this approach of leveraging location data to predict overall attendance at events:

  • Boundary Specification: Given the geospatial nature of the data, the process of defining the boundary of a site directly affects the extracted data and, consequently, the visitation estimates. Inaccurate or inconsistent boundary delineations may lead to underestimation or overestimation of visitor numbers, particularly for sites that lack clearly defined perimeters or those with complex spatial layouts.

  • Accuracy and Impact of Surrounding Areas: Mobile phone data inherently contains positional inaccuracies due to a range of factors. This can result in data from adjacent areas being incorrectly attributed to a site, particularly when the site is surrounded by roads, transit hubs, or densely populated urban infrastructure. Such spillover effects may introduce systematic biases in visitor estimates.

  • Time Period and Data Volume Constraints: The choice of the study period and the availability of data within specific time windows impact the reliability of visitor estimates. Shorter time periods may result in lower data volumes, leading to increased variability and reduced confidence in the estimates. This limitation is particularly pronounced in less frequently visited sites or during off-peak periods, where mobile data penetration may be lower. Additionally, fluctuations in data availability across different seasons, days of the week, or special event periods can introduce inconsistencies in trend analysis.

  • User Demographics and Sample Bias: While some research has shown that mobile phone data provides a good fit to the general population in terms of geographic and socio-demographic coverage, it still represents only a small percentage of the total population. When analysing smaller spaces or shorter time periods, the subset of available data is reduced further, increasing the potential for sample bias in the output.

4. Social Media Methodology

There are 4 steps (covered in Sections 4.1 - 4.4) in the process of estimating attendance from Social Media Methodology:

  1. Data Collection

  2. Data Extraction

  3. Data Processing & Classification

  4. Scaling Model

4.1 Data Collection

Many social media companies have strict policies governing the use of their APIs. To ensure compliance with these policies the Pulsar platform was used to access social media posts at scale. Pulsar provides an API and managed wrap-around service that allows users to collect and analyse social media posts across multiple sources in compliance with the terms of service on the relevant platforms. This provided the primary data set for our social media analysis.

To protect user privacy, all data gathered across the platforms explored – Facebook, Instagram, X, TripAdvisor, Reddit - was anonymised by replacing usernames with randomly generated IDs.

To structure the data collection effectively, two main types of queries were employed on the Pulsar platform:​

Live Queries: Designed for ongoing events (this was the type of query used for the British Museum), these queries collect data over a defined period. The primary data sources for these queries were Instagram (limited to live data only) and Facebook (limited to data from the past 30 days only, as set by Meta’s Terms and Conditions).

Historic Queries: Used for past, one-off events, such as the London Marathon, capturing data from one week before and after the event. The key data sources include X (formerly Twitter), TripAdvisor, and Reddit. Having access to both live and historic data means modelling approaches using social media data are more flexible and can be used for a wider range of events.

4.2 Data Extraction

Searches for social media posts that can be used to predict attendance are constructed on the Pulsar platform using a Boolean search query defined by the user and then refined for each platform’s specific requirements.

Our methodology used large language models (LLMs) to generate queries, with manual oversight. Additional adjustments were also made using event-specific details to improve accuracy and relevance. The Pulsar platform offers a Boolean generator tool to support users in constructing valid syntax.
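As a simple illustration of the kind of query this step produces (the project’s actual queries were LLM-drafted and manually refined, and exact syntax varies by platform), a minimal Boolean builder might look like this; all search terms shown are hypothetical examples:

```python
# Hypothetical sketch of Boolean query construction. In the project,
# queries were drafted with LLM support and refined per platform;
# the terms below are illustrative only.

def build_boolean_query(event_terms, location_terms):
    """Combine terms as (event1 OR event2) AND (loc1 OR loc2).

    Multi-word terms are quoted so they match as phrases.
    """
    def group(terms):
        quoted = ['"{}"'.format(t) if " " in t else t for t in terms]
        return "(" + " OR ".join(quoted) + ")"
    return group(event_terms) + " AND " + group(location_terms)
```

For example, combining event hashtags with location phrases yields a query like `(WEuro2022 OR Lionesses) AND ("Piccadilly Gardens" OR Manchester)`.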

Once a search is running on the Pulsar platform, whether live or historic, it is assigned a unique search ID. This ID, along with specified start and end dates, is then used within code notebooks to download the social media data that has been pulled into the platform.
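The download step can be sketched as follows. The parameter names here are hypothetical (the real schema is defined by Pulsar’s API documentation), but the shape of the request, a search ID plus start and end dates, follows the text:

```python
# Sketch of the download step: the search ID and date range are bundled
# into request parameters used to page through results. Parameter names
# are hypothetical; consult the Pulsar API documentation for the real schema.

def build_download_request(search_id, start_date, end_date, page_size=500):
    """Assemble illustrative parameters for downloading a search's posts."""
    return {
        "search_id": search_id,
        "from": start_date,   # e.g. "2022-07-24"
        "to": end_date,       # e.g. "2022-08-07"
        "limit": page_size,
    }
```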

4.3 Data Processing & Classification

  • Classification of Posts for Event Attendance: After downloading the data, each social media post must be classified to determine whether the author attended, plans to attend, or is likely to attend the event. This classification is performed using a Large Language Model, which analyses the content and context of each post. This project used a Llama model, which can be run locally on an inexpensive GPU.

  • LLM Query Construction: To accurately classify posts, a structured query is created for the LLM. This query includes:​

    • General Instruction: The LLM is prompted with a directive such as: “You are a helpful assistant for classifying posts about event attendance.”

    • Event Details: The event name and description are provided. If necessary, an external source can be used to obtain a more detailed event description (e.g. from the event website).

    • Example Posts: To improve accuracy, the LLM is supplied with sample posts that illustrate different classifications. These examples help it distinguish between: users who attended the event, users who intend to attend the event, and users who only engaged with the event remotely (e.g., watching on TV).

  • LLM Output and Post-Classification: The LLM analyses each post based on the provided query and assigns a classification. It outputs “1” if the post indicates that the user attended the event, and “0” if the post suggests the user did not attend. This classification allows for structured data analysis, providing insights into event participation trends based on social media activity.
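The query structure described above can be sketched as a prompt builder. The exact wording used in the project is not reproduced here; only the structure (general instruction, event details, few-shot examples, binary output) follows the text, and the example posts are invented:

```python
# Sketch of the classification prompt structure described above:
# instruction, event details, few-shot examples, then the post to label.
# Wording and examples are illustrative, not the project's actual prompt.

def build_classification_prompt(event_name, event_description, examples, post):
    lines = [
        "You are a helpful assistant for classifying posts about event attendance.",
        "Event: {}".format(event_name),
        "Description: {}".format(event_description),
        "Reply 1 if the post indicates the user attended in person, 0 otherwise.",
        "Examples:",
    ]
    for example_post, label in examples:
        lines.append('Post: "{}" -> {}'.format(example_post, label))
    lines.append('Post: "{}" ->'.format(post))
    return "\n".join(lines)
```

The assembled string would be sent to a locally hosted model (a Llama model in this project) and the returned 1/0 label recorded per post.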

4.4 Scaling Model

Purpose: The primary objective of the scaling model is to estimate overall event attendance from the number of people posting on social media about being at the event. Since social media posts represent only a subset of actual attendees, once the model has classified posts into attendees and non-attendees, it must extrapolate this figure to estimate total attendance at the event.

Figure 4: Flowchart depicting how visitation predictions are created

Process

  • Input: The model takes as input the number of “positive posts” (e.g. attending the event) classified by the LLM. “Positive posts” refer to social media posts made by individuals who are likely to have attended the event.

  • Extrapolation: The scaling model extrapolates from the number of positive posts to estimate the total attendance.

  • Trained Model and Preprocessing: First, the model standardises the data collected from different social media platforms so they can be processed within the same model, and removes outliers such as posts with limited content (e.g. posts consisting solely of emojis). The model then predicts attendance values from a series of structured features, the most important of which include: Event type (categorical feature indicating event type); Log_Pulsar_attendance (logarithmically transformed attendance from the social media posts); Engagement rate (prevalence of likes + shares + comments); Sentiment norm (sentiment calculated with the Pulsar sentiment analysis tool).

  • Output: The scaling model produces an estimate of event attendance. For historical events (e.g., concerts, rallies), the model provides a total attendance estimate based on data collected from one week before to one week after the event. For ongoing events (e.g., the British Museum), the model predicts weekly attendance for a two-week period.
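Two of the features named above can be illustrated directly. The full feature set and the trained model itself are not reproduced here; this only shows how the named inputs might plausibly be derived:

```python
import math

# Illustrative construction of two scaling-model features named in the
# text. The project's full feature engineering is not reproduced here.

def log_pulsar_attendance(positive_posts):
    """Log-transform the count of LLM-classified attendee posts.

    log1p keeps a zero count well defined (log1p(0) == 0).
    """
    return math.log1p(positive_posts)

def engagement_rate(likes, shares, comments, total_posts):
    """Average engagement (likes + shares + comments) per collected post."""
    return (likes + shares + comments) / total_posts
```

The log transform compresses the very wide range of post counts across events so that a handful of heavily-discussed events do not dominate the model's training.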

Figure 5: Graph shows model predictions of attendance at a sample of test events used to train the model (y-axis) against the baseline attendance for events (x-axis). The closer the blue dots are to the dashed red line, the more accurate the prediction.

4.5 Results of Social Media Data Methodology

  • Absolute Error: 15,986 : The average difference between the actual attendance and the attendance our model predicted, measured in the same units as the data (in this case number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions.

  • Percentage Error: 799% : The average difference between the actual and predicted attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.

  • Predicted Attendance: 17,986 (Actual 2,000) : The number of people the model thinks attended the event.

Note there were 10 screenings of the final across England. This prediction may reflect the classification model’s difficulty in distinguishing them – a back-of-the-envelope calculation of 10 x 2,000 gives a national attendance of 20,000, of which the model prediction is a good estimate.

4.6 Limitations

There are a few key limitations with this approach of leveraging social media data to predict overall attendance at events:

  • Query Design: People posting about events on social media will use different language or hashtags to describe the same event, meaning that queries may not collect all relevant social media posts. An additional complication is that queries which are too broad will collect too much irrelevant data (e.g. a query containing just the word “football” will collect billions of posts), which a) can’t be processed and b) lowers the performance of the modelling. The main mitigation is using specific, standardised and logical query terms based on a search through social media for how most people are referring to the event. More detail on how to construct these is in the toolkit accompanying this report.

  • Platform-Specific Search Methods: Different social media sources require different handling approaches.

    • Instagram: Searches are limited to hashtags, restricting the possible breadth of data collection.
    • Facebook: Keyword-based searches can be more restrictive than fully Boolean queries (e.g., lacking support for nested ANDs or ORs).
  • LLM Prompting: The classification model used for this research project relies on a relatively small LLM hosted locally to ensure compliance with data governance regulations – in this case not sharing personal data with a third-party (i.e. the model provider). As a result, the locally hosted models used for this research are likely to be poorer performing than larger cloud-native LLMs, resulting in lower accuracy and less reliable estimates for attendance.

  • User Demographics and Social Media Habits: The likelihood of event attendees posting varies by demographic and by social media platform, meaning specific demographics may be over- or under-represented in the model estimates, resulting in both biased results and poorer model performance. Given that neither full demographic data on social media users nor the demographics of people who attended this event are available, there is no way to correct for this bias within this approach.

  • Model drift: As social media habits change over time, the model will be prone to ‘drift’, where performance degrades; re-training the model is necessary to ensure continued good performance.

  • Scaling Model and Small Development Dataset: Due to the comparatively small dataset used in this research (tens of thousands of social media posts, compared to large datasets of millions of posts), the training of the scaling model was sensitive to outliers and risks overfitting (where the model learns the relationships within the training data but cannot generalise these learnings to other events). This affected the model’s accuracy and generalisability for some ‘outlying’ events, for example:

    • Events where ‘Virtual’ Attendance is Possible: If an event is available to view on TV or online, there will be more posts, with a significantly smaller proportion coming from people who physically attended.

    • Events with Unusual Post Sentiment and Engagement: For example, the Women’s Euro 2022 Final screenings will have had significantly more posts, with more positive sentiment, since England won!

    • Estimates of Ground Truth: The baseline attendance figures for events we used for training and evaluating our models were frequently estimates themselves, meaning the final assessment of our results may not be completely fair or accurate.

  • Data Source Considerations and Restrictions: Due to the platforms’ T&Cs and the behaviour of users on social media platforms:

    • Instagram: Only live, public data can be collected, restricting access to historical content.

    • Facebook: Data is limited to the past 30 days and is confined to public content, meaning no private groups can be captured.

    • X: Changes in user behaviour have shifted the platform’s use in recent years. Based on a review of collected posts, posts indicated a focus on news and commentary, rather than event-related content.

    • Reddit: Reddit was found to be a less reliable source for event-related posts, as users were consistently less likely to share event details on the platform, based on our assessment of posts used in the model training.

5. Comparison of Methods

Below is an assessment of the performance of each methodology in its specific application to the Women’s Euros Final Screening.

Mobile App Data (Huq)

Explanation

This approach performed poorly for this event because:

  • The minimum data collection period required for this methodology is 24 hours, but the football match lasted only a few hours. As a result, the model likely counted people who were in the area before and after the event but were not actually attending the screening;

  • The event took place at the weekend in a busy city centre, meaning many people passing through the location during the match were not participating in the event, leading to over-counting.

Precaution for use

  • Location data is best suited for events spanning longer time periods, such as counting attendance at a location over a year. For short events, alternative data should be explored as mobile app location data can lack granularity in this regard.

  • Separately from the model’s performance, there are ethical concerns: users of these mobile apps do not actively opt in to their data being used to track their attendance at events, raising privacy questions.

Stronger for:

  • Accurate for long-running recurring events that repeat over the course of weeks, months or years e.g. exhibits or parks etc.

  • Method is low-cost and straightforward to deliver with the right expertise and can be compared to known population distributions for specific sites or events, helping to uncover demographic information and reduce bias.

Weaker for:

  • Huq data is generally less suitable for short-lived events (e.g. a few hours in duration). It is better suited to recurring events and locations with fixed boundaries.

  • Mobile phone location data is subject to some ethical concerns around use. While this data is legal to collect and for providers like Huq to license, there are some concerns about whether individuals can be said to have given ‘informed consent’ for their data to be collected.

Social Media Data (Pulsar)

Explanation

  • This approach performed very poorly for the Manchester Piccadilly Screening. This is primarily because the match was a major global event with high online engagement.

  • Many people posted about the match, but few were physically present at the Manchester screening. The model overestimated attendance by counting people engaging with the event online or attending other screenings.

  • If the goal was to predict attendance at all 10 official screenings combined, the estimate is relatively accurate, but it lacked the granularity to isolate the Manchester Piccadilly event.

Precaution for use

  • This approach has proven to be highly sensitive to large-scale online engagement. Events with widespread online discussion beyond physical attendees will likely lead to overestimated attendance figures.

  • The model is better suited to estimating total engagement across multiple event locations, especially when no geotagging is available. It is highly likely the model’s ability to correctly label posts will improve with a larger training dataset. However, in cases like a Manchester-specific screening of a global sporting event, there may be some hard limits to this approach, where social media posts lack sufficient information to provide accurate counts of in-person attendance.

Stronger for:

  • Provides a low-cost and straightforward way of leveraging social media data to predict attendance. Pulsar’s managed-API service is intuitive and reduces friction of managing platform-level terms of service obligations.

  • Most accurate for medium-to-large events with in-person attendance, such that a significant number of posts confirm physical attendance.

Weaker for:

  • Accuracy is reduced where events have a large television or online audience who post online about the event, as was the case for the screening of the Women’s Euros final.

  • Incorporation of traditional survey methods into the modelling approach would likely improve results here.

  • There are some ethical concerns about using social media data in this way. While compliant with platform-level terms of service, there is not clearly ‘informed consent’ for this use case.

  • Risks from increasing privacy walls mean the approach may not be extendable in the future.