Research and analysis

Capturing engagement numbers - strand 2 - The Great North Run case study

Published 13 March 2026

This report was authored by Jack Medlock, Hannah M. P. Stock, Andrew Knight, Donna Phillips, Adam L. Ozer, and Joseph Stordy at Verian, Dr Michael Sinclair, Dr Craig Macdonald, and Prof Iadh Ounis at The University of Glasgow, and Faculty.

This research was supported by the R&D Science and Analysis Programme at the Department for Culture, Media & Sport (DCMS). It was developed and produced according to the research team’s hypotheses and methods between October 2023 and June 2025. Any primary research, subsequent findings or recommendations do not represent UK Government views or policy.

Executive Summary

While ticketing data gives a good understanding of engagement with ticketed events, measuring engagement at un-ticketed events is difficult and often relies on surveys. Although the data provided by these surveys is of good quality, replicable and can provide demographic insight, surveys come with their own challenges and limitations. They are limited, for example, in their ability to comprehensively measure local participation at specific events or spaces. Given that even ticket sales or traditional crowd-counting methods may not accurately reflect attendance numbers, DCMS want to explore new data-driven methods using novel techniques.

This case study explains and compares 3 such methods, each based around a specific data source, for predicting attendance at the Great North Run: Activity data (Strava), Mobile app data (Huq) and Social media data (Pulsar).

Each methodology is evaluated and compared against 8 categories: Accuracy, Bias, Ethics, Deliverability, Cost, Demographics, Generalisability and Accessibility.

Case studies have been developed for each event in scope of the research. The Great North Run was one of 5 events selected.

1. Event Overview

Location: Great North Run, Newcastle, UK
Date: September 8th, 2024
Time: 8AM to 4PM on race day, covering official race duration.
Summary: Estimating runner attendance on race day.

The 2024 Great North Run was selected as a research candidate for the counting engagement project because it exhibited the following characteristics:

  • One of the premier outdoor sporting events in the UK

  • Constantly moving targets make counting attendance more challenging

  • Complex course and the presence of many overlapping routes introduce a risk of double-counting, requiring careful mitigation

  • Scaling problem: Requirement to estimate overall attendance on the basis of a smaller figure of Strava users

  • One-off and time-bound event

  • The focus is on the 60,000 runners only, as official race attendance figures provide a known baseline for validating results.

2. Methods

Three methodologies were developed to predict attendance at the Great North Run:

Activity Data Model
Popular activity and mobility tracking apps like Strava can be used to gauge attendance at social events, especially with a sporting focus.

Strava collects data through GPS tracking, recorded via users’ smartphones or wearable devices while they participate in activities like running, cycling, or walking. This GPS data includes information like location, speed, distance, and elevation and is aggregated anonymously and processed into broader trends, presented on a dashboard service called Strava Metro.

This approach used data from Strava Metro, as an input to a scaling model that can predict how many people will attend the Great North Run.

Mobile App Model
Many applications collect real-time location data on mobile phones or GPS-enabled devices. These data points are routinely anonymised and sold to third parties (as detailed in the T&Cs agreed when signing up to the app), which collate and aggregate this information to provide population-level estimates of people’s locations.

Data providers (such as Huq) create dashboards, based on a raw feed of real-time data. These data dashboards allow users to visualise data at different points of interest, defined by user-generated boundaries.

Aggregated underlying data from Huq was used to analyse the outputs for specific boundaries of the Great North Run and predict event attendance.

Social Media Data
User-generated content posted on social media platforms can contain information about people’s locations. These digital traces – such as status posts, photos and any public location information in their profile – can be used as a proxy for physical attendance.

Social media platforms have commercial agreements with social media monitoring companies, such as Pulsar (used for this work), which provide a service allowing users to search for and collect posts in full compliance with each platform’s terms and conditions.

Anonymised social media data from Pulsar was used in a classifier model to predict overall attendance, using posts that could be determined to be from true attendees of the Great North Run.

3. Activity Data Methodology

There are 3 steps (covered in sections 3.1 - 3.3) in the process of using Activity Data to model event attendance:

  1. Count Strava Attendance

  2. Train model to predict overall attendance

  3. Predict overall attendance

3.1 Count Strava Attendance

The race boundary of the Great North Run was defined using official event data and downloaded as a GPX file – a file format that contains a list of points (that is, coordinates) that make up the length of the route.

An additional ‘buffer zone’ was added to this area to capture the Strava routes of all participating runners (the focus was on runners only due to baseline availability), across the width of the entire course.

Figure 1: Position of measuring ‘gates’ to count attendance

Three routes were then sampled from the total pool of Strava data, each representing a ‘measuring gate’. These gates were deliberately spaced such that no runner would pass through multiple gates within any one-hour period, calculated from the average speed of the fastest 10% of runners.

The total number of runners passing through all gates over a one-hour period during the race was then taken as a count of overall Strava participation.
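The gate-counting step can be sketched in code. This is an illustrative sketch only, not the Strava Metro pipeline: the gate coordinates, data structures and field names below are hypothetical, with each gate modelled as a simple lat/lon bounding box and each runner counted at most once however many gates they pass through.

```python
# Hypothetical gate positions along the course, each a lat/lon bounding box.
GATES = [
    {"lat": (54.98, 54.99), "lon": (-1.62, -1.60)},  # gate 1
    {"lat": (54.99, 55.00), "lon": (-1.55, -1.53)},  # gate 2
    {"lat": (55.00, 55.01), "lon": (-1.45, -1.43)},  # gate 3
]

def in_gate(point, gate):
    """Return True if a (lat, lon) point falls inside a gate's bounding box."""
    lat, lon = point
    return (gate["lat"][0] <= lat <= gate["lat"][1]
            and gate["lon"][0] <= lon <= gate["lon"][1])

def count_gate_attendance(activities):
    """Count distinct athletes whose GPS track crosses at least one gate.

    `activities` maps an anonymised athlete id to a list of (lat, lon)
    track points. A set deduplicates athletes who cross several gates.
    """
    counted = set()
    for athlete_id, track in activities.items():
        if any(in_gate(p, g) for p in track for g in GATES):
            counted.add(athlete_id)
    return len(counted)
```

In practice the deduplication is achieved by the one-hour gate spacing described above rather than by athlete ids, since Strava Metro data is aggregated and anonymised.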

3.2 Train model to predict overall attendance

Having established the total number of Strava users participating in the Great North Run (the focus was on the runners only, due to baseline availability), the next step is to train a ‘scaling model’ that can predict the overall race attendance based on this smaller figure of Strava users. This requires a training dataset of races with known attendance, recording for each race: (i) the number of Strava users; (ii) the true total race participation.

This training dataset comprised a diverse set of races, to create a model that would generalise successfully and make accurate predictions across other races, starting with the Great North Run. The dataset covered 10 races from around the UK, selected to represent a diversity of geographical locations, a variety of race lengths and different participant demographics.
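The core scaling idea can be sketched as a least-squares fit of total attendance against the Strava count. The race figures below are illustrative, not the project’s 10-race training data, and the real models also use weather and demographic features:

```python
def fit_scaling_model(strava_counts, true_attendance):
    """Closed-form least squares for: attendance = a * strava_count + b."""
    n = len(strava_counts)
    mean_x = sum(strava_counts) / n
    mean_y = sum(true_attendance) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(strava_counts, true_attendance))
    var = sum((x - mean_x) ** 2 for x in strava_counts)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict(model, strava_count):
    """Apply the fitted scaling model to a new Strava count."""
    a, b = model
    return a * strava_count + b

# Hypothetical training races: (Strava users observed, known total runners)
races = [(150, 600), (400, 1600), (900, 3600), (2500, 10000)]
model = fit_scaling_model([r[0] for r in races], [r[1] for r in races])
```

On these illustrative figures, where each race has roughly four runners per Strava user, the model simply learns that ratio; the real training data is noisier and higher-dimensional.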

Figure 2: Composition of training dataset

3.3 Predict overall attendance

Figure 3: Approach to predicting attendance at the Great North Run

Having trained the scaling model, the final step is to use the count of Strava users participating in the 2024 Great North Run (runners only) – established in step 1 – as input data to the scaling model. By training the model on a 10-race data set (described previously at step 2) as well as other features including weather and demographic data (from the census), the scaling model can seek to predict overall attendance at the run. This approach leveraged several different scaling models, developed using different techniques – their results are compared below.

3.3.1 Strava participation was combined with multiple features to predict attendance

In developing a model, various ‘features’ were combined to predict overall participation at the Great North Run. The features had different levels of importance to the model: a statistical measurement of the strength of the correlation between each variable and the total number of people attending the event – essentially, their predictive power. This explainability analysis tests the robustness of the models and ensures they are not making predictions based on spurious features, which would indicate limited predictive power.

This analysis indicates that the binary feature of whether an event was a parkrun was among the important features to the model. Key demographic information was also important, such as the percentage of the local population in the 18-24 age range: a younger demographic potentially more likely to participate in outdoor sporting activities.

Weather had very little predictive power, suggesting propensity to participate in outdoor sporting events is not closely tied to weather conditions. This is an important finding given many of the training events (park runs and other half marathons) were not set-piece, ticketed events like the Great North Run.

Figure 4: Analysis of feature importance in predicting Great North Run attendance

3.3.2 Linear models better predicted attendance, but Ensemble models generalised more effectively

We developed two different scaling models. The first was a linear model – a mathematical model that assumes a linear relationship between the input features (the Strava count, weather and demographic data) and the output (overall attendance). It is one of the simplest forms of machine learning model. The second was an ensemble model, which combines multiple base models to improve performance and robustness compared to any individual model.
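The contrast between the two approaches can be sketched with a single linear fit versus a simple bagging ensemble that averages linear fits trained on bootstrap resamples. This is a minimal sketch under assumed data; the project’s actual ensemble technique and hyperparameters are not specified in this report:

```python
import random

def fit_linear(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def fit_bagged_ensemble(xs, ys, n_models=25, seed=0):
    """Train linear base models on bootstrap resamples of the data."""
    rng = random.Random(seed)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        if len(set(idx)) < 2:  # degenerate resample (zero variance): redraw
            continue
        models.append(fit_linear([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_ensemble(models, x):
    """Average the base models' predictions."""
    return sum(a * x + b for a, b in models) / len(models)
```

Averaging over resamples is what gives an ensemble its robustness to individual outliers, at the cost of being pulled towards the bulk of the training data – consistent with the under-prediction for the Great North Run described below.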

Figure 5: True and predicted attendance values created by the linear model

Figure 6: True and predicted attendance values created by the ensemble model

Our results showed that the linear model was more accurate at predicting the Great North Run, estimating 55k attendance compared to the true figure of 60k (the focus was on runners only). However, it did not otherwise generalise effectively and was unsuccessful at predicting attendance at other events, exhibiting a higher average mean error – likely because the linear model could not identify the different relationships in the data needed to predict different types of events.

By contrast, the ensemble model generalises better across a wider range of events, exhibiting a lower mean error. However, it was unable to accurately predict the Great North Run, estimating an attendance of 30k – 50% off the true figure. This result is likely due to ‘over-fitting’ to the training data, given the lack of non-parkrun events in the training data; indeed, both non-parkrun events in the test set received the same prediction. For future development, training the ensemble model on additional data is most likely to produce an approach that will scale successfully to a wide range of events.

Results of Activity Data Methodology Performance

Absolute Error: 4198 : The average difference between the actual runner attendance and the runner attendance our model predicted, measured in the same units as the data (in this case number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions.

Percentage Error: 26% : The average difference between the actual and predicted runner attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.

Predicted Attendance: 25,000 (Actual 60,000): The number of runners the model thinks attended the event.
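The two headline metrics above can be computed directly. A minimal sketch (note the 4198 and 26% figures reported are averages across multiple test events, so a single actual/predicted pair will not reproduce them exactly):

```python
def absolute_error(actual, predicted):
    """Error in the data's own units (here, number of people)."""
    return abs(actual - predicted)

def percentage_error(actual, predicted):
    """Error relative to the true attendance, expressed in percent."""
    return 100 * abs(actual - predicted) / actual
```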

4. Mobile App Data Methodology

There are three key steps (covered in sections 4.1 - 4.3) in the process of measuring site and event visitation using mobile phone data:

  1. Defining Boundaries

  2. Extracting Mobile Phone Data

  3. Scaling and Estimating Visits using Sampling Weighting

4.1 Defining Boundaries

Summary: Establishing the geographic boundaries of the site or event for data collection. Ensuring accurate delineation to minimise data spillover from surrounding areas.

Figure 7: Established geographic boundary for the Great North Run

Boundary Available: Where the site boundary is available through Open Street Maps, it is obtained by querying the Application Programming Interface.

Boundary Unavailable: Where the site boundary is not available from Open Street Maps, it is drawn manually using Geographic Information Systems software based on the Open Street Maps base layer.
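Once a boundary polygon is obtained or drawn, each GPS fix must be tested for containment within it. A minimal sketch using the standard ray-casting point-in-polygon algorithm, with illustrative coordinates rather than the real Great North Run boundary:

```python
def point_in_polygon(point, polygon):
    """Ray casting: a point is inside if a horizontal ray from it
    crosses the polygon's edges an odd number of times.

    `polygon` is a list of (x, y) vertices; `point` is an (x, y) pair.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

In practice this test is performed by the data provider’s geospatial tooling; the point of the sketch is that every device location is either attributed to the site or discarded based solely on this boundary, which is why boundary specification matters so much (see Limitations below).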

4.2 Extracting Mobile Phone Data

Summary: Extracted geographic boundaries serve as the foundation for collecting geospatial mobile phone data, capturing GPS-recorded device locations within the defined area.

Figure 8: Capturing GPS-recorded device locations within the defined area

Collected Data for site of interest:

  • Unique Mobile Users – Number of distinct devices detected at the site.

  • Number of Mobile Visitor Days – Total device visits over a given period.

  • Spatial Patterns of Use – Sensitivity of GPS data within the site.

  • Visitor Catchment Areas – Geographic origins of visitors to the site.

  • Estimated Geo-Demographics – Socio-economic characteristics inferred from visitors’ home area.

  • National Mobile User Panel – Broader panel dataset used to weight and scale mobile visitation to reflect the total population.

4.3 Scaling and Estimating Visits using Sampling Weighting

Summary: The mobile phone population represents only a subset of total visitors to a site, so it is necessary to weight and scale the data to account for under- or over-representation across areas and produce a population-level estimate of visitation.

Figure 9: Graphical depiction of sample weighting visitation estimates

Each mobile user is assigned a home area monthly. To estimate total visitation to the site, visitors are weighted by their home County to correct for over- or under-representation of mobile users across the UK. Visits are then scaled to a population-level estimate using the ratio of mobile users to the adult population in each County. This process is applied daily across all Counties with recorded visitors, and the final estimate is the aggregation across Counties.
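The weighting step described above can be sketched as follows: per county, observed visitors are scaled by the ratio of the adult population to the mobile panel size, and the results are summed across counties. All figures in the example are illustrative:

```python
def estimate_visits(visitors_by_county, panel_by_county, population_by_county):
    """Sum over counties: observed visitors * (adult population / panel size)."""
    total = 0.0
    for county, visitors in visitors_by_county.items():
        ratio = population_by_county[county] / panel_by_county[county]
        total += visitors * ratio
    return total

estimate = estimate_visits(
    visitors_by_county={"Tyne and Wear": 50, "Durham": 20},
    panel_by_county={"Tyne and Wear": 1000, "Durham": 500},
    population_by_county={"Tyne and Wear": 900000, "Durham": 400000},
)
```

The real methodology applies this daily and assigns each user a home area monthly; the sketch shows only the county-level scaling arithmetic.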

4.4 Results of Mobile App Data Methodology

Absolute Error: 95K : The average difference between the actual runner attendance and the runner attendance our model predicted, measured in the same units as the data (in this case number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions.

Percentage Error: 158% : The average difference between the actual and predicted runner attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.

Predicted Attendance: 155K (Actual 60K) : The number of runners the model thinks attended the event.

Estimates of visitation are for the entire day, not the specific event time which was around 4 hours, resulting in over-estimation of visitation. The weighting methodology developed is applied on a daily level.

4.5 Limitations

There are a few key limitations with this approach of leveraging location data to predict overall attendance at events:

  • Boundary Specification: Given the geospatial nature of the data, the process of defining the boundary of a site directly affects the extracted data and, consequently, the visitation estimates. Inaccurate or inconsistent boundary delineations may lead to underestimation or overestimation of visitor numbers, particularly for sites that lack clearly defined perimeters or those with complex spatial layouts.

  • Accuracy and Impact of Surrounding Areas: Mobile phone data inherently contains positional inaccuracies due to a range of factors. This can result in data from adjacent areas being incorrectly attributed to a site, particularly when the site is surrounded by roads, transit hubs, or densely populated urban infrastructure. Such spillover effects may introduce systematic biases in visitor estimates.

  • Time Period and Data Volume Constraints: The choice of the study period and the availability of data within specific time windows impact the reliability of visitor estimates. Shorter time periods may result in lower data volumes, leading to increased variability and reduced confidence in the estimates. This limitation is particularly pronounced in less frequently visited sites or during off-peak periods, where mobile data penetration may be lower. Additionally, fluctuations in data availability across different seasons, days of the week, or special event periods can introduce inconsistencies in trend analysis.

  • User Demographics and Sample Bias: While some research has shown that mobile phone data provides a good fit to the general population in terms of geographic and socio-demographic coverage, it still represents only a small percentage of the total population. When analysing smaller spaces or shorter time periods, the subset of available data is reduced further, increasing the potential for sample bias in the output.

5. Social Media Methodology

There are 4 steps (covered in Sections 5.1 - 5.4) in the process of estimating attendance from Social Media Methodology:

  1. Data Collection

  2. Data Extraction

  3. Data Processing & Classification

  4. Scaling Model

5.1 Data Collection

Many social media companies have strict policies governing the use of their APIs. To ensure compliance with these policies the Pulsar platform was used to access social media posts at scale. Pulsar provides an API and managed wrap-around service that allows users to collect and analyse social media posts across multiple sources in compliance with the terms of service on the relevant platforms. This provided the primary data set for our social media analysis.

To protect user privacy, all data gathered across the platforms explored – Facebook, Instagram, X, Trip Advisor, Reddit - was anonymised by replacing usernames with randomly generated IDs.

To structure the data collection effectively, two main types of queries were employed on the Pulsar platform:​

Live Queries: Designed for ongoing events (this was the type of query used for the British Museum case study), these queries collect data over a defined period. The primary data sources for these queries were Instagram (limited to live data only) and Facebook (limited to the past 30 days only, as set by Meta’s Terms and Conditions).

Historic Queries:​​ Used for past, one-off events, such as the London Marathon, capturing data from one week before and after the event. The key data sources include Twitter, TripAdvisor, and Reddit. Having access to both live and historic data means modelling approaches using social media data are more flexible and can be used for a wider range of events.

5.2 Data Extraction

Searches for social media posts that can be used to predict attendance are constructed on the Pulsar platform using a Boolean search query defined by the user and then refined for each platform’s specific requirements.

Our methodology used large language models (LLMs) to generate queries, with manual oversight. Additional adjustments were also made using event-specific details to improve accuracy and relevance. The Pulsar platform offers a Boolean generator tool to support users in constructing valid syntax.
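To illustrate the kind of Boolean query involved, the sketch below assembles event names and hashtags into an OR-group with an exclusion clause. The syntax is simplified and generic; Pulsar’s actual query syntax and the study’s real queries may differ, and the terms shown are examples only:

```python
def build_query(names, hashtags, exclusions):
    """Assemble a simple Boolean search string from term lists."""
    include = " OR ".join([f'"{n}"' for n in names] + hashtags)
    query = f"({include})"
    if exclusions:
        query += " NOT (" + " OR ".join(exclusions) + ")"
    return query

query = build_query(
    names=["Great North Run"],
    hashtags=["#GreatNorthRun"],
    exclusions=["giveaway"],  # example of filtering promotional noise
)
```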

Once a search is running on the Pulsar platform, whether live or historic, it is assigned a unique search ID. This ID, along with specified start and end dates, is then used within code notebooks to download the social media data that has been pulled into the platform.

5.3 Data Processing & Classification

  • Classification of Posts for Event Attendance: After downloading the data, each social media post must be classified to determine whether the author attended, plans to attend, or is likely to attend the event. This classification is performed using a Large Language Model, which analyses the content and context of each post. This project used a Llama model, which can be run locally on an inexpensive GPU.

  • LLM Query Construction: To accurately classify posts, a structured query is created for the LLM. This query includes:​

    • General Instruction: The LLM is prompted with a directive such as: “You are a helpful assistant for classifying posts about event attendance.”

    • Event Details: The event name and description are provided. If necessary, an external source can be used to obtain a more detailed event description (e.g. from the event website).

    • Example Posts: To improve accuracy, the LLM is supplied with sample posts that illustrate different classifications. These examples help it distinguish between: users who attended the event; users who intend to attend the event; and users who only engaged with the event remotely (e.g. watching on TV).

  • LLM Output and Post-Classification: The LLM analyses each post based on the provided query and assigns a classification. It outputs “1” if the post indicates that the user attended the event, and “0” if the post suggests the user did not attend. This classification allows for structured data analysis, providing insights into event participation trends based on social media activity.
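The prompt structure described above can be sketched as follows. The wording, helper names and example posts are illustrative (not the study’s actual prompts), and the call to the locally hosted Llama model is omitted; `parse_label` shows the expected handling of the “1”/“0” output:

```python
def build_prompt(event_name, event_description, examples, post):
    """Assemble a classification prompt: instruction, event details,
    labelled examples, then the post to classify."""
    lines = [
        "You are a helpful assistant for classifying posts about event attendance.",
        f"Event: {event_name}. {event_description}",
        "Examples:",
    ]
    for text, label in examples:
        lines.append(f'Post: "{text}" -> {label}')
    lines.append(f'Classify (1 = attended, 0 = did not attend): "{post}"')
    return "\n".join(lines)

def parse_label(llm_output):
    """Map the model's raw text output to an attendance label."""
    return 1 if llm_output.strip().startswith("1") else 0
```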

5.4 Scaling Model

Purpose: The primary objective of the scaling model is to estimate overall event attendance (restricted to runners only) from the number of people posting on social media about being at the event. Since social media posts only represent a subset of actual attendees, once the model has classified posts correctly into attendees and non-attendees, it needs to extrapolate this figure to make an estimate for all attendees at the event.

Figure 10: Flowchart depicting how visitation predictions are created

Process

  • Input: The model takes as input the number of “positive posts” (e.g. attending the event) classified by the LLM. “Positive posts” refer to social media posts made by individuals who are likely to have attended the event.

  • Extrapolation: The scaling model extrapolates from the number of positive posts to estimate the total attendance.

  • Trained Model and Preprocessing: First, the model standardises the data collected from different social media platforms so they can be processed within the same model, and removes outliers such as posts with limited content (e.g. posts consisting solely of emojis). The model then predicts attendance values based on a series of features, the most important of which include: Event type (categorical feature indicating event type); Log_Pulsar_attendance (logarithmically transformed attendance from the social media posts); Engagement rate (prevalence of likes + shares + comments); Sentiment norm (sentiment calculated with Pulsar’s sentiment analysis tool).

  • Output: The scaling model produces an estimate of event attendance. For historical events (e.g. concerts, rallies), the model provides a total attendance estimate based on data collected from one week before to one week after the event. For ongoing events (e.g. the British Museum), the model predicts weekly attendance for a two-week period.
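The feature construction listed above can be sketched as follows. The field names mirror the feature list, but the exact transformations are the research team’s own; the `+1` log guard, the engagement-rate denominator and the sentiment clamping are this sketch’s assumptions:

```python
import math

def build_features(n_positive_posts, likes, shares, comments, sentiment, event_type):
    """Assemble the model's input features from raw post statistics."""
    return {
        "event_type": event_type,
        # Log transform compresses the heavy-tailed post counts; +1 guards log(0).
        "log_pulsar_attendance": math.log(n_positive_posts + 1),
        # Interactions per positive post; max() guards division by zero.
        "engagement_rate": (likes + shares + comments) / max(n_positive_posts, 1),
        # Clamp the sentiment score to a fixed [-1, 1] range.
        "sentiment_norm": max(-1.0, min(1.0, sentiment)),
    }
```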

Figure 11: Graph shows model prediction of attendance at a sample of test events used to train the model (y-axis) against the baseline attendance for events (x-axis). The closer the blue dots are to the dashed red line, the more accurate the prediction.

5.5 Results of Social Media Data Methodology

  • Absolute Error: 9717 : The average difference between the actual runner attendance and the runner attendance our model predicted, measured in the same units as the data (in this case number of people). It shows, on average, how many people our predictions were off by. A smaller number means the model is better at making accurate predictions.

  • Percentage Error: 16.1% : The average difference between the actual and predicted runner attendance, expressed as a percentage of the actual attendance. This tells us the size of the error in relative terms and is helpful for understanding how big the error is compared to the event’s size. A lower percentage means the model is performing well.

  • Predicted Attendance: 69,717 (Actual 60,000) : The number of runners the model thinks attended the event.

5.6 Limitations

There are a few key limitations with this approach of leveraging social media data to predict overall attendance at events:

  • Query Design: People posting about events on social media will use different language or hashtags to describe the same event, meaning that queries may not collect all relevant social media posts. An additional complication is that queries which are too broad will collect too much irrelevant data (e.g. a query containing just the word ‘football’ will collect billions of posts), which (a) cannot be processed and (b) lowers the performance of the modelling. The main mitigation is using specific, standardised and logical query terms based on a search through social media for how most people refer to the event. More detail on how to construct these queries is in the toolkit accompanying this report.

  • Platform-Specific Search Methods: Different social media sources require different handling approaches.

    • Instagram: Searches are limited to hashtags, restricting the possible breadth of data collection.
    • Facebook: Keyword-based searches can be more restrictive than fully Boolean queries (e.g. lacking support for nested ANDs or ORs).
  • LLM Prompting: The classification model used for this research project relies on a relatively small LLM hosted locally to ensure compliance with data governance regulations – in this case not sharing personal data with a third-party (i.e. the model provider). As a result, the locally hosted models used for this research are likely to be poorer performing than larger cloud-native LLMs, resulting in lower accuracy and less reliable estimates for attendance.

  • User Demographics and Social Media Habits: The likelihood of event attendees posting varies by demographic and by social media platform, meaning specific demographics may be over- or under-represented in the model estimates, resulting in both biased results and poorer model performance. Because full demographic data on social media users and on the people who attended this event is not available, there is no way to correct for this bias within this approach.

  • Model drift: As social media habits change over time, the model will be prone to ‘drift’ where the performance degrades, and re-training the model is necessary to ensure continued good model performance.

  • Scaling Model and Small Development Dataset: Due to the comparatively small dataset used in this research (e.g. only tens of thousands of social media posts, compared to large datasets of millions of posts), the training of the scaling model was sensitive to outliers and risks overfitting (where the model learns the relationships within the training data but cannot generalise these learnings to other events). This affected the model’s accuracy and generalisability for some ‘outlying’ events, for example:

    • Events where ‘Virtual’ Attendance is Possible: If an event is available to view on TV or online, there will be more posts, with a significantly smaller proportion coming from people who actually attended.

    • Events with Unusual Post Sentiment and Engagement: For example, the Women’s Euro 2022 Final screenings will have had significantly more posts, with more positive sentiment, since England won.

    • Estimates of Ground Truth: The baseline attendance figures used for training and evaluating our models were frequently estimates themselves, meaning the final assessment of our results may not be completely fair or accurate.

  • Data Source considerations and restrictions: Due to the platforms’ T&Cs and the behaviour of users on social media platforms:

    • Instagram: Only live data can be collected, restricting access to historical content, and collection is confined to public content.

    • Facebook: Data is limited to the past 30 days and is confined to public content, meaning no private groups can be captured.

    • X: Changes in user behaviour have shifted the platform’s use in recent years. Based on a review of collected posts, activity indicated a focus on news and commentary, rather than event-related posts.

    • Reddit: Reddit was found to be a less reliable source for event-related posts, as users were consistently less likely to share event details on the platform, based on our assessment of posts used in the model training.

6. Comparison of Methods

Below is an assessment of the performance of each methodology in its specific application to the Great North Run.

Explanation

The performance of the activity data model can be explained by two primary factors:

  • The models trained to ‘scale’ the count from Strava data to an overall prediction were trained on a relatively small and insufficiently diverse training data set, comprised mainly of smaller events, leading the model to underestimate attendance;

  • Not all runners use Strava, and those who do may not always record their runs, leading to underrepresentation in the dataset. In the case of the Great North Run, an official AJ Bell-sponsored tracking app was widely used;

Precaution for use

  • For future development, training the ensemble model on additional data is most likely to produce an approach that scales to predict multiple events.

  • Key precautions include that the methodology is highly specialised to counting attendees at running or cycling races and will not generalise to other event types.

  • Improving the performance of the model for larger events like the Great North Run requires training the scaling model on a larger dataset that includes events of similar scale and runner demographics, e.g. large events like the London Marathon.

Stronger for:

  • Leverages millions of publicly shared runs from Strava in an ethical way to predict attendance at sporting events, with strong results for running and cycling events.

  • Low-cost approach, set up so that the addition of new training data will improve generalisability over time to more sporting events.

  • Strava Metro platform makes data easily accessible.

Weaker for:

  • Optimisation for sporting events means performance is weak for cultural events.

  • Significant time delays are experienced when downloading training data out of Strava Metro.

  • Over-representation of young and active populations in training data means the model doesn’t generalise well to events where these demographics are not represented.

  • No demographic insight about attendees.

Explanation

The mobile app data approach performed poorly for the Great North Run for two reasons:

  • The minimum data collection period for this methodology is 24 hours, but the race lasted only a few hours. As a result, the model likely counted people who were in the area before and after but didn’t attend the event;

  • Additionally, given the event draws large spectator crowds, many people watching the race, but not directly participating in it, would have been included in the count – even with the methodology drawing tightly defined boundaries around the racecourse.

Precaution for use

  • Location data is best suited for events spanning longer time periods, such as counting attendance at a location over a year. For short events, alternative data should be explored as mobile app location data can lack granularity.

  • Separately to the model’s performance, there are some ethical concerns associated with the fact that users of these mobile apps do not actively opt in to their data being used to track their attendance at events, raising privacy concerns.

Stronger for:

  • Accurate for long-running recurring events that repeat over the course of weeks, months or years e.g. exhibits or parks etc.

  • Method is low-cost and straightforward to deliver with the right expertise and can be compared to known population distributions for specific sites or events, helping to uncover demographic info and reduce bias.

Weaker for:

  • Huq data is generally less suitable for short-lived events (e.g. a few hours in duration). It is better suited to recurring events and locations with fixed boundaries.

  • Mobile phone location data is subject to some ethical concerns around use. While this data is legal to collect and for providers like Huq to license, there are some concerns about whether individuals can be said to have given ‘informed consent’ for their data to be collected.

Explanation

The social media data approach performed well for the Great North Run for the following reasons:

  • Engagement was highly localised and primarily attracted social media activity from those who were physically present e.g. update posts from ‘finishers’;

  • Many participants and spectators tended to use specific hashtags (e.g., #GNR2022), making it easier for the model to identify relevant posts;

  • Participants in the Great North Run were more likely to use location-tagged posts, making it easier to match social media activity with actual attendance.

Precaution for use

  • The relatively strong performance of the model here is likely a result of online engagement with the event being comparatively low relative to physical attendance.

  • For other events (e.g., more famous races like the London Marathon), online engagement may not correlate as well with physical attendance.

  • High TV coverage or virtual participation might see an inflated number of mentions without those people being present, making it harder for the scaling model to produce accurate results.

Stronger for:

  • Provides a low-cost and straightforward way of leveraging social media data to predict attendance. Pulsar’s managed-API service is intuitive and reduces friction of managing platform-level terms of service obligations.

  • More accurate for medium-to-large events, where in-person attendance is sufficiently large for a significant number of posts confirming attendance to be made.

Weaker for:

  • Accuracy is reduced where events have a large television or online audience who post online about the event.

  • There are some ethical concerns about using social media data in this way. While compliant with platform-level terms of service, there is not clearly ‘informed consent’ for this use case.

  • Risks from increasing privacy restrictions mean the approach may not be extendable in the future.