Capturing engagement numbers - strand 1 report
Published 13 March 2026
This report was authored by Jack Medlock, Hannah M. P. Stock, Andrew Knight, Donna Phillips, Adam L. Ozer, and Joseph Stordy at Verian, Dr Michael Sinclair, Dr Craig Macdonald, and Prof Iadh Ounis at The University of Glasgow, and Faculty.
This research was supported by the R&D Science and Analysis Programme at the Department for Culture, Media & Sport (DCMS). It was developed and produced according to the research team’s hypotheses and methods between October 2023 and June 2025. Any primary research, subsequent findings or recommendations do not represent UK Government views or policy.
Executive Summary
Obtaining accurate estimates of attendance at cultural or sporting events and locations is vital for the Department for Culture, Media and Sport (DCMS), allowing effective event planning, robust evaluation and value for money assessments for hosting or facilitating these events.
While ticketing data can give a good understanding of engagement with ticketed events, measuring engagement at un-ticketed events is challenging. To date, insight has largely come from surveys. Although survey data is of good quality and provides valuable demographic insight and a replicable approach, it comes with some limitations. For example, surveys are limited in their ability to comprehensively measure local participation at specific events or spaces.
Given these potential limitations, and that even ticket sales or traditional crowd counting methods may not be an accurate reflection of attendance, DCMS wanted to explore data-driven methods for capturing engagement. This report details Strand 1 of the ‘Capturing Engagement Numbers’ research programme, covering work to develop a comparative framework for innovative methods of measuring participation, prior to more comprehensive application of those methods in Strand 2 of the project. The Strand 1 development was split into a Breadth phase, in which data sources were identified and evaluated for suitability for experimentation, and a Depth phase in which modelling approaches would be combined with the data source to develop a measurement method.
Breadth phase
A comprehensive review of potential data sources was undertaken through an evidence synthesis, interviews with subject matter experts, and combined with the prior knowledge and experience of the research team. Over 60 potential data sources were identified across the following categories:
- Transport and activity data used to track people using personal (e.g. car, cycling, walking) and public transport (e.g. trains, bus)
- Social media data used by people to report or comment upon real world events
- Mobile app data capturing individual-level GPS data obtained from the use of mobile applications on GPS-enabled devices
- Deployable sensing technologies data including any data collected by sensors deployed to an event or space in a bespoke manner to monitor attendance and audience size (e.g. camera footage, Wi-Fi, or radio signals)
- Event and Space data such as census data or National Travel Survey data that will be unable to measure attendance at events in isolation but could potentially be used in combination with other methods to provide audience demographics
A shortlist of data sources within each category was drawn up for further exploration in the Depth phase using the following prioritisation criteria:
- Granularity & frequency - to assess whether it is feasible for robust estimates to be calculated from the data source
- Access - to assess the likelihood that the data source could be available for this work (e.g. whether it could be shared with the research team for experimentation in this project, or with DCMS in the future). The accessibility assessment included a judgment on affordability of the data to the project team and the overall cost-benefit balance of a data source
- Attendance and participation - to understand for what types of events and activities the data source could provide estimates, and what additional information it contains (e.g. demographics)
Depth phase
Four data sources were identified as candidates for further experimentation, as described in Table.i. For each one, a case study was undertaken to combine the data source with modelling techniques to develop a method of estimating attendance at a particular event/activity that aligned with baseline data.
Table.i: List of Data Sources used for experimentation, with relevant modelling techniques and activity/events applied.

| Data source (category) | Event / Activity | Baseline data | Modelling techniques |
|---|---|---|---|
| Huq (Mobile app data) | British Museum | Published attendance figures | Sample weighting |
| Strava (Transport and activity data) | Edinburgh Parkruns | Parkrun data at two Edinburgh events | XGBoost – a high-performance decision tree model |
| Pulsar (Social media data) | Farnborough International Airshow; Scotland vs New Zealand rugby; Lewes Bonfire Night | Published attendance data per event | Several models were tested, with Zero Shot (a Large Language Model variant) performing best in general |
| Aerial photography (Deployable sensing technology data) | Bradford City of Culture 2025; The Giant's Causeway | Published attendance figures or observed footfall counts | Object detection and crowd density modelling |
Key findings
- How well a method performs at estimating attendance can depend on the characteristics of the event: For example, the method utilising Mobile App data (via Huq) performed best over a longer period of time at a fixed-boundary location (e.g. a museum or park). This is because the methodology requires boundaries to be drawn, with data collected when an individual passes within that boundary, so performance is optimal when the drawn boundary aligns with the physical boundary of the location. As there will be a relatively limited number of relevant mobile app users present in a space at any given time, a longer event period allows more time for a sufficient number of attendees to be captured to provide a robust estimate. In contrast, a higher proportion of attendees at a running or cycling event is likely to have Strava enabled, which enables an estimate of attendance over a shorter time period. Data collection is not necessarily limited to fixed-boundary locations, however: data can also be used for broader geographic areas or dynamic locations, depending on the specific use case and data collection methods.
- Access to baseline data: Baseline data of sufficient volume, granularity, variety and detail (e.g. covering different event characteristics) is essential to be able to iterate and improve the method's ability to estimate attendance more accurately. Example sources of baseline data used during Strand 1 include published museum attendance numbers or Parkrun participation numbers. During Strand 2 of the project, national baseline data from the Association of Leading Visitor Attractions (ALVA) is used to build and refine estimates.
- Access to data sources: Significant time was spent ensuring compliance with data protection requirements and ethical standards, including how the data source is collected at the front end, how it will be accessed, and how it will be used. Other common barriers to access were upfront cost for subscription models, and complications with setting up social media business accounts necessary for accessing certain types of social media data.
- Demographic information: Some data sources can provide an estimate of home location (Huq and Pulsar), which in turn could be used for some demographic analysis by linking to static data sources, such as census data.
- Exclusions: Some demographic profiles will not be represented at all, or will be under-represented, in a data source; for example, children and tourists are under-represented in Mobile App data due to their relative lack of phones or absence from UK mobile networks.
In conclusion, the Strand 1 research delivered positive results in terms of providing estimates that aligned with baseline data. The research also surfaced several barriers to developing successful methods of estimation. Some of these barriers could be overcome with more time or investment, while others would remain or be subject to how platforms enable or restrict access to data in the future. Each of the four methods was progressed to Strand 2 for broader application.
1. Introduction
1.1 Context and objectives
Verian was commissioned by the Department for Culture, Media and Sport (DCMS) to lead an independent research and development (R&D) study into new methods to measure attendance across its sectors. The project team included social researchers from Verian, academics from the University of Glasgow’s School of Computer Science and Urban Big Data Centre, and data scientists from Faculty.AI, a technology company with expertise in accessing and working with the categories of data of use to this project.
It is acknowledged that robust measurement of audience engagement is challenging, particularly when trying to achieve a measure that is both accurate and cost-effective. For this project’s purposes, ‘engagement’ refers to both active participation and physical attendance at an event or space. While ticketing data can give a good understanding of engagement with ticketed events, unticketed events present a particular challenge, especially at events or locations where points of entry are hard to measure, as this limits the potential to manually count attendees over a given time. Currently, the data and insights in this area are mostly provided by surveys, which, although they provide strong data quality, demographic information and a replicable and scalable approach, can be costly and come with their own challenges and limitations. The DCMS Participation Survey, for example, provides valuable information for ground-truthing, demographic insights and contextual richness at local authority level. However, it is limited in its ability to comprehensively measure hyper-local participation at specific events or spaces, for example, in sports or arts/drama clubs. Other large surveys like Sport England’s Active Lives rely on recall of activity and are subject to potential response or social-desirability biases. Whilst these surveys provide a solid understanding of attendance at a national level, there are known challenges to how far surveys can provide the more granular level of insight that DCMS would benefit from to set and meet its targets.
This research has focused on the extent to which that insight can be provided by alternative means, specifically combining non-survey data with statistical modelling techniques to provide a useful measure of engagement at unticketed events (for example, UK City of Culture, museums) and activities outside the home (for example, participating in sporting activities in a park). The research sought to include a measure of the frequency and duration of the engagement.
The categories of data initially scoped as of interest include mobility data, mobile data geo-fencing, activity trackers, image/film assisted density analysis, footfall counting techniques and the application of AI and machine learning. There is also an awareness of the potential limitations and biases of each data source and modelling technique.
DCMS is interested in research that develops and tests a robust framework which can be used to overcome some of these challenges, to engage with data sources and modelling techniques and to give a more accurate measurement of engagement with culture and sport. The deeper understanding gained by these approaches, complementing those acquired from analysis of survey data, will further support DCMS in achieving its aims.
1.2 Research objectives and summary of approach
The study has been divided into two strands, each with an overall research objective. To address each research objective methodically, each strand was divided into two phases, as described below.
Strand 1: To develop a comparative framework for different measurement methods.
- Breadth phase: To identify and analyse a longlist of data sources and modelling techniques to help DCMS understand all relevant methods to this research.
- Depth phase: To conduct an in-depth analysis of methods shortlisted from the Breadth phase to recommend the most viable methods for Strand 2.
Strand 2: To develop case study examples to test the application of different measuring methods, using mixed methods where appropriate.
- Design and setup: To define the events data and baselines required for a rigorous methodology to be in place before testing the recommended methods.
- Test and reporting: To test each shortlisted method against each shortlisted event with results analysed according to the defined methodology.
1.3 Objectives of the report
This report covers Strand 1 of the research programme. It will describe the staged approach of scoping, assessing and filtering data sources to determine which should be prioritised for more comprehensive analysis in the Depth phase.
- Section 2 describes the approach to researching the longlist of data sources
- Section 3 describes prioritisation criteria to decide whether to carry forward a data source to the next stage
- Section 4 provides an overview of potential modelling techniques
- Section 5 onwards is grouped by data category (e.g. transport, mobile, social media), within which there is an analysis of specific data sources, potential modelling techniques and sample weighting, a list of key findings, and clear recommendations for the Depth phase
2. Approach to researching data sources
The breadth phase identified and analysed a longlist of data sources, which could be used to help capture engagement numbers at unticketed events. Data sources were identified via an evidence synthesis bringing together relevant literature identified by the project team together with evidence from discussions with a range of governmental and non-governmental stakeholders. Literature consulted in the evidence synthesis included:
- Academic literature guided by the knowledge and experience of academic experts at the University of Glasgow
- Industry reports, which also served to highlight new or upcoming data sources becoming available
- Government reports of work on similar research projects
- A review of new technologies relevant to this research, both now and in the future
The list of consulted literature is available in appendix 3.
The team engaged with subject matter experts in academia, government and industry, including established contacts and additional contacts identified through this phase of work.
Based on the project team’s experience of similar research projects, the initial range of data sources were categorised as:
- Transport and activity data (e.g. parking data, traffic monitoring, passenger numbers on trains)
- Social media data (from third parties or through social media companies’ application programming interfaces (commonly referred to as APIs))
- Mobile data / geo-location data (e.g. eSIMs, Wi-Fi connections, Strava API)
- Deployable sensing data (e.g. aerial photography, wearables, radio frequency identification tags (RFIDs))
- Event and space data (e.g. census data, National Travel Survey data, Ordnance Survey’s Point of Interest dataset)
3. Prioritisation approach and criteria
To prioritise specific data sources that should be progressed beyond the Breadth phase for further testing, each data source identified in our evidence review was assessed against two stages of prioritisation criteria. These were designed to ensure potential novel data sources would deliver against the aims of the Capturing Engagement Numbers Programme, i.e. to develop new and robust engagement measurement methods that can be deployed in real scenarios.
3.1 Stage 1: Assessment of granularity & frequency, and access
Figure 3.1 sets out the prioritisation matrix for stage 1, in which datasets would be prioritised if they qualified for inclusion in quadrants 3 or 4. These datasets were judged to have promising levels of detail and prospects for accessibility, which meant they were likely to be ‘greenlit’, i.e. progressed to the Depth phase for further testing.
Figure 3.1: Breadth Phase Prioritisation Matrix
In order to prioritise datasets in a systematic and transparent way, the project team agreed to a shared understanding of the criteria to apply:
Granularity and frequency
Granularity refers to the degree of specificity of each piece of data. Datasets for use in this project should be detailed enough for use in estimating attendance. Data with ‘fine granularity’ which provides in-depth detail such as information at a postcode level was prioritised in preference to datasets which are comparatively ‘coarse-grained’ such as those with city-level only information.
Frequency in this context refers to how often the data is captured. High frequency datasets such as those at a daily level, as opposed to monthly, were prioritised as high frequency data would be more useful for accurately estimating attendance. This is particularly true for efforts to estimate attendance at shorter or one-off events. Data collected with a high granularity and frequency will ultimately have a larger number of use cases than those that are less detailed.
Granularity and frequency can vary relative to different events. For example, some events may have more social media posts about them, while for an event held in a remote location there may be no mobile app data available. Given the range of variables which could impact the accuracy of estimates, producing set criteria for what constitutes ‘granular or frequent’ enough is not feasible. Broadly speaking, more granular and more frequently collected data will lead to more accurate estimates, because models will have more precise data to ‘learn’ from. When analysing modelling results in Strand 2 of the study, however, it will become more apparent where these criteria fall for different methodologies.
Data accessibility
Data sources are assessed for accessibility both in a practical sense as well as a financial one. Data sources that are practically difficult to access, such as those that require an application for API access through a difficult-to-access system, will not necessarily be disqualified but will be lower priority than those that are straightforward to access. The same is true of cost, where a more nuanced cost-benefit judgement was taken, although in practice more affordable data sources are likely to be prioritised for further testing given the inevitably limited data budgets available to R&D projects of this nature. Data sources that are assessed as being non-GDPR compliant or not meeting ethical standards, such as having unclear user consent processes or unclear data origin information, are deemed to be inaccessible and excluded from assessment.
When applying these criteria, we are assessing the data sources on the basis of their usability in this project. It may be the case that there are some data sources judged to be inaccessible on a practical or cost basis for this project but that may still prove useful to DCMS in the longer term for capturing engagement. This is particularly true for cost assessments, where a long-term subscription is not feasible for a relatively short-term R&D project but could be within the means of the Department more generally. Where this is the case, it is recorded in the prioritisation findings in the annexed data source table.
3.2 Stage 2: Attendance and participation estimation feasibility
Those datasets that passed the Stage 1 assessment were then subject to a qualitative assessment of how useful they could be in estimating attendance, and how feasible utilising them in this way might be. This included reviewing evidence gathered by synthesising the literature around existing use-cases for the data in question, analysis of any sample data or metadata available, and any other details provided by data providers. This evidence synthesis allowed the team to make a collaborative judgement on whether a data source had sufficiently detailed information and was available to access. These collaborative judgements were based on the consortium experts’ experience of working with a range of data sources, combined with practical considerations. For instance, datasets which were updated only annually or had very sparse information were not viable for effectively capturing engagement for the vast majority of events or spaces. Similarly, datasets only available through a lengthy procurement process, or for which only long-term subscription packages are offered, were also deemed not viable.
Each data source was assessed using the evidence generated by this process to determine whether it could be used either to estimate audiences at individual events or numbers of people participating in sport and culture activities over an extended period. It was further assessed as to whether it could provide demographic information on a given event or space’s attendees. This demographic information may include age, gender, socioeconomic status, among other characteristics. While a data source was not excluded if it failed to contain demographic information, those that did were given a higher priority.
The datasets that fell into quadrant 3 or 4 from figure 3.1 and assessed as being feasible for this project’s purposes were ‘greenlit’ and prioritised for potential further in-depth testing. Figure 3.2 below provides an illustrative example representing the position of a selection of assessed data sources on the prioritisation matrix.
The full catalogue of data sources assessed, and the outcome of the prioritisation approach detailed in this section for each, is provided in detail as an annex (Annex 1).
Figure 3.2: Illustrative example of a selection of data sources positioned on the Breadth Phase Prioritisation Matrix.
Table 3.2: Key for Figure 3.2
| Matrix Outcome Key | | | |
|---|---|---|---|
| H | Huq | CKD | CKDelta |
| P | Pulsar | FB | |
| S | Strava | AS | Airsage |
| NH | National Highways WebTRIS | TfL | Transport for London |
| G | Geolytix | F | Factori |
| AxC | ActiveXChange | X | X (formerly known as Twitter) |
| UC | Unacast | T | Tamaco |
| EA | Echo Analytics | TT | TomTom |
| C | Cuebiq | | |
4. Modelling approaches
As well as providing a thorough assessment of data categories and sources, the Breadth phase also produced a synthesis of modelling approaches generally used in the current literature for capturing engagement using novel datasets. Following this overview, specific approaches have been implemented on a case-by-case basis to individual data sources with full details available in sections 5-9 or in the relevant case study annex.
4.1 Regression
Regression models offer an approach for estimating attendance at unticketed events and cultural spaces by using multiple data sources, including transport data and weather data, for example. Other examples of possible data sources for use in regression models includes local demographics such as age distribution, and local economic indicators, such as average income. Based on a series of input data points, these are analytical tools that discern patterns and relationships between variables and use them to make informed value predictions.
In this project, baseline visitation data would be integrated with a variety of other data sources to predict the number of visitors to a space or event. This methodology is particularly beneficial in scenarios where conventional counting methods are either impractical or unavailable, making it a valuable tool in data-scarce environments. In such cases, a sample of known visitation to events or spaces can be used to build a model which helps inform something about out-of-sample events. Regression models will be evaluated for use in conjunction with those data sources prioritised for further exploration in the depth phase.
Models can range in complexity from simple linear regression, which fits visitation and participation figures to a linear relationship with the explanatory variables, to more complex machine learning approaches such as random forests, in which multiple decision models are independently created and then combined to yield one ensemble model. The choice of model will be largely dependent on the available data. For example, more complex models like random forests perform well with large amounts of data (tens of thousands of records) but struggle with small datasets (1000 or fewer records); whereas simpler models like Support Vector Machine (SVM) regression work well with little data.
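As a minimal illustration of the simpler end of this spectrum, the sketch below fits an ordinary least squares regression to synthetic visitation data. The variable names, coefficients and noise levels are invented for illustration and do not come from this study or its data sources.

```python
import numpy as np

# Hypothetical example: predict daily visits from two invented explanatory
# variables (a transport-count proxy and a weather index).
rng = np.random.default_rng(0)
n = 200
transport_counts = rng.uniform(100, 1000, n)   # e.g. passengers alighting nearby
weather_index = rng.uniform(0, 1, n)           # 0 = poor weather, 1 = fine
visits = 50 + 0.8 * transport_counts + 300 * weather_index + rng.normal(0, 40, n)

# Ordinary least squares: solve for [intercept, b_transport, b_weather]
X = np.column_stack([np.ones(n), transport_counts, weather_index])
beta, *_ = np.linalg.lstsq(X, visits, rcond=None)

intercept, b_transport, b_weather = beta
predicted = X @ beta
r_squared = 1 - np.sum((visits - predicted) ** 2) / np.sum((visits - visits.mean()) ** 2)
print(f"transport coef: {b_transport:.2f}, weather coef: {b_weather:.1f}, R^2: {r_squared:.3f}")
```

With known visitation figures as the target variable, the same structure extends to the richer feature sets described above; more complex models such as random forests simply replace the linear fit.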
Generating these models requires the following core components:
Baseline visitation data: The foundation of these models is a set of established visitation data to a range of events or spaces. This data acts as a benchmark and point of calibration, aiding in contextualising and interpreting new data sources and attributes associated with specific events or locations.
Site and context-specific information: Additional data related to the event or site, including weather conditions, accessibility, local population density, and specific event characteristics, can further refine the model’s accuracy.
Transport data sources: Data from transportation-related sources contribute an additional contextual layer, especially in assessing mobility patterns for certain types of travel.
Other types of data: Data like social media and mobile phone app data can also be used as part of a regression model. More details on the implementation of these sources of data will follow in sections 5 - 9.
It is common practice to combine multiple data sources to derive insights surrounding visitation. Indeed, these models have shown significant utility in contexts like natural site visitation (outdoor spaces such as natural parks), where data scarcity and logistical challenges limit traditional survey methods. Both Ghermandi and Sinclair (2019) and Heikinheimo et al (2020) for example, highlight the potential value in the integration of multiple data sources in predicting attendance, including both social media and location data. Notably, Joshi et al (2023) used transport data sourced from Strava Metro to combine weather and static demographic data which enabled them to create a regression model that can estimate monthly visitor counts to U.K. green spaces.
One of the main challenges in this methodology lies in selecting compatible data sources that could be implemented into the model. To be combined effectively, these data sources must have the same temporal and spatial coverage of an event, or be adaptable enough to be processed to align in terms of their coverage. In practical terms, this means data must align in time and place, for example, being collected at a monthly frequency for the same location. Moreover, the integration of multiple data sources into one complex regression model must be methodically executed to avoid multicollinearity, where two or more data sources are highly correlated, ensuring that each variable independently contributes to the model to generate reliable statistical inferences. Regularisation methodologies, like Ridge or Lasso regression, make models less sensitive to multicollinearity whilst remaining robust. Therefore, further evaluation is required to combine multiple data sources. This includes assessing compatibility, checking for multicollinearity and selecting features using domain knowledge and statistical tests. Model selection often starts with simpler models, such as linear regression, applying regularisation techniques if multicollinearity is a concern, and evaluating regression models using cross-validation to assess model performance and avoid over-fitting.
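The effect of Ridge regularisation under multicollinearity can be sketched with two deliberately collinear predictors (for example, bus and train passenger counts that move together). The data below is invented for illustration; the closed-form Ridge estimate shrinks the coefficients relative to plain OLS, stabilising the fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.05, n)        # nearly identical to x1: collinear
y = 3 * x1 + 3 * x2 + rng.normal(0, 1, n)

X = np.column_stack([x1, x2])

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^-1 X'y; lam=0 gives OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols_beta = ridge(X, y, lam=0.0)     # unstable under collinearity
ridge_beta = ridge(X, y, lam=10.0)  # regularised: smaller, more stable coefficients

print("OLS:  ", np.round(ols_beta, 2))
print("Ridge:", np.round(ridge_beta, 2))
```

In practice the penalty `lam` would be chosen by cross-validation, as noted above, rather than fixed by hand.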
Baseline data quality is an additional challenge that will directly affect the model’s performance. If a data source is not available for a given event, a model built on that data source will not be able to be applied to that event, even if the rest of the data is available. For example, a model trained solely on ticketed event data may not generalise well to un-ticketed events, as the audience characteristics of ticketed and un-ticketed events may vary. Moreover, data availability (or lack thereof) for events may limit the usefulness and application of such models. Indeed, regression models would need a range of data sources to be available to calculate attendance. A possible solution could consist of building multiple models, each using a different number of features to account for this variability in data availability.
While baseline data is useful for modelling regular events, additional considerations and adjustments are needed to accurately predict attendance for one-off or special events. This might involve incorporating event-specific data or using different modelling techniques to account for the unique characteristics of these events. One approach is hybrid modelling, combining different modelling techniques to leverage the strengths of each method. For example, using machine learning models to capture complex patterns and time series analysis for trend forecasting.
Overall, building regression models should be prioritised, as doing so could provide attendance estimates that would not be possible using a single data source, or that would at least be more accurate. However, the decision to proceed should be contingent on the availability of high-quality baseline visitation data, which will influence the resolution of the results. This is especially true for estimating visitation for narrow time windows.
4.2 Sample weighting
Sample weighting is an approach to correct biases present in data by scaling up or down demographics in the sample to be in line with the demographic proportions in the population. These biases are caused by sample error, the discrepancy between the characteristics of a sample and that of the entire population, i.e. all people of interest. Sample error stems from the fact that not all individuals are included in a sample, particularly when the data involves choices influenced by demographic factors. (Hays et al 2015; Solon et al 2015).
One of the most popular methods for sample weighting is raking, also known as RIM weighting, which iteratively adjusts the weights for each characteristic until they align with the proportion found in the population. For example, if the population of study is 52% male and 10% aged 18-24 years but the survey sample is 55% male and 15% 18-24 years, the weights would be adjusted first for gender to match the population distribution and then the weights would be adjusted for age.
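The raking procedure above can be sketched as iterative proportional fitting over a two-way sample table. The proportions follow the text (population 52% male and 10% aged 18-24, versus 55% and 15% in the sample), but the cross-tabulated counts are invented for illustration.

```python
import numpy as np

# rows = age band (18-24, 25+), cols = gender (male, female); invented counts
# summing to a sample of 100 with 55 male and 15 aged 18-24
sample = np.array([[ 9.0,  6.0],
                   [46.0, 39.0]])
row_targets = np.array([10.0, 90.0])   # population: 10% aged 18-24
col_targets = np.array([52.0, 48.0])   # population: 52% male

weights = np.ones_like(sample)
for _ in range(50):  # iterate until both margins match the targets
    w = weights * sample
    weights *= (row_targets / w.sum(axis=1))[:, None]   # adjust age margin
    w = weights * sample
    weights *= col_targets / w.sum(axis=0)              # then gender margin

weighted = weights * sample
print(np.round(weighted.sum(axis=1), 2))  # age margins converge to [10, 90]
print(np.round(weighted.sum(axis=0), 2))  # gender margins converge to [52, 48]
```

Production weighting tools add convergence checks and weight trimming, but the core loop is this alternation between margins.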
For any sample weighting method, high-quality population demographic data for different types of events, participation in spaces, and of a city’s population, like age and gender, will be necessary. This population demographic data may be obtained from the UK censuses. Between censuses, the Office for National Statistics (ONS) produces annual mid-year population estimates (MYEs), which update the population figures based on births, deaths, and migration. These estimates provide detailed demographic information by age, sex, and local authority, making them suitable for aligning sample data with population demographics.
Previously conducted government Official Statistics surveys, such as DfT’s National Travel Survey, DCMS’ Participation Survey, Sport England’s Active Lives, and Natural England’s People and Nature Surveys, can be used to determine typical demographic data for different types of events and transport. Some limited data is available on international visitors, such as the ONS International Passenger Survey (IPS), although this may be unsuitable for sample weighting at a population level. If suitable data is available this can be used to create separate weights or proportionately adjust weights based on the international visitor population and the total population.
Therefore, sample weighting should be used when using data with sampling bias, considering each dataset and model on a case-by-case basis.
4.3 Breadth phase outcomes
Table 4.3: Summary of findings from the Breadth Phase
| Data Category | Data Source to Progress | Modelling Approach to Progress | Prioritisation | Depth phase testing conducted |
|---|---|---|---|---|
| Transport Data | Strava Metro; TomTom | Regression modelling (specific methods will vary depending on the characteristics of the data available). | Both recommended data sources offer wide coverage and can be used to provide information on both participation and demographics. | Regression modelling using an XGBoost predictive model, using Strava Metro data for Edinburgh Parkruns. |
| Social Media Data | Pulsar | Adjusted Classify and Count (ACC) quantification method using updated classifiers from UAC. | Pulsar prioritised as a data source as they provide aggregated social media data access to several popular platforms. Identification of social media posts by keywords tends to result in the gross over-estimation of attendance. By comparison, application of the User Attendance Classification (UAC) methods proposed by de Lira et al. (2019) enables a significantly more accurate estimation of attendance to be calculated through identification and separation of those who posted on social media and attended from those who posted and did not attend. This methodology can offer insights into the hometown of some users (by examining content and metadata of social media posts), to infer demographic information such as age and gender, and to understand the geographic distribution of event attendees. | Adjusted Classify and Count (ACC) tested using X (formerly known as Twitter) data accessed through the Pulsar dashboard. This approach was tested for Lewes Bonfire Night, the Farnborough International Airshow and Scotland international rugby fixtures at Murrayfield to compare effectiveness across both sporting and non-sporting events and those with a high or low anticipated virtual-only engagement. |
| Mobile App Data | Huq; ActiveXChange; Echo Analytics; UniCast; CKDelta | Regression and machine learning techniques where counts of mobile phone data and mobile phone users will be utilised as explanatory variables. For Huq and CKDelta, which are expected to provide visitor origin information, there are plans to weight to adjust for biases. | Prioritised data sources are aggregated to avoid privacy concerns. The most promising use of these data sources is likely to be in the study of fixed cultural and sports sites over extended periods, such as weeks, months, or even a year. It is advisable to aggregate data over larger time windows to effectively capture activity variations at different times of the day or week. Such aggregation will enhance the reliability and applicability of the insights and minimise potential biases. The recommendation is to combine these data sources with additional methods and data types. This integration will help to analyse the reach and demographics of event audiences which would otherwise not be possible from crowd counting alone. | Huq data has been weighted and scaled to produce attendance estimates for the British Museum as a test location. The promising nature of this testing means further experimentation in Strand 2 will look at developing a regression machine learning approach. |
| Deployable-Sensing Data | Examples would include radio frequency identification tags that can sense radio signals. | Despite complexity in obtaining crowd images, image-based approaches using machine learning models to count crowds should be pursued in the depth phase, as initial testing showed open-source models delivered promising accuracy. | Radio-based methodologies (including radio signal tracking) are not recommended for further exploration as they have issues with deployment and are not considered accurate enough to estimate audiences. Radio signals can be used to estimate the presence of devices within certain areas, assuming those devices are tuned into the broadcast; however, they lack the precision to estimate attendance, and set-up is often complex and expensive. | Given the complexities and challenges highlighted in obtaining crowd imagery, image-based approaches using machine learning models will continue to be prioritised for testing in Strand 2 of the study, where they can be used alongside methods already developed for other data sources in Strand 1. |
| Event and Space Data | See greenlit data sources for full record. Examples include Census data and OpenStreetMap. | To be used in conjunction with other data sources and modelling approaches. | Static data sources will not be able to count attendance, engagement or participation by themselves, as they cannot track a space’s attendees over time. However, they can be used with other data sources to provide more in-depth information. | Individual sources used alongside selected test events where appropriate. Our approach with mobile app data, for example, uses OpenStreetMap to extract locations for analysis in conjunction with Huq data. |
5. Mobile app data
5.1 Breadth phase overview
Mobile phone app data predominantly exploits individual-level GPS data obtained from the usage of mobile applications on GPS-enabled devices. The data are routinely gathered by third party companies through Software Development Kits (SDKs), spanning a broad spectrum of mobile phone applications, including those for navigation, health, shopping, and weather, all under the umbrella of informed consent. The manner of location recording is contingent on the user’s consent per app, which might be continuous in the background or only when the app is active. The data generally offer the point locations of the device with high accuracy (up to metres), dependent on the most precise sensor available at the time of recording, be it GPS, Bluetooth, cellular tower, or Wi-Fi.
Whilst raw mobile phone data offers a high level of versatility, ethical concerns relating to data protection and the general reluctance of companies to provide such data mean that access to completely raw data is virtually impossible. Consequently, any research that utilises mobile data works with much less granular forms, supplied by companies in a way that adheres to their privacy regulations. While this reduces the versatility of the mobile data, overall it remains no less desirable as one part of a potential measurement method, because aggregation over large time windows not only enhances the reliability and applicability of the insights but also minimises potential biases.
The raw location data is collected and harnessed by third-party companies for commercial purposes, translating the data into analytical products and services. These products generally offer information on visitor behaviour and patterns across domains such as retail, transport, and urban planning. The utility of this type of location data may extend beyond commercial use, with the potential to inform applied research such as estimating engagement with natural and cultural sites. Examples in the published literature leverage mobile phone data to estimate spatial and temporal patterns in the use of urban green space (Heikinheimo et al., 2020; Mears et al., 2021; Sinclair et al., 2023b) and, in combination with ground-truth visitation data, to model recreation and estimate visitation to water-based natural sites (Merrill et al., 2020). This type of data has only recently become available and research into its use is in its infancy.
5.2 Identified modelling approaches
Regression models
In the context of mobile phone app data, regression models are used to understand and predict or forecast engagement with fixed sites of interest or events, utilising the app data as an explanatory variable as a proxy for visitation. Regression is particularly applicable in this context due to its ability to handle wide ranging and often complex multi-dimensional datasets such as mobile phone app data. When integrated with baseline visitation data and other digital and traditional data sources like transport data, social media data, and site or context-specific information (e.g., population density, accessibility, weather conditions), regression models using mobile app data can help to understand known visitation patterns. Such models can then be applied to estimate past attendance or predict future attendance at events where visitation data is missing.
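A minimal sketch of this kind of regression is shown below, assuming synthetic data throughout: the panel counts, the rain-days covariate, and the coefficients are invented for illustration and do not come from any data source evaluated in this project.

```python
import numpy as np

# Illustrative sketch only: fit a linear model that predicts known monthly
# visitation from mobile-app panel counts plus one simple context feature.
rng = np.random.default_rng(0)

panel_counts = rng.uniform(200, 2000, size=36)   # devices observed on site
rain_days = rng.integers(0, 20, size=36)         # hypothetical covariate
# Synthetic "ground truth" visitation with noise, for training purposes
true_visits = 50 * panel_counts - 800 * rain_days + rng.normal(0, 5000, 36)

# Design matrix with an intercept column; ordinary least squares fit
X = np.column_stack([np.ones(36), panel_counts, rain_days])
coef, *_ = np.linalg.lstsq(X, true_visits, rcond=None)

# Apply the fitted model to a month with no ground-truth visitation data
pred = coef @ [1.0, 1200.0, 5.0]
```

In practice the baseline visitation series would come from sites with published attendance figures, and the fitted model would then be applied to comparable sites or periods where such figures are missing.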
Sample weighting
In the context of mobile phone app data, which can be seen as a panel dataset covering a known percentage of the entire population, sample weighting involves scaling demographic proportions geographically within the data to align with those in the wider population. This approach aims to account for any biases in the data, ensuring that the results are more representative of the target population. This is especially crucial given the dynamic nature of mobile app data, where the volume of users and the characteristics of the sample can change over time, necessitating the weighting process to change too.
Applying sample weighting to mobile phone app data can be challenging, as the process of designing weights is more complex than in traditional survey data. Mobile app data, unlike surveys explicitly designed for weighting, undergoes variable collection processes, complicating the weighting task. Variable collection means that the data is not gathered in a uniform way, as it would be with a survey: different applications and permissions on mobile devices lead to inconsistent data generation. This can also be called measurement bias. Although companies typically use sample weighting to refine their products and services, research in this area is limited. Examples available include Zhang et al (2023), who successfully implemented basic sample weighting on mobile phone app data at the census output area level to enhance population-level estimates of mobility. In contrast, Mollay et al (2021) applied more nuanced weighting techniques akin to survey data, but this was dependent on the availability of detailed socio-demographic information for the sample of mobile phone users, which is not typically accessible outside of targeted research.
Despite these challenges, a UK study by Sinclair et al (2023a) suggests that mobile phone app data can reflect a socio-demographic mix similar to that obtained from designed household surveys, particularly in transport research. Sinclair et al (2023a) designed a methodology to compare the socio-demographics of the mobile phone population (across two datasets) with the known adult population for the Glasgow city-region and the sample of the Scottish Household Survey distributed there. The results showed that the mobile phone data captured a population not dissimilar to the actual population, and closer to it than the designed sample used in the Scottish Household Survey (which is designed to be weighted). This indicates the feasibility and potential value of sample weighting in mobile phone app data, potentially offering significant insights without the need for extensive weighting adjustments, which is something Mollay et al (2021) also found.
The application of sample weighting to mobile phone app data should be considered a medium priority, after regression and other potential machine learning methods. While it has the potential to significantly enhance the representativeness and validity of the insights drawn from the data, the complex and dynamic nature of the data may pose considerable challenges. Furthermore, the success of applying the technique is dependent on companies providing visitor origin information and the coverage at which the data is provided. Sample weighting aims to adjust the sample to better reflect the broader population, but if the data sample is limited to a small area (as might be the case with an event) or a particular group, it may not capture the range of characteristics present in the population. This lack of representativeness can make weights volatile and inaccurate.
Mobile app data: key breadth phase findings
Possibilities: Estimation of visitation and engagement: The detailed nature of these data sources allows for the potential measurement of engagement levels, visitor numbers, average dwell time, and catchment areas, offering potential beyond the reach of conventional data sources.
Use in temporal analysis: Evidence suggests that mobile phone app data can effectively estimate engagement over various time frames, aiding in understanding seasonal variations and temporal trends over aggregate time periods (Cameron et al., 2020; Heo, Lim and Bell, 2020; Mears et al., 2021). However, its applicability to short-lived events might be limited due to the volume of data, the aggregation level available and the socio-demographic coverage of the sample. Ultimately, this is novel analysis of a new and emerging area, so further testing is required to ensure clarity in a variety of use cases. Weighting the data, for example, can help to correct for issues with temporal analysis, but only as far as the available data will allow. Trends over years, for example, are likely to be more accurate than those over months or days, owing to the larger dataset available.
GDPR compliance and consent issues: Ongoing challenges in ensuring GDPR compliance and managing the consent process for data collection may impact the legitimacy of data usage, particularly given the ambiguity and lack of transparency or clarity in the consent process.
Lack of demographic data: The general de-identification of this data leads to uncertainties in sample representativeness. While companies or researchers often aim to infer information about the socio-demographics of the data (Sinclair et al., 2023a), this layer of information is usually absent in aggregated versions. While obtaining raw data samples might offer socio-demographic insights, it raises ethical concerns, including re-identification risks, and may not be possible in this project.
Practicalities:
Reliance on individual-level GPS data: A common trend across these data sources is the collection of high-resolution GPS data, generated from the use of mobile phone applications. The granular level of individual data collected by such companies is unique.
Population coverage: Despite efforts from companies to collect data across a broad range of demographics via diverse apps, actual coverage is relatively low, typically in the single-digit percentages. Some studies, such as Sinclair et al. (2023a), reveal that the socio-demographic mix of users in mobile phone app data closely aligns with the expected composition of the general population, despite this low percentage coverage. However, it is important to note that coverage tends to diminish when data is confined to smaller spatial and temporal scopes, a factor that could impact this project. Ultimately, the population coverage of mobile app datasets is a developing area with more work needed on a national scale to establish further clarity. For this reason, results should be interpreted with this limitation in mind, and care should be taken in the use of mobile app data models.
Unclear data collection and analysis process: The lack of transparency in the mix of applications used, raw data, and algorithms applied by the third-party companies to produce products and services may influence the trustworthiness when using the data.
Changing user volumes and data collection methods: Fluctuations in the volume of data and number of users present in the datasets over time are influenced by commercial decisions, app changes, and regulatory modifications. These changes affect the data’s representativeness and reliability, notably when using the data for time series analysis.
Data resolution and availability: Some sources may not provide data at a sufficiently detailed spatial or temporal resolution to enable the analysis of small spaces or short-lived events. This could be caused by limited data or a limited sample of users which risks bias and unreliable results. Access to more informative raw data is often restricted due to ethical concerns about potential user re-identification.
Breadth phase recommendations
- Prioritising data from Huq, ActiveXChange, Echo Analytics, UniCast, and CKDelta.
- While raw mobile phone data offer the most versatility, ethical concerns for the project and the reluctance of companies to provide such data necessitate a different approach. Work will proceed with the formats provided by these companies, which are expected to be less granular.
- The most promising use of these data sources is anticipated in studying fixed cultural and sports sites over extended periods, such as weeks, months, or even a year. This is due to the reduced population coverage when data is limited to smaller sites or shorter time frames. To effectively capture activity variations at different times of the day or week, it is advisable to aggregate data over larger time windows. Such aggregation will not only enhance the reliability and applicability of the insights but also minimise potential biases.
- Regression and machine learning techniques will be prioritised, where the counts of mobile phone data and the counts of mobile phone users will be utilised as explanatory variables.
- For Huq and CKDelta, which are expected to provide visitor origin information at some level, sample weighting will be applied to adjust for biases.
- Finally, these data sources will be combined with additional methods and data types. For example, crowd counting techniques applied to images will be used in combination with visitor origin information from mobile phone app data. This integration will help analyse the reach and demographics of event attendees, which would otherwise not be possible from crowd counting alone.
5.3 Experimentation
To support experimentation in the Depth Phase, the project team prioritised acquiring a sample of test data to allow a trial of proposed approaches in a test location. The British Museum was chosen as a good test candidate to trial mobile app modelling, due to the quality of published baseline data available. This section of this report will summarise the approach and results of tests run using the Huq data source. More detailed findings, including detailed methodological steps, can be found in the Huq case study (Annex 2).
Given the number of greenlit data sources identified in the Breadth Phase for this data category the project team conducted further follow-up discussions with a range of providers. The result of these continued conversations was the prioritisation of Huq as the data source with which to continue testing. Other suppliers were deprioritised for this project’s interest, owing either to prohibitive cost models or because their data package did not compare to the level of detail available through Huq.
The Huq dataset is derived from a range of mobile phone applications, which collect real-time location data from users’ smartphones. The dataset covers geographic locations across the UK and spans five years (2019-2024). It can be used to generate insight into human mobility patterns and behaviour, such as consumer trends, the impact of events on movement, and support decision making processes including urban planning.
Huq data delivered to the project team contains one row per site, per user, per day and includes:
- An anonymised unique user ID (which is irreversibly hashed).
- A weight assigned to each user, based on the user’s home region, known as the adjustment factor. This weight corrects for over- or under-representation of mobile phone users in the dataset by comparing the ratio of mobile users to the known population across all regions in the country.
- The home enclosing region of the user/visitor, e.g. Greater Manchester or Highlands. Note that ACORN (A Classification of Residential Neighbourhoods) segmentation, created by CACI, is provided at a more granular level (e.g. postcodes).
- The sites the user visited, such as Edinburgh Castle, the British Museum, or Buckingham Palace. This is recorded as a Polygon ID.
- The date when the user visited the site.
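The way these fields combine into a population estimate can be sketched as follows. The row layout, field ordering, and the panel-coverage figure are illustrative assumptions, not Huq’s actual schema or coverage:

```python
# Sketch of producing a monthly visitation estimate from rows shaped like
# the delivery described above (one row per site, per user, per day).
# Field names, values, and the coverage figure are illustrative only.

rows = [
    # (hashed_user_id, adjustment_factor, home_region, polygon_id, date)
    ("a1", 1.20, "Greater Manchester", "BRITISH_MUSEUM", "2023-06-01"),
    ("b2", 0.85, "Highlands",          "BRITISH_MUSEUM", "2023-06-01"),
    ("c3", 1.05, "Greater London",     "BRITISH_MUSEUM", "2023-06-14"),
    ("a1", 1.20, "Greater Manchester", "EDINBURGH_CASTLE", "2023-06-02"),
]

PANEL_COVERAGE = 0.02  # hypothetical: panel covers ~2% of the population

def monthly_estimate(rows, site, month):
    """Sum the region adjustment factors for visits to `site` in `month`,
    then scale the weighted visit count up to the full population."""
    weighted = sum(w for _, w, _, poly, date in rows
                   if poly == site and date.startswith(month))
    return weighted / PANEL_COVERAGE

estimate = monthly_estimate(rows, "BRITISH_MUSEUM", "2023-06")
```

The adjustment factor handles regional over- or under-representation within the panel, while the coverage scaling handles the panel’s overall size relative to the population; the two steps are independent.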
The results of the testing of this approach are summarised in section 5.4 below.
5.4 Huq case study results
5.4.1 Estimating visitation
Information extracted from mobile phone app data gathered from visitors to the British Museum, with an allocated home geography, was used to weight and scale the dataset to an estimate of visitation for the population. The results are shown in Table 5.4.1 where visitation is broken down for the British Museum by month across five years. They are subsequently compared to official visitation data in figure 5.4.1.
Table 5.4.1: Monthly British Museum Visitation Estimates Generated Using Huq
| Month | 2019 | 2020 | 2021 | 2022 | 2023 |
|---|---|---|---|---|---|
| Jan | 463,276 | 483,977 | 20,190 | 123,645 | 246,739 |
| Feb | 610,208 | 519,213 | 20,470 | 165,879 | 80,416 |
| Mar | 683,438 | 212,630 | 30,514 | 151,617 | 132,554 |
| Apr | 815,027 | 8,202 | 15,493 | 196,864 | 194,584 |
| May | 592,879 | 5,230 | 54,808 | 270,111 | 210,691 |
| Jun | 731,188 | 5,306 | 78,234 | 265,321 | 453,626 |
| Jul | 628,596 | 11,086 | 107,024 | 304,420 | 350,846 |
| Aug | 641,342 | 6,334 | 153,727 | 203,039 | 295,516 |
| Sep | 435,431 | 47,572 | 112,413 | 108,360 | 157,963 |
| Oct | 529,337 | 62,408 | 177,637 | 201,111 | 184,207 |
| Nov | 599,474 | 26,531 | 145,904 | 173,539 | 107,028 |
| Dec | 569,713 | 49,801 | 119,919 | 131,478 | 121,812 |
| Total | 7,299,910 | 1,438,289 | 1,036,333 | 2,295,384 | 2,535,982 |
Figure 5.4.1: Comparison of published official visitation data for the British Museum (blue) compared with Huq generated estimates (red)
Despite the estimates being scaled and weighted from a relatively low number of observed mobile phone visits, they tracked the trend in the official data closely. This was especially clear in the early period of the Covid-19 pandemic, following the first nationwide lockdown when the museum was closed to the public, but it was also apparent through the peaks and troughs following the easing of lockdown restrictions.
While prior to 2020 the data tended to overestimate visitation, after the first lockdown period the data tended to underestimate visitation when compared to the current published statistics. This could be caused by several factors, but two are particularly notable. Firstly, this data consists of mobile phone users aged 16+, so children, who represent a key visitor group in the official data, are missing. Second, mobile users who cannot be allocated a home geography (because they are not in the dataset long enough) are not included in the methodology developed to date, meaning international tourists are likely to be excluded. Low mobile user numbers, changes to the way the company collected the data over time, and the pool of apps it draws from could also have an impact. This includes changes to UK legislation and technology company policy, which can change the way mobile phone app data is collected over time. In 2018, for example, Apple removed apps from the App Store which shared data without consent in response to privacy concerns, and Apple and other handset providers are potentially updating their T&Cs to provide less data, particularly location data, to apps, with privacy in mind.
In estimating the success of this approach, we have applied Pearson’s correlation coefficient, the Mean Absolute Percentage Error, and the Root Mean Squared Error.
The Pearson correlation coefficient for the monthly estimate is 0.77 (p-value <0.001). This strong correlation should be considered a good result for this stage of the project. The Pearson correlation is a statistical measure that quantifies the strength of a relationship between two variables from -1 (very strong negative correlation) to 1 (very strong positive correlation). The probability value (p-value) in statistics determines the likelihood that any relationship between the data is due to chance. A p-value of < 0.05 indicates strong evidence that our findings are not due to chance (less than 5% probability).
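For reference, the coefficient described above can be computed from first principles. The two short series below are synthetic, not the museum data:

```python
import math

# Pearson correlation from first principles: covariance of the two series
# divided by the product of their standard deviations. Inputs are synthetic.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly series: official counts vs model estimates
official = [100, 120, 80, 150, 90, 130]
estimate = [90, 130, 70, 160, 100, 120]
r = pearson_r(official, estimate)
```

A perfectly proportional pair of series yields r = 1; series that move together imperfectly, as here, yield a value between 0 and 1.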
5.4.2 Estimating footfall
Figure 5.4.2: Footfall at the British Museum (2019-2023) from mobile app data. Note this is overall and not separated by museum floor level.
The Mean Absolute Percentage Error (MAPE) is 75% and the Root Mean Squared Error (RMSE) of visitation is ~160,000 visits per month. The MAPE and the RMSE can be used as an evaluation of model performance. As both are fairly high in this case, there is room for improvement in the performance of the model. This is almost certainly due to small values for some of the months. Model performance may be improved (and prediction error minimised) by increasing the temporal scale at which we compare the data.
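The two error metrics can be sketched as follows, using synthetic figures chosen only to show how months with small actual values inflate MAPE even when their absolute errors are modest:

```python
import math

# MAPE and RMSE from first principles; all figures are synthetic.
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - p) / a)
                     for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

# Two busy months and two very quiet months (e.g. during closure)
actual    = [500_000, 400_000,  5_000,  8_000]
predicted = [450_000, 430_000, 15_000, 20_000]

m = mape(actual, predicted)   # dominated by the two quiet months
r = rmse(actual, predicted)   # dominated by the two busy months
```

Because MAPE divides each error by the actual value, the low-attendance months dominate it even though their absolute errors are far smaller, which is consistent with the explanation above.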
One of the most distinctive characteristics of mobile phone app data is its spatial resolution: the data consists of a set of high-accuracy points generated using a wide range of mobile phone applications. By aggregating this data, it is possible to visualise the density of footfall for the chosen site, as demonstrated in figure 5.4.2 above. In this example, the visualisation is superimposed onto a Google map, but data providers can plot data against alternative basemaps if provided by the end user.
The visualisation produced for the British Museum indicates clear hotspots of use. While this visualisation shows aggregated footfall across an extended period of 5 years, it would also be possible to analyse the use of space across smaller time windows. The visualisation at figure 5.4.2 shows museum use as a whole and does not distinguish by floor or elevation. Combining the spatial resolution data produced by mobile phones with additional novel techniques such as barometric pressure sensors, Bluetooth beacons or wi-fi measurement could potentially enable an examination of use by floor. This would, however, require extensive additional research and testing before any reliability could be established.
5.4.3 Estimating catchment
Using Huq data, each visitor to the British Museum was allocated a home geography (with the exception of most international visitors, who are often not present in the data long enough to be assigned a home location). While this was important for estimating visitation numbers, it was also used to support estimation of the catchment (or reach) of a site or event. Figure 5.4.3 shows the catchment as the percentage of visitation to the British Museum between 2019-2023, broken down by Lower Layer Super Output Area across the UK. This shows the percentage of the total number of British Museum visitors in the relevant time period that come from the region shown. The percentages sum to 100%.
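A catchment breakdown of this kind reduces to normalising weighted visit counts by home area so the shares sum to 100%. The areas and counts below are hypothetical:

```python
# Sketch of the catchment calculation: visits are grouped by the visitor's
# home area and normalised into percentage shares. Illustrative data only.
from collections import Counter

visits_by_home_area = Counter({
    "Camden": 1200,
    "Westminster": 800,
    "Greater Manchester": 300,
    "Highlands": 100,
})

total = sum(visits_by_home_area.values())
catchment_pct = {area: 100 * n / total
                 for area, n in visits_by_home_area.items()}
```

Areas with no observed panel users simply never appear in the counter, which is the code-level analogue of the blank spaces discussed below.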
Figure 5.4.3 : Catchment of visitors to British Museum (2019-2023) from mobile app data
Figure 5.4.4: Catchment of visitors to the British Museum (2019-2023) from mobile app data, highlighting visitation from the surrounding region. Note that the percentage values shown are based on the complete UK data and are not subset to London.
The most notable pattern from this visualisation is the density of visitation in the regions surrounding the site (figure 5.4.4), indicating the potential impact of accessibility on visitation. The other element of these visualisations that stands out is the blank spaces. This does not mean that there were no visitors from these regions to the British Museum during the period, but that no mobile phone users were observed from these regions during the time period. While mobile phone app data providers pool data from a wide range of apps to attempt to create a representative ‘panel’ of users, the data generally covers a single-digit percentage of the population. Blank spaces may also be a result of measurement bias among users in the panel: it is possible for a user present in the panel as a whole not to be recorded in these maps even if they have visited the British Museum, because different apps collect different types of data, and apps and users vary in their consent and data-sharing settings, which can allow more or less data to be collected per user. This result highlights some limitations with the sample coverage, though these are not unexpected when compared to traditional techniques such as surveys, which capture only a small sample of the population visiting a site.
It may be possible in future to explore spatial methods which would allow us to model and impute values for missing regions based on their characteristics and surrounding regions.
5.4.4 Recommendations
Given the promising nature of the results from the testing phase in capturing engagement for the British Museum, the recommendation is that these testing outcomes are applied in Strand 2. This further experimentation will support development and determine how useable or otherwise the developed approaches will be.
Further experimentation in Strand 2 should include the following steps:
- Iteration of current approach. Extending the approach tested with the British Museum to a wider variety of additional sites which have baseline visitation data and testing alternative methods to estimate visitation which do not rely on a visitor’s home geography, in an attempt to account for international visitors.
- Test additional modelling techniques. Visitation estimates from a larger number and diversity of sites will be combined with the site and context-specific information, as well as additional data sources where possible, to explore regression and machine learning models which will further improve visitation estimates.
6. Transport and activity data
6.1 Breadth phase overview
Transport data is concerned with tracking the flows of people using private (e.g., cars, bikes, walking) and public (e.g., buses, trains, public cycle hire) means of transportation. In addition to providing raw counts of individuals using a particular mode of transport, some data providers can also provide traveller origin information (the location from where someone started their journey). As such, this data can be used for two purposes:
- As a parameter to estimate attendance at an event, in combination with other data sources
- To determine where attendees have come from, creating the possibility of using the data to extract audience demographics
TomTom and Citymapper are high-profile examples of transport data providers which had already been considered for use in similar projects by the consortium team, while the University of Glasgow (Serra and Zeinullin 2022) have demonstrated the potential of object detection counts (without reidentification of individuals) via CCTV. Strava Metro was frequently referenced in the literature reviewed in the Breadth Phase evidence synthesis, including high-profile publications by TfL (Davies 2017) and the ONS (Joshi et al 2023), who have used Strava Metro as an input feature for a machine learning model estimating the use of natural spaces.
Some sources of transport data include details about the starting points of people’s journeys which can be used to gain more information about the characteristics of participants and attendees. By correlating the areas from which people travel to these cultural and sports spaces with Census demographic data, it is possible to infer demographics and determine demographics of groups engaging with these events and spaces. Further testing will be done with sample transport data once acquired by the consortium to explore the feasibility of demographic analysis.
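The inference described above can be sketched as an origin-share-weighted average of area demographic profiles. The areas, shares, and census proportions below are invented for illustration:

```python
# Sketch of inferring an audience demographic profile by combining travel
# origins with census data. Areas, shares, and census proportions are
# illustrative, not real census figures.

# Share of attendees whose journeys started in each origin area
origin_share = {"Area A": 0.6, "Area B": 0.4}

# Hypothetical census age profile of each origin area
census_age = {
    "Area A": {"16-34": 0.5, "35-64": 0.4, "65+": 0.1},
    "Area B": {"16-34": 0.2, "35-64": 0.5, "65+": 0.3},
}

# Audience profile = origin-share-weighted average of area profiles
audience = {}
for area, share in origin_share.items():
    for band, prop in census_age[area].items():
        audience[band] = audience.get(band, 0.0) + share * prop
```

This assumes attendees from an area mirror that area’s census profile, an ecological assumption whose validity is exactly what the planned feasibility testing with sample transport data would need to examine.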
As most transport data is not concerned with tracking unique users over time but rather aggregates all individual trips, tracking long-term participation will not be possible using this type of data source alone. Transport data tends to be aggregated by location rather than by individual, meaning data is provided for each location rather than as the set of locations visited by a specific individual. However, Strava Metro gives insights into the number of unique users recording trips and their demographics, meaning that it may be possible to track participation in running, cycling, and other active human-powered travel. Importantly, Strava users tend to skew younger and more fitness-focused (Venter et al 2023), a finding further supported by analysis of 2023/24 Participation Survey data, which suggests that younger age groups are more likely to own a smartwatch than their older counterparts. This will have implications for the generalisability of models trained using this data, with the likelihood that the events best predicted by Strava data will be sporting events attracting this demographic.
With regard to developing a method to estimate attendance, raw passenger counts may be used in a regression model alongside other types of data to estimate attendance at events. However, data is very often only available for selected locations, meaning that some methodologies using these data sources can only be applied to events within those locations.
6.1.1 Identified modelling approaches
Regression models
As set out in section 4.1, regression models offer an approach for estimating attendance at unticketed events and cultural spaces by using multiple data sources, including transport and activity data specifically. Section 4.1 gives full details on the regression modelling approaches that could be applicable to transport data testing, so they are not repeated here.
Examples of regression modelling utilising transport data are prominent in the literature reviewed as part of the evidence synthesis for this project. Chief among these is the work of Joshi et al (2023), who combined Strava Metro with weather and static demographic data to create a regression model that estimated monthly visitor counts for UK green spaces.
Positive examples such as this work mean that regression methods will be prioritised for testing with transport data in the depth phase.
Sample weighting
As the choice of transport is highly dependent on demographic factors, single data sources in this group may have demographic biases that need to be accounted for. Sample weighting is a common technique for adjusting sample data to correct for such bias, and is used in many surveys where the sample involves some form of self-selection or opt-in by participants (Hays et al 2015; Solon et al 2015).
Concerning individual transport data sources, the only data source identified by this project with explicit demographic data is Strava Metro. Joshi et al (2023) pointed out that, when compared to England’s census, Strava Metro data had demographic bias, notably overrepresenting men and younger people, and potentially needed recalibration. However, they noted that, for their use case, ‘more good quality and diverse on ground observation training data are required to develop this recalibration scale’. Nonetheless, they were able to use it in a regression model without employing sample weighting, suggesting that reliable estimates can be obtained despite this bias. As such, although not imperative, when using Strava Metro an attempt should be made to retrieve good quality baseline demographic visitation and participation data to improve estimates.
As detailed in section 4.2, sample weighting should be used when using data with sampling bias, considering each dataset and model on a case-by-case basis. In the case of Strava Metro data, it should be a medium priority, as Joshi et al (2023) were able to build a reliable regression model without the use of sample weighting, although they recommended employing it where the opportunity arises.
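A minimal sketch of one common weighting scheme, post-stratification, is shown below using invented sample and Census proportions: each group's weight is its population share divided by its sample share, so the reweighted sample matches the population profile.

```python
# Sketch: post-stratification weights correcting a biased sample toward
# known population (Census) proportions. Figures are invented.

# Hypothetical age-group proportions in a Strava-like sample vs. the Census
sample_props = {"16-34": 0.60, "35-64": 0.35, "65+": 0.05}
census_props = {"16-34": 0.30, "35-64": 0.45, "65+": 0.25}

# Weight for each group: population share divided by sample share
weights = {g: census_props[g] / sample_props[g] for g in sample_props}

# Multiplying each group's sample share by its weight recovers the
# Census profile, so weighted estimates are corrected for the bias
reweighted = {g: sample_props[g] * weights[g] for g in sample_props}
print(weights, reweighted)
```

The underrepresented 65+ group receives a large weight (5.0 here), which in practice also inflates the variance of estimates; this is one reason weighting is assessed case by case.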
Transport and activity data: key findings
Possibilities:
Highly specialised: Transport data sources are highly specialised, typically covering a single mode of transport and often only one specific location. Ideally, multiple transport data sources would be combined to cover the full range of transport modes.
Combining with other data to derive audience and participant demographics: Only one data source, Strava Metro, provides demographics explicitly, so it will be challenging to estimate audience and participant demographics using transport data alone. TomTom provides origin/destination analysis, meaning that it may be possible to see where people attending an event or space are coming from within the city. This information could be cross-referenced with Census data to derive some basic demographics about the audience, noting the limitation that journeys may not always originate from a home location. One mitigation is to train machine learning models to classify locations as permanent or temporary based on patterns in the data.
Combining with other data to calculate attendance at specific events: Transport data has been used to derive attendance at public spaces through regression modelling. Similar techniques can be used to calculate attendance at given events.
Travel trends: Historical transport data can be used to determine trends in visitation to a certain space, including seasonality. However, it is not possible to reliably link these trends with visitation caused by a single event.
Practicalities:
Limited data geography: Data is often only available for selected locations, which tend to be cities or large urban areas. This means that some methodologies using these data sources can only be applied to events within specific locations. The only nationwide data sources available are those related to vehicle traffic.
Limited access to traveller counts: Accessing traveller counts is difficult. Only one public transport operator was identified as sharing exact counts (Go-Ahead in East Yorkshire) and one offering percentage-based counts (TfL for Underground stations). Some private transport data providers offer counts but restrict access behind a paywall and a per-customer agreement. As such, transport data cannot be relied upon to get an exact attendance count in isolation and will need to be combined with other data types.
Limited access to data: Transport providers are highly reluctant to share their data, especially when compared to counterparts in other industries such as the activity data space. For example, only one bus transport provider shares crowdedness data publicly. Third-party services, including Citymapper and Waze, may also be reluctant to share their data with clients other than city councils and transport providers.
Inherent bias: Transport data sources contain inherent bias, as the choice of method of transport is dependent on socioeconomic and geographical conditions. Certain sources only work in selected geographies. For example, TomTom is most useful in rural areas that are more dependent on car use. Other sources, like Strava Metro, overrepresent certain population subgroups in their data.
Breadth phase recommendations:
-
Transport data may prove useful for calculating attendance at specific events by using it to build regression models, although the specific regression method used will depend on the characteristics of available data sources.
-
The top two main data sources recommended for further exploration are TomTom and Strava Metro, as they offer wide coverage and can be used to provide information on both participation and demographics.
6.2 Experimentation
To support experimentation in the Depth Phase, the project team prioritised securing access to the Strava Metro dashboard, allowing testing for a specified location. This section of this report will summarise the approach and results of tests run using Strava data. More detailed findings, including detailed methodological steps, can be found in the Strava case study (Annex 3).
Strava was greenlit following the Breadth phase and prioritised for testing in the Depth phase, owing to a favourable assessment across numerous criteria. This includes a track record of successful deployment in the literature reviewed by this project, its low-cost nature and potential to determine demographics. Full details are available in the Strava case study (Annex 3).
Other sources of activity data were assessed by this project, including Garmin, which helps to track a user’s health and wellness journey. Like Strava, it relies on self-reporting, with users entering their daily health and fitness data into an app. However, Garmin was deprioritised as its Activity API does not provide aggregate data, which is required both for robust modelling approaches and to protect privacy. Strava could provide aggregated access through its dashboard, to which Faculty AI and Verian were granted access.
Strava data is aggregated data recorded by users of the Strava app, where activity is separated into cycling (ride/e-bike ride) and pedestrian (walk/run/hike). Events which involve multiple participants engaging in a physical activity, such as parkrun, often generate extensive data within Strava as participants seek to record their performance. Therefore, an accurate estimate of attendance at such an event can be generated by analysing the Strava data correlated with a known event.
The Strava data accessed for initial testing dates from 2019 to the present and covers Edinburgh, including surrounding countryside. A requirement of the application process to gain access to the data was to nominate an initial area of study. Edinburgh was considered a suitable choice with ticketed and un-ticketed events of interest held in and around the city.
Since Strava is used primarily for recording recreational exercise, parkruns (specifically parkruns in Edinburgh held in 2023) were identified as a sensible target event for which to aim to model attendance. An additional benefit of parkruns in this context is that event organisers publish weekly recorded finishers online. This publicly available data can be used to define a scaling factor for parkruns and similar events, alongside validating the methodology.
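The scaling-factor idea can be sketched as follows, with invented weekly figures standing in for Strava-recorded activities and published finisher totals. Here the factor is taken as the median weekly ratio of finishers to Strava counts, which is one plausible choice rather than necessarily the method used in the study.

```python
# Sketch: deriving a scaling factor for an event series from published
# parkrun finisher counts. All numbers are invented for illustration.
from statistics import median

# Hypothetical weekly counts: Strava-recorded activities vs. published finishers
strava_counts = [55, 60, 48, 70]
published_finishers = [220, 250, 190, 275]

# One robust choice: the median weekly ratio of finishers to Strava users
scaling_factor = median(f / s for f, s in zip(published_finishers, strava_counts))

# Applying the factor converts a Strava count into an attendance estimate
estimated_attendance = [round(s * scaling_factor) for s in strava_counts]
print(round(scaling_factor, 2), estimated_attendance)
```

The published finisher data also serves the second role noted above: weeks held out from the calibration can be used to validate how well the scaled estimates track the ground truth.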
6.3 Strava case study results
Depth phase testing using Strava data in conjunction with an XGBoost predictive model and estimated scaling factor produced core findings that will be developed in Strand 2 experimentation:
-
The data is most suitable for sporting and recreational activities
-
A moderate positive correlation with the recorded number of finishers has been achieved using parkrun data from two different Edinburgh-based parkruns. This indicates that the method is suitable for this type of event
-
Many unexpected predictions by the model are adequately explained by missing parkrun data or altered event times
-
However, calibration is necessary on a per-event level as even similar events (such as two parkruns at different locations) can have different scaling. This indicates that it is necessary to record attendance manually at a small sample of events so the methodology can be applied more broadly to all events within a location
6.3.1 Estimating parkrun attendance
When deployed to estimate parkrun attendance in Edinburgh, the model developed to date for depth phase testing achieved a moderate positive correlation (r=0.491) between model-estimated attendance and the ground truth number of finishers over 2023:
Figure 6.3.1: Model-estimated parkrun attendance with scaling applied (blue) against parkrun recorded finishers (red) for the year 2023.
Contributing negatively to the correlation score are the two anomalous weeks 17 and 32, which respectively featured dramatically higher and lower estimated attendance than the number of finishers. In the latter case, this may be due to the event starting half an hour later than usual that week. Analysing the data for week 17, there were other days in the prior week with anomalous usage, indicating the possible presence of other events. This could have caused the dramatically higher estimated usage that week, highlighting one of the failure modes of this methodology. Further work could attempt to address these failure modes, specifically by training the model on a wider range of race data drawn from a variety of locations. Increasing the diversity of training data in this way is likely to expose the model to more anomalous events, making the outputs less sensitive to changes like late start times.
6.3.2 Recommendations
Depth phase testing has demonstrated that Strava data could be used for other events and locations; however, at this phase of model development, results are likely to be subject to several limitations:
-
Not all those participating in recreational and sporting activities will use Strava
-
Not everyone using Strava is necessarily attending a specific event
-
Strava records data differently depending on the paths in use and not all paths through the park have recorded usage; and it is unclear whether this is due to Strava not recording them as paths or whether they are just unused by users of Strava
-
There is double counting: if a user passes in one direction on a path and returns the other way, they are counted twice. There may also be higher-order multiple counting in certain situations. Therefore, the numbers counted may not be an accurate reflection of the actual number of people
Overall, Strava is a valuable source which is worthy of continued experimentation while considering its limitations. The current limitations summarised above can be remedied through application of the following approaches in subsequent phases of work:
- Exploring different variables in the Strava data to avoid double-counting, such as tracking whole routes from individual runners rather than intersections with specific paths.
- Training the model on a wider range of race events from a variety of locations to increase the diversity of the training data
- Implementing a ‘scaling factor’ into the modelling approach such that it is possible to predict total attendance at the event based on predicted number of Strava users, including by integrating additional variables, such as weather and demographic data.
7. Social Media data
7.1 Breadth phase overview
Social media data offers the possibility of “social sensing” what is happening in the real world, based on the behaviour reported by users in the online world. Users will often make use of social media to report or comment on real-world events. In general, news can break first on social media (Osborne et al 2013); therefore, monitoring social media can give real-time insight into interest in current events.
Social media analysis is already being used by HMG across a range of use-cases, as outlined by the Government Communication Service (Cabinet Office 2024), including in DCMS for detecting misinformation (NewsGuard 2023). Interviews with event organisers during the Breadth phase highlighted the potential for social media to help track event attendance. Given changing cultural norms, and the tendency of at least a subset of the population to publicly post their attendance at - or desire to attend - sporting and cultural events, comprehensive analysis is required of the viability of social media data for estimating attendance at unticketed events.
However, this form of social sensing is particularly susceptible to noise. This can take various forms, such as:
-
Implicit messaging - Not every real-world event is reflected on social media, and not every participant necessarily reflects their activity on social media. A significant volume of social media use is implicit, in that users read, but do not explicitly post/comment.
-
Variable locationality - Not every user explicitly commenting on a real-world event is physically present at that event.
-
Platform bias - Different social media platforms have their own potential bias in terms of users’ populations, with the usage of social media platforms varying by age group, socio-economic background and location for example (see figure 7.1), as well as their specific aims and intricacies (e.g. photo oriented, short text). The Ofcom Adults’ Online Behaviours and Attitudes Survey 2023, for example, reveals that younger adults (16-24) are highly active on social media, using a wide range of platforms like Instagram, TikTok and Snapchat. Adults aged 25-44 also use various platforms, including Instagram and Facebook. In contrast, older adults (45+) tend to use fewer platforms, with Facebook being the most popular among them.
-
Misuse - Platforms can be affected by spam postings made by bots, and by misinformation and disinformation from hostile state actors. This is less likely to be an issue here, as the events of focus in this study are unlikely to be targeted by such bots.
-
Level of Discourse - Some topics may have a high level of non-physical participation and subsequent discussion. High-profile sports events, for instance, may generate significant online discourse, for example about the performance of players or managers. The level of discourse may vary between platforms (e.g. Instagram may be more focussed on images than discussion).
Figure 7.1: Distribution, by age, of adult users of each social media platform. Source: Ofcom Adults’ Online Behaviours and Attitudes Survey 2023 (Ofcom, 2023).
In general, social media data can come in three different forms:
- Metadata - Data associated with a post such as author, timestamp, engagement statistics (such as likes, reposts, shares) and sometimes geo-location
- Media - Images or videos included within a post
- Text - Any natural language / comment associated with a post
There are different forms of modelling and analysis that can be considered for each of the data types listed above.
The type of data and the volume that can be collected for analysis differs depending on the social media platforms’ Terms & Conditions. Data is either collected via keywords (for example, whether the social media post contains a specific term) or via a channel (for example, all messages within a specific channel or page). Due to the volume of data on social media platforms and the limits imposed by platforms on data collection, any social media data source will only be a subset of all posts that were posted on the platform. For instance, the X (formerly known as Twitter) search API endpoint can only surface posts posted in the last week, while the more expensive full-archive endpoint is required to obtain search access back to March 2006. Moreover, it is not possible to obtain posts made to a Facebook group through the Facebook API unless you are the owner of the group, and posts made by users to their own timeline are inaccessible through the API.
Data privacy concerns are important to address in social media monitoring, particularly after the Cambridge Analytica scandal, which became public around 2018. The company collected personal information from the profiles of 87 million Facebook users worldwide without their consent and used it to enable micro-targeted political advertising. This, alongside GDPR compliance, has motivated the tightening of API restrictions by social media platforms over the last few years.
The last 10-15 years have seen a considerable amount of research on social media analytics across academia and industry – indeed, the X Terms of Service for users, for example, asks users to “consent to the collection and use” of “the information you provide”. Users can choose to limit the distribution of their content. Conducting and publishing of a Data Protection Impact Assessment and Privacy Notice, as has been done by DCMS for this project, helps to minimise risks arising from processing of personal data.
Estimating engagement is intended to only address sufficient information to ascertain the likelihood of a user attending an event/location. As discussed under the natural language processing modelling section, this may involve processing of the text of social media posts. However, personal information, particularly user-ids, will be irrevocably hashed before storage of datasets, to minimise as far as possible the chance of deanonymisation. Processing social media posts for estimating engagement does not include making any attempt to re-identify an individual based on the content of their post.
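A minimal sketch of the kind of one-way pseudonymisation described above is shown below. The salting scheme is an illustrative choice, not necessarily the project's exact implementation: a random salt is generated per run and never persisted, so the hashes cannot later be recomputed from known user-ids.

```python
# Sketch: irreversible hashing of user identifiers before storage.
# The salt lives only in memory for the duration of the run; discarding
# it prevents re-identification via dictionary attacks on known user-ids.
import hashlib
import secrets

salt = secrets.token_bytes(32)  # kept in memory only, never persisted

def pseudonymise(user_id: str) -> str:
    """Return a salted SHA-256 token standing in for the original user-id."""
    return hashlib.sha256(salt + user_id.encode("utf-8")).hexdigest()

# The same user-id maps to the same token within a run, so posts by the
# same author can still be grouped without storing the original identifier
token_a = pseudonymise("user123")
token_b = pseudonymise("user123")
print(token_a == token_b, token_a != "user123")
```

Within-run consistency is the key property: analyses that need to count unique users still work, while the mapping back to real identifiers is destroyed when the salt is discarded.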
During the Breadth phase, social media data sources were identified based on a careful study of the Application Programming Interface (API) documentation of the key social media platforms used in the UK. Data can be collected from social media platforms through APIs provided by the platform or through a third-party provider that has permission from a given platform to access their data. As part of the Breadth phase, relevant research publications were also thoroughly reviewed (including those that reference and are referenced by University of Glasgow’s own publications) to identify suitable analytical methodologies and determine how the field had evolved.
7.1.1 Identified modelling approaches
Social media analytics involves leveraging various techniques to understand and interpret data generated on social media platforms. In the context of this research, the use of social media data is generally concerned with detecting implicit occurrences of users engaging with locations and/or events (e.g. mentioning an event within a post). Engagement can be:
-
Virtual: a user mentions an event, but did not physically participate in it, e.g. a user posts about an event they observed in the news/media/socials.
-
Physical: a user posts about an event that they attended in-person.
For most cases, this research is concerned with physical engagement and aims to separate virtual from physical engagement. However, some virtual engagements can be indicative of physical engagement, as people visiting places may engage on social media in relation to those events or locations, either before, or after the event (de Lira et al 2019) – for instance, “Can’t wait to attend X”, or “Throwback to last weekend’s Y festival”.
Alongside media and text data, social media modelling approaches can also utilise metadata provided by social media platforms. Metadata provides details such as the timestamps of the post, or location, and user information. Each form of data has a common approach to extract meaningful information before it can be used for research purposes.
It should be noted that much of the analysis in the literature has been performed using tweets, due to the open APIs previously provided by Twitter (now known as X). Alternative social media data sources, such as Instagram or Facebook, can also be used for analysis. However, accessing the data from these platforms is much more challenging because of the requirements of setting up the appropriate user accounts and permissions. Consequently, alternative social media platforms will be explored more in Strand 2. Given that accessing the X API directly is not recommended on a cost-benefit basis, the analysis detailed in this report can instead be performed on posts obtained using third party aggregators such as Pulsar.
What follows are descriptions of the forms of analysis that can be undertaken upon social media data.
1. Automated content analysis of photographs or videos
Various social media platforms allow users to share photographs and videos, which can offer valuable insights into user behaviour (Richards and Friess 2015; Tenkanen et al 2017; Schirpke et al 2023). This type of content analysis offers a way to extract information from the photographs or videos in an automated way, enabling the classification of content and a better understanding of user behaviour which can be used to inform attendance. Technologies can be employed to identify landmarks, objects or types of activities. The text extracted from the images can then be combined with Natural Language Processing (NLP) techniques to extract sentiment while the associated metadata can be used to enrich findings further. Image recognition and video analysis can be conducted through open source or commercial software. Examples of popular commercial tools include Google Cloud Vision, Clarifai and Microsoft Azure Computer Vision (Ghermandi et al 2022). While this approach has been used in the past to estimate engagement with natural and cultural spaces (Ghermandi and Sinclair 2019), recent changes in access to data from the relevant social media providers is limiting progress in this field (Ghermandi et al 2023). Therefore, this modelling approach will not be taken forward in this project.
2. Metadata (geo-location)
Metadata refers to data that provides information about other data, for example, in the context of social media (text, photo, or video). Geo-located metadata includes information about where a post was taken, as revealed through GPS coordinates or location tags. It can also include who the post was taken by, as revealed through public information from the social media profile, although this is not always available. For instance, some social media platforms allow the user to self-declare a limited form of demographics (e.g. location, age and gender). However, in general, increasing awareness of privacy is motivating (i) users to be more cautious about their posts (e.g. fewer public tweets), and (ii) more restricted access to the platform APIs.
The metadata from a user’s profile, if available, can help to identify the activity of various user groups (Sinclair et al 2020a), for example, identifying whether the photographer of a park is a local, a domestic or a foreign visitor.
3. Locating users
Only about 1-2% of all tweets contain geotagged metadata[footnote 1] and, since 2019, this metadata has not included precise geographical coordinates (e.g. longitude and latitude). However, locating the home location of users has real potential to be useful, not only for demographic purposes but also for understanding the reach of cultural events. Some users may declare this explicitly on their profile, while for others it may only be inferred by examining their posts and who they follow. Recent research has explored whether these connections can be made. For instance, Tang et al (2022) identified text features from users’ posts to develop a supervised learning model for predicting a user’s location, while Ebrahimi et al (2018) proposed categorising celebrity users as local vs global. Local celebrities are well-known in a local community (country, city, etc.), but not worldwide, and are therefore highly discriminative of a user’s likely home location – i.e. if another (non-celebrity) user follows a local celebrity, this can be indicative of that user’s home location. In contrast, global celebrities have followers from many localities, and following such a celebrity is not indicative of the follower’s location. Celebrities were separated into local or global by application of a clustering method. Applying this method requires access to friend-follower relationships within the social network, which is more difficult to acquire than in the past given increased privacy controls in social network APIs.
Much other related work in this area has focussed on the geolocation of individual tweets – e.g. (Ajao et al 2015; Gonzalez et al 2018) – often by training models using a set of geolocated tweets as ground truth. All of these approaches are probabilistic, giving a best estimate of a user’s home location rather than any guarantee.
Given the interest of DCMS in demographics, approaches for locating users to a hometown will be taken forward (this is not personally identifiable information), but aspects relating to the data protection impact assessment (DPIA) will be kept under review. The “local celebrity” approach for example will not be carried forward, owing to concerns on this project regarding the extent to which the friend/follower data would be available and whether it would be ethically desirable to engage with it.
4. Natural Language Processing (NLP) of text
Text data from social media encompasses everything from status updates, tweets and the description of photos or videos in posts to comments, hashtags and mentions. Natural Language Processing (NLP) covers the range of tools and approaches generally used to analyse text from social media. Analysing textual posts can provide information on content, topics or insights into public sentiment and engagement levels regarding specific locations or events (Tenkanen et al 2017; Schirpke et al 2023).
As previously discussed, it can be difficult to determine whether a person has attended an event from their social media data. Commercial offerings such as Pulsar assume the presence of terms (or hashtags) to uniquely identify participants in an event[footnote 2]. An approach like this was used by de Lira et al (2019) and Tonga et al (2020). In fact, good event management practices involve the promotion of hashtags that can make it easier to track engagement (Homer 2022), and aid discoverability in platforms such as Instagram[footnote 3].
Manually identified queries may have limited recall of posts about the event (i.e. the fraction of posts relevant to the event that are returned by the query) if the selected terms/hashtags are not comprehensive enough to capture all relevant posts. In contrast, if the terms chosen are too broad, then precision will suffer (too many unrelated posts will be retrieved), increasing the cost of accessing the relevant API (as costs are per post). To address this, commercial offerings such as Pulsar provide pre-defined Boolean queries (e.g. X AND NOT Y) and assistance in formulating queries. Pulsar also offers queries that only search the followers of a given account (e.g. searching the posts of people who follow the event account). Alternatively, several studies have examined tools to assist in the formulation of queries, such as Rivas & Hristidis (2021) or Mazoyer et al (2018). Tonga et al (2020) also manually identified relevant sub-communities coalescing around their event as a mechanism to track those posting.
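The precision/recall trade-off for a hashtag query can be made concrete with a small invented example: a handful of hand-labelled posts, a naive hashtag match, and the two standard metrics computed from the retrieved set.

```python
# Sketch: precision and recall of a hashtag query against a small
# hand-labelled sample. Posts and labels are invented for illustration.

# (post text, is_actually_about_event) pairs
posts = [
    ("Loving #CityFest today!", True),
    ("#CityFest lineup just dropped", True),
    ("Great gig last night", True),           # relevant but lacks the hashtag
    ("My cat is called Fest", False),
    ("#CityFest is overrated, not going", False),
]

def query(text: str) -> bool:
    """Naive query: does the post contain the event hashtag?"""
    return "#cityfest" in text.lower()

retrieved = [(text, rel) for text, rel in posts if query(text)]
true_positives = sum(1 for _, rel in retrieved if rel)

# precision: of the posts retrieved, how many are actually relevant?
precision = true_positives / len(retrieved)
# recall: of the relevant posts, how many did the query retrieve?
recall = true_positives / sum(1 for _, rel in posts if rel)
print(precision, recall)
```

The third post shows the recall failure mode (a relevant post without the hashtag), and the fifth shows the precision failure mode (the hashtag used by someone not attending), mirroring the trade-off described above.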
In practice, users may engage entirely virtually (only on social media), or both physically and on social media. The challenge then is to separate virtual engagement from mentions by in-person attendees. To this end, de Lira et al (2019) used word occurrence classifiers tuned for festival events – posts could be classified as indicating pre-event, during-event, or post-event attendance. Tonga et al (2020), on the other hand, used semi-automated crowdsourcing, actively asking users posting the hashtag whether they were present at the event; this approach may not scale beyond the smallest events.
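The word-occurrence idea can be sketched minimally as follows, in the spirit of de Lira et al (2019). The cue phrases and labels are invented for illustration; a production classifier would be trained on labelled data, and (as discussed below) recent work favours contextualised language models over fixed phrase lists.

```python
# Sketch: a minimal word-occurrence classifier labelling posts as
# pre-event, during-event, post-event, or purely virtual engagement.
# Phrase lists are invented, not taken from de Lira et al (2019).

PHASE_CUES = {
    "pre-event": ["can't wait", "on my way", "heading to"],
    "during-event": ["here at", "in the queue", "crowd is"],
    "post-event": ["throwback", "last weekend", "got home from"],
}

def classify(post: str) -> str:
    text = post.lower()
    for phase, cues in PHASE_CUES.items():
        if any(cue in text for cue in cues):
            return phase  # an attendance cue suggests physical engagement
    return "virtual"      # no attendance cue: treat as virtual engagement

print(classify("Can't wait to attend X"))
print(classify("Throwback to last weekend's Y festival"))
print(classify("Who's playing at Y this year?"))
```

The first two examples echo the phrasing quoted earlier in this section; the third illustrates discussion that indicates interest but not physical attendance.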
A key related work is by de Lira et al (2019), who examined several UK festivals. They developed a classifier that differentiates between real in-person attendees and virtual attendees based on the text of their post, achieving 80% accuracy overall. Focussing on their exemplar event, Creamfields 2016, from ~90k posts made before, during and after the event, ~35k posts were identified as Creamfields-related; from these, a total of 10,788 posts were predicted to come from physical attendees. Contemporaneous sources indicate that this weekend-long event had about 70,000 attendees. De Lira et al further analysed the ~3,500 Twitter users who had declared their hometown[footnote 4], finding significant numbers of fairly local attendees for Creamfields (e.g. from Liverpool), as well as attendees from other significant cities, such as Glasgow, Dublin, London and Edinburgh. While the usage patterns of X have evolved over the years, the general approach remains usable and can be applied to other text-based social media where users discuss their attendance at events (e.g. Instagram). We refer to this as the User Attendance Classifier (UAC) approach, and we recommend it be taken forward to the Depth phase.
Since 2019, the state of the art in tweet classification has moved on to more recent NLP methods, specifically contextualised language models such as BERT (Devlin et al 2019) or RoBERTa (Liu et al 2019). However, as the word distribution in tweets differs from that of other (English) language corpora, there is increasing recognition that dedicated language models are needed for textual tasks on tweets. This is exemplified by work such as BERTweet (Nguyen et al 2020) and more recent language models released by Loureiro et al (2022) from Cardiff University. It is recommended that any application of the User Attendance Classifier consider the use of these more modern text classification methods.
5. Learning to quantify
Quantification (Esuli et al 2023) is a recent approach allowing the estimation (or “prediction”) of the relative frequencies of classes in unlabelled data with the help of supervised learning. The focus of quantification is on aggregated data, not on individual data points: it is used to estimate the prevalence (the relative frequency) of certain attributes, or classes of attributes, in larger collections. Indeed, quantification can be applied to any task that deals with data points whose membership in a class is uncertain, i.e. would require classification via supervised machine learning, and where the goal is to estimate not which class an individual data point belongs to, but how many data points belong to a given class.
The most basic and intuitive approach in this setting is the standard “Classify and Count” (CC) method, a straightforward quantification approach in which models are trained to recognise classes (i.e. types of the data in question) and their positive predictions are then counted across a dataset. However, quantification approaches can make better estimates, for example by tuning the confidence threshold a classifier must exceed before a positive prediction is counted.
Quantification may be useful in taking the output of the User Attendance Classifier and using that to better estimate the number of users attending. Indeed, the CC method was used by de Lira et al (2019) in their prediction of festival attendance.
Within the recent literature, quantification has been used, for example, in political science, to monitor the degree of support for a certain politician or policy from social media posts (Moreo et al 2022). In market research, quantification can be used to determine the support for a certain product from textual answers to questionnaires.
Among the proposed quantification methods, we note Adjusted Classify and Count (ACC) (Forman 2006), which corrects for the biases of the CC method. It integrates classification and counting to estimate class quantities in a dataset: a classification step first assigns class labels to data points, followed by a counting step that estimates class counts. The distinctive feature of ACC is its adjustment phase, in which the uncertainty in class assignments is accounted for by estimating the classifier’s true and false positive rates (e.g. on held-out labelled data) and using these to correct the raw counts. This adjustment provides more accurate class quantity estimates, making ACC valuable for scenarios where precise quantification is needed while accounting for class assignment uncertainty.
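The ACC adjustment can be written as a one-line correction to the raw “classify and count” prevalence, using the standard Forman (2006) formula: adjusted prevalence = (raw prevalence − FPR) / (TPR − FPR). The sketch below is illustrative only; the example figures, including the TPR/FPR values, are assumptions for the sketch and not results from this study.

```python
def adjusted_classify_and_count(predicted_positive, total, tpr, fpr):
    """Adjusted Classify and Count (Forman 2006).

    Corrects the raw prevalence estimate using the classifier's true and
    false positive rates, as measured on held-out labelled data.
    """
    raw_prevalence = predicted_positive / total  # plain Classify and Count
    if tpr == fpr:
        return raw_prevalence  # adjustment undefined; fall back to CC
    adjusted = (raw_prevalence - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, adjusted))  # clip to a valid proportion

# Illustrative example: a classifier flags 220 of 1,277 posts as attendees
# (a raw CC estimate of ~17%); with an assumed TPR of 0.80 and FPR of 0.06,
# the adjusted prevalence falls to ~15%.
estimate = adjusted_classify_and_count(220, 1277, tpr=0.80, fpr=0.06)
```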
In monitoring attendance, quantification can be used to estimate the size of groups (e.g. according to demographic attributes such as male/female or adults/children, or indeed any defined class within a classification approach). Recently, there has been a sustained effort to provide open-source frameworks for extending, deploying, and evaluating different quantification approaches (Moreo 2023). Research in the breadth phase concludes that quantification methods can improve the accuracy of attendance estimates based on aggregating classifier outputs, such as the User Attendance Classifier approach developed by de Lira et al. (2019).
Social media data: key findings
Possibilities:
Access to data. In recent years, social media platforms have been tightening access controls to their APIs, either through stricter access policies in the case of Meta, or highly expensive pricing structures in the case of X (formerly known as Twitter). This restricts the social media data available for analysis today compared to previous years, and has a knock-on effect on the types of analytics that can be deployed. Accessing data through third-party aggregation providers such as Pulsar offers a route to data access in a manner that aligns with the social media platforms’ rules and reduces privacy concerns.
Classification-based attendance. Not all users of social media will make explicit posts about events/activities they have participated in or engaged with. Much social media use is passive, with users reading content without engaging or interacting with the platform. Moreover, not all those making explicit posts about an event/activity on social media are in physical attendance. Classifiers offer a means of determining the actual attendance of individual users at an event, among all those who have made explicit posts about it, to some degree of user-level accuracy. Learning to quantify can offer improved modelling, but quantification has not previously been used for attendance prediction. Moreover, estimating attendance at such events from social media is limited by the demographics of social media users and the extent to which attendees engage with social media. Earlier work (de Lira et al., 2019) identified around 10,800 attendees from the 70,000 (ticketed) attendees. In terms of demographics, metadata allowed the identification of the hometown of 3,500 users.
Practicalities:
Limited Geolocation Data. Due to user privacy concerns and the default configuration of the platform, geolocation is rarely present (e.g. only 1-2% of posts on X contain geolocation information). If it is present, it may be inaccurate. Geolocated user data is much easier to obtain from mobile app data.
Social media data will not be weighted, because the demographics of those using certain social media platforms are highly skewed relative to the UK adult population – to the point where weighting the data would risk distorting it too far from the observed values. Furthermore, the demographics of who is more likely to post or upload images about an event are largely unknown. This means it would be extremely difficult to calculate accurate sample weights. The limitations of any analysis arising from social media will be clearly stated.
Breadth phase recommendations
-
Prioritise access to social media data from Pulsar, who provide aggregated access to several popular social media platforms at reasonable cost.
-
As simply identifying social media posts by keywords can lead to over-estimation of attendance, applying the User Attendance Classifier (UAC) methods proposed by de Lira et al (2019) is necessary to separate those who posted and attended from those who posted but did not attend. It can also offer insights into, for instance, the hometowns of some users.
-
Prioritise adopting the Adjusted Classify and Count (ACC) quantification method, and updating the classifiers used in UAC to encompass recent advances with contextualised language models.
-
Given the potential of social media, a case study in social media should be pursued to test overall feasibility, allowing the research team to advise event organisers on whether such an approach is feasible as of 2024.
7.2 Experimentation
Social media monitoring companies, such as Pulsar, offer a unified way to access social media posts from several platforms, such as Facebook, X (previously known as Twitter) and Instagram. This avenue of access was prioritised to support depth phase testing of social media modelling due to the ease of access and the cost-effective package offered by Pulsar. Full details on the prioritisation of this data source are available in Annex 1. This section summarises the approach and results of tests run using Pulsar social media data. More detailed findings, including detailed methodological steps, can be found in the Pulsar case study (Annex 4).
A methodology was developed to test predicting attendance at three sporting and cultural events using Pulsar social media data. These test events were:
-
Scotland vs New Zealand Autumn International Rugby Fixture, Murrayfield, Edinburgh, 13th November 2022.
-
Farnborough International Airshow, Farnborough International Exhibition & Conference Centre, 22nd July 2024 – 26th July 2024.
-
Lewes Bonfire, Lewes, 5th November 2024.
These events were selected to:
-
address both sport and cultural interests
-
address events of interest to the general public (e.g. sport or fireworks) versus those with more industrial interest (cf. the tradeshow elements of Farnborough International Airshow)
-
test differing levels of virtual vs. physical participation – for instance, social media engagement with Farnborough International Airshow was anticipated to be by attendees, while the rugby would have significant social media commentary from fans that did not physically attend the game (e.g. watched on TV).
-
test an un-ticketed event (Lewes Bonfire night).
This approach was tested using X (formerly known as Twitter) data, as X had the widest data accessibility. Posts on the platform are public, and can be identified by searching for keywords, user account names and hashtags. In contrast, Facebook and Instagram posts can only be identified when they are made searchable by the addition of hashtags.
The chosen events were analysed using classifier models, as recommended at the conclusion of the breadth phase. A classifier predicts which class (e.g. attendee or not) an instance (i.e. a social media post) belongs to; here, classifiers were used to predict whether posts were actually about the event and whether they came from in-person attendees. A labelled dataset (around 200 posts per event) was developed to determine the accuracy of the classification approaches, which were then used to estimate the number of attendees present on social media and to perform further analysis. Multiple human assessors created the labelled dataset, and the extent to which the assessors agreed on the labels was analysed.
The results of the testing of this approach are summarised in section 7.3 below.
7.3 Pulsar case study results
The results set out in this Section (7.3) are a summary of the full findings from testing set out in the Pulsar case study Annex 4.
7.3.1 Collecting and labelling event data
For the three selected test events, around half of the posts were unrelated to the events – for instance, there was a Women’s New Zealand rugby event that took place the day before. Some other selected tweets were observed to be concerned with issues related to race; for example, some tweets used the phrase “all blacks” to discuss issues related to race rather than as the long-standing nickname for the New Zealand men’s rugby team. The presence of posts not relating to the events was not concerning, as the manual queries produced to download the data from the Pulsar platform were intended for Recall rather than Precision[footnote 5].
High virtual participation on Twitter was also noticeable for the Scottish rugby event (see Table 7.3.1), with only 15% (15/98) of event-related posts being clearly from match attendees. In contrast, the corresponding figures were ~25% (32/126) for Lewes and ~50% (53/107) for Farnborough International Airshow.
Table 7.3.1: Statistics of the 200 labelled social media posts for each event
| Event | Posts About Event | Posts Attended Event | Total Discordance Between Assessors % | About Event Discordance Between Assessors % |
|---|---|---|---|---|
| Farnborough International Airshow | 107 (54%) | 53 (26%) | 2 (2.5%) | 1 (2.5%) |
| Scottish Rugby Autumn Internationals | 98 (49%) | 15 (8%) | 2 (2.5%) | 2 (5.0%) |
| Lewes Bonfire Night | 126 (63%) | 32 (16%) | 3 (3.75%) | 2 (5.0%) |
The final two columns (within Table 7.3.1) concern the agreement between assessors, who were responsible for manually labelling training and evaluation data. In general, the assessors agreed on both event-related and attendee judgements. Such minor levels of disagreement are unlikely to affect the use of this labelled data for choosing the most effective classifiers (Voorhees, 2000).
7.3.2. Comparing event attendance classifiers
A range of classifier models were tested and evaluated for their accuracy across the three test events using the labelled datasets created. They are compared against each other using a range of classification measures in Table 7.3.2.
Table 7.3.2: Comparison of classification accuracies for the three events. The highest values in each column are bolded.
| Model | F1 | Precision | Recall | Accuracy | Balanced Accuracy |
|---|---|---|---|---|---|
| Farnborough International Airshow | |||||
| Gradient Boosting Classifier - Transfer Learning | 0.59 | 0.83 | 0.45 | 0.83 | 0.71 |
| Gradient Boosting Classifier | 0.59 | 0.83 | 0.45 | 0.83 | 0.71 |
| Setfit | 0.62 | 0.53 | 0.73 | 0.75 | 0.74 |
| Llama3 8B Instruct zero shot | 0.72 | 0.64 | 0.82 | 0.83 | 0.82 |
| Llama3 8B Instruct few shot | 0.54 | 0.38 | 0.91 | 0.58 | 0.68 |
| Scottish Rugby | |||||
| Gradient Boosting Classifier - Transfer Learning | 0 | 0 | 0 | 0.90 | 0.49 |
| Gradient Boosting Classifier | 0 | 0 | 0 | 0.90 | 0.49 |
| Setfit | 0.21 | 0.12 | 1 | 0.43 | 0.69 |
| Llama3 8B Instruct zero shot | 0.67 | 0.50 | 1 | 0.93 | 0.96 |
| Llama3 8B Instruct few shot | 0.21 | 0.16 | 1 | 0.43 | 0.69 |
| Lewes Bonfire Night | |||||
| Gradient Boosting Classifier - Transfer Learning | 0.22 | 0.50 | 0.14 | 0.83 | 0.58 |
| Gradient Boosting Classifier | 0.40 | 0.67 | 0.29 | 0.85 | 0.57 |
| Setfit | 0.35 | 0.22 | 0.80 | 0.63 | 0.70 |
| Llama3 8B Instruct zero shot | 0.44 | 0.50 | 0.40 | 0.88 | 0.67 |
| Llama3 8B Instruct few shot | 0.26 | 0.15 | 1 | 0.30 | 0.60 |
Gradient boosting classifiers were uniformly among the lowest-performing classifiers. In particular, they did not identify any of the attendees in the test dataset for the Scottish Rugby event. Performance was better for the other events, but the highest recall of event attendees was only 29% for the Lewes event and 45% for the Farnborough International Airshow (notably, precision was reasonable for Farnborough International Airshow). Accuracy using the original model (transferred from the festival training data of de Lira et al. (2019)) showed no difference from training with the new event-specific data.
The Setfit language model-based classifier was generally better than the gradient boosting classifiers in terms of Balanced Accuracy (providing the highest performance for Lewes). For instance, for Farnborough International Airshow, it identified 73% of all the attendees’ posts in the labelled data, but in doing so, 47% of the posts it flagged were from non-attendees (i.e. 1.0 – Precision).
Several Large Language Model (LLM) classifiers were also examined. The zero-shot classifier was generally the best event attendance classifier of all those examined (highest Balanced Accuracy on two events, and only marginally lower than Setfit on the Lewes labelled data). Comparing the zero- and few-shot Llama3 instantiations, it was surprising that the zero-shot classifier provided the higher accuracy: in-context learning for LLMs is widely reported as effective in the recent literature, so the lower few-shot accuracy was unexpected here. This is likely the result of an insufficiently well-tuned prompt, with the few-shot approach failing to produce consistently clear labels. A larger, more capable LLM and further experimentation with prompts are likely to improve results here, and will be explored in Strand 2 of this study. In particular, some failure modes of the classifiers were identified:
-
The classifier may struggle to separate the Scotland-New Zealand (men’s) rugby event from another women’s rugby event taking place around the same time – for example “Congratulations, to [name] who represented Scotland, in the final of the Women’s Rugby World Cup in New Zealand, this morning. She had a great game” was classified positively.
-
For the Farnborough International Airshow, false positives may hint at attendance by others, rather than the poster themselves – e.g. “[company name] will be attending the Farnborough International Airshow, designed to pioneer the commercial space age, starting July 22nd. Our leaders, including our CEO, @[handle], and our COO, @[handle], will be present at the event and are looking forward to meeting you.”.
-
Finally, for Lewes, false negatives were observed where the classifier did not have enough evidence from the text of the post alone; had it been able to analyse the associated media (images or videos) – which the human labellers had access to – it might have made a correct positive prediction. An example false negative in this category was “Amazing work as always Lewes 👏🎇🔥🎉 https://t.co/XXXX” (which had photos of fireworks and effigies that were burnt on the bonfire).
Overall, the application of the Llama3 zero-shot classifier was recommended, which exhibited the highest accuracy across the datasets.
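As an illustration of the zero-shot approach, the sketch below shows how a post might be wrapped in a prompt and the model’s free-text reply mapped to a binary label. The prompt wording is hypothetical – the study’s actual prompt is not reproduced here – and `build_prompt` and `parse_label` are illustrative helper names; the call to the Llama3 model itself is stubbed out.

```python
def build_prompt(post_text):
    """Build a zero-shot prompt asking whether the author attended in person.

    Illustrative wording only; not the prompt used in the study.
    """
    return (
        "You are labelling social media posts about an event.\n"
        "Answer with exactly one word, YES or NO.\n"
        "Did the author of this post physically attend the event?\n\n"
        f"Post: {post_text}\nAnswer:"
    )


def parse_label(llm_reply):
    """Map the model's reply onto a binary attended / not-attended label."""
    return llm_reply.strip().upper().startswith("YES")


# Example (with a stubbed model reply in place of a real LLM call):
prompt = build_prompt("Incredible atmosphere at Murrayfield today!")
attended = parse_label("YES - the author describes being at the stadium.")
```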
7.3.3 Estimating event attendance
Using the highest accuracy model tested (Llama3 zero-shot classifier), the full, non-training datasets obtained from Pulsar were used to estimate attendance and are compared with observed attendance data (i.e. baseline data) in table 7.3.3 below:
Table 7.3.3: Attendee predictions
| Event | Attendees | Total posts identified from X | Of which predicted attendees (Llama3 zero-shot) | Attendee prevalence from social media |
|---|---|---|---|---|
| Farnborough International Airshow (ticketed) | 100,385 | 8,285 | 2,768 | 2.76% |
| Scottish Rugby (ticketed) | 67,144 | 6,356 | 361 | 0.53% |
| Lewes Firework (unticketed) | 40,000 | 1,277 | 220 | 0.55% |
Table 7.3.3 demonstrates the strongest signal was for the Farnborough International Airshow event – indeed, 2-3% of attendees posted on X about the Airshow, while 0.5% of attendees for both the Scottish Rugby and Lewes Firework events were detected.
Learning-to-quantify (aka quantification), outlined in section 7.1.1, is one approach for improving prevalence estimates. At its heart, quantification (in this case, adjusted count methods) allows for more accurate total counts of attendance by accounting for the error of the initial classifier. The adjusted classifiers can then be used to obtain more accurate counts on the larger dataset.
Unfortunately, learning-to-quantify has only been tested on traditional supervised machine learning techniques. It should be adaptable to classifiers based on language models (e.g. Setfit); however, obtaining posteriors from a generative LLM is more challenging[footnote 6]. The feasibility of integrating the output of the LLM with the quantification methods will be explored further in Strand 2; at this stage, the learning-to-quantify methods were instead applied to the gradient boosting (GB) classifier, even though it is less accurate than the LLM.
Table 7.3.3.1 below reports the results of the quantification experiments. Firstly, the number of labelled attendees (and the prevalence of attendees in the training data) is reported. Then, for the complete datasets (i.e. all English social media posts collected for each event), the number of posts is shown, as well as the number of predicted attendees for each event, according to, firstly, the LLM zero-shot classifier and, secondly, the GB classifier – both following the traditional “classify-and-count” paradigm. The final column reports the estimates of the GB classifier after they have been adjusted by the adjusted count quantification method. Notably, while the GB classifier produces lower estimates (expected, as its Recall is lower), applying the quantification methods leads to higher estimates, more aligned with the ground truth number of attendees.
It can be concluded that adjusted count quantification has promise in adjusting estimates obtained using classifiers, but more R&D is required to apply it to the latest accurate LLM-based classifiers.
Table 7.3.3.1 Attendance quantification estimates – in terms of number of users – for each of the three events. Numbers in parenthesis are prevalence in the corresponding dataset.
| | Training data | | Complete dataset | | | |
|---|---|---|---|---|---|---|
| Event | True counts | True proportion | Number of posts | LLM Predicted Classify-and-Count | GB Classifier Classify-and-Count | GB Classifier Adjusted Count |
| Farnborough International Airshow | 11 (29%) | 29% | 8,285 | 2,768 (33%) | 1496 (18%) | 3976 (48%) |
| Scottish Rugby Autumn Internationals | 3 (7%) | 7% | 6,356 | 361 (5%) | 187 (3%) | 445 (7%) |
| Lewes Bonfire Night | 5 (13%) | 13% | 1,277 | 220 (17%) | 83 (6%) | 178 (14%) |
Finally, the concordance between the classifiers was analysed. As previously noted, in the labelled datasets, labels were produced for both “aboutness” (i.e. the post was about the event) and “attended”; for a post to be labelled as from an attendee, it had to be about the event. These two levels of human labelling allow the development of independent classifiers for About and Attended, and the application of both to all obtained social media posts, so the concordance between the two classifiers can be examined. Results are reported in Table 7.3.3.2 below – specifically, the number and percentage of unique users for “About only”, “About and attended”, and “Attended only”. From the table, the prevalence of About posts for each event was about 23-26% – this emphasises the usefulness of classifying posts for relevance, rather than relying on over-tuning the queries used for selecting social media posts through Pulsar.
However, a proportion of posts were predicted as ‘attended’ without also being predicted as ‘about event’. This is most likely due to false positive predictions from the ‘attended event’ classifier – such as those detailed above for the Scottish rugby – or false negative predictions from the ‘about event’ classifier.
The LLM-predicted attendee counts in Table 7.3.3.1 were roughly equal to the combined ‘about and attended’ and ‘attended only’ totals in Table 7.3.3.2, though slightly lower. Both tables count unique users, but Table 7.3.3.1 counts the unique users authoring any post classified as attending, whereas Table 7.3.3.2 splits those posts into ‘about and attended’ and ‘attended only’. This indicates that there is a sub-set of users who wrote multiple posts about an event, some classified as ‘about and attended’ and others as ‘attended only’; such users were double counted across the two groups in Table 7.3.3.2.
Table 7.3.3.2 Number of unique users about, and number of attending unique users, as well as the intersection (LLM zero-shot classifier).
| Event | About only | About only % | About and attended | About and attended % | Attended only | Attended only % |
|---|---|---|---|---|---|---|
| Farnborough International Airshow | 2133 | 25.75 | 1719 | 20.75 | 1388 | 16.75 |
| Scottish Rugby Autumn Internationals | 1471 | 23.14 | 185 | 2.91 | 193 | 3.04 |
| Lewes Bonfire Night | 328 | 25.69 | 124 | 9.71 | 106 | 8.3 |
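The user-level split behind these figures can be illustrated with a minimal sketch. The post data below is fabricated purely to show how the groups are formed, and how a user with multiple posts can fall into more than one group and so be double counted across per-group totals.

```python
# Fabricated classifier outputs: one record per post, with predicted
# "about the event" and "attended" labels.
posts = [
    {"user": "a", "about": True,  "attended": False},
    {"user": "a", "about": True,  "attended": True},   # user "a" has two posts
    {"user": "b", "about": False, "attended": True},   # attended-only prediction
    {"user": "c", "about": True,  "attended": False},
]

# Unique users per group, following the split used in Table 7.3.3.2.
about_only         = {p["user"] for p in posts if p["about"] and not p["attended"]}
about_and_attended = {p["user"] for p in posts if p["about"] and p["attended"]}
attended_only      = {p["user"] for p in posts if p["attended"] and not p["about"]}

# User "a" appears in both about_only (first post) and about_and_attended
# (second post), so summing the per-group totals double counts that user.
```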
7.3.4 Analysis of estimated event attendance
The analysis below covers each of the three events, demonstrating some of the value of using social media for attendance monitoring. This includes analysis of the friend/follower ratio of those posting, and of their home locations.
Farnborough International Airshow
Figure 7.3.4.1 shows the distribution of home countries from which the posts originated, based on the stated location of the user in their X profile. While the UK is most frequent, the US also accounts for a high proportion of predicted attendees.
Figure 7.3.4.1: Distribution of Predicted Attendees by Country - Farnborough International Airshow
To explain the high number of attendees from the US, it was hypothesised that the airshow is not just a public event, but also an industrial tradeshow, with many industrial attendees. Indeed, Farnborough International Airshow reports[footnote 7] that it had visitors from 114 countries, members of the media from 56 countries, and exhibitors from 41 countries.
With the industrial nature of the event in mind, the follower/friend ratio of the predicted attendees was examined and compared across all three events. This is shown in Figure 7.3.4.2, which shows that most attendees have approximately the same number of followers as the number of users they follow, and that more accounts with large follower counts were predicted as attendees of the Farnborough International Airshow event. The difference in the follower/friend ratio between Farnborough International Airshow and each of the Scottish rugby internationals and Lewes bonfire night events was found to be significant (p < 0.001 for both comparisons), whereas the difference between the Scottish rugby internationals and Lewes bonfire was not (p = 0.025, above the Bonferroni-corrected threshold). This emphasises the likelihood of many industrial attendees, who will have accounts with a large number of followers.
Figure 7.3.4.2: Follower Friend Ratios for predicted attendees for each event.
Scottish Rugby
Figure 7.3.4.3 below shows the country distribution of the predicted attendees at the Scotland-New Zealand rugby game. As expected, the majority of posts are from the UK; the next most frequent country was the United States, followed by New Zealand (17 predicted attendees). The latter is not unexpected – e.g. players or fans that would have travelled to attend the game.
Figure 7.3.4.3: Number of Posts indicating Attendance at Scottish Rugby Autumn International, by Country
Table 7.3.4.1 shows the distribution of locations of predicted attendees within the UK for the Scottish rugby event – only locations mentioned by more than one user are shown; a further 39 locations were mentioned only once. Edinburgh[footnote 8] (the location of the game) appears most frequently (44), followed by Glasgow (12). Other Scottish towns include some with good transport links to Edinburgh (e.g. Aberdeen, Falkirk, Inverkeithing, Dundee, Dunblane, Dunbar, St Andrews) as well as others that are quite distant from Edinburgh (Peterhead, Stranraer, Ayr and Prestwick). Towns in the Scottish Borders known for their rugby focus are also mentioned (Hawick, Melrose). In other cases, unitary councils or regions are mentioned (e.g. South Ayrshire, Scottish Borders, Moray, Highland). Finally, it is argued that the distribution of followers/friends is similar to that of Lewes rather than Farnborough International Airshow, suggesting most accounts are personal in nature.
Table 7.3.4.1: All locations mentioned more than once in the predicted Scottish rugby attendees.
| Location | Number of posts about event | Number of posts attended event | Location | Number of posts about event | Number of posts attended event |
|---|---|---|---|---|---|
| No City Information | 373 | 56 | Newcastle upon Tyne | 8 | 3 |
| Edinburgh | 125 | 39 | Alva | 2 | 2 |
| Glasgow | 78 | 12 | East Lindsey | 1 | 2 |
| London | 91 | 9 | Goring | 3 | 2 |
| Aberdeen | 18 | 6 | Hawick | 3 | 2 |
| Cardiff | 23 | 6 | Kirkwall | 11 | 2 |
| Dundee | 8 | 6 | Melrose | 1 | 2 |
| Moray | 3 | 6 | Monifieth | 1 | 2 |
| Scotland | 5 | 6 | Oxfordshire | 2 | 2 |
| City of Edinburgh | 26 | 5 | Paisley | 1 | 2 |
| County Durham | 5 | 5 | Perth | 4 | 2 |
| Fife | 7 | 5 | Saint Andrews | 1 | 2 |
| Birmingham | 7 | 3 | South Shields | 2 | 2 |
| Inverness | 2 | 3 |
Lewes Bonfire 2024
Table 7.3.4.2 below provides information about the predicted attendees, as obtained from the users’ profile information. The most frequent location is Lewes itself, followed by London (1 hour by train) and Brighton (10 miles, 17 minutes by train). Again, from Figure 7.3.4.2, it was observed that the distribution of followers/friends is distinct from the industry-focussed Farnborough International Airshow event.
Table 7.3.4.2: UK locations of predicted attendees at Lewes fireworks.
| Location | Number of posts about event | Number of posts attended event | Location | Number of posts about event | Number of posts attended event |
|---|---|---|---|---|---|
| Bath | 1 | 1 | Leeds | 2 | 1 |
| Belfast | 2 | 2 | Lewes | 51 | 23 |
| Brighton | 29 | 20 | Lewisham | 1 | 1 |
| Bristol | 2 | 2 | London | 39 | 21 |
| City of London | 2 | 1 | Margate | 1 | 1 |
| East Sussex | 9 | 3 | No City Information | 83 | 46 |
| Eastbourne | 3 | 1 | Poynings | 29 | 8 |
| Eccleshall | 1 | 1 | Staffordshire | 2 | 2 |
| Hertfordshire | 1 | 1 | Wealden | 4 | 2 |
| Hove | 1 | 1 | West Sussex | 6 | 2 |
7.3.5 Recommendations
Given the promising results from the testing phase in capturing engagement across the three test events chosen for the social media data category, the recommendation is that these testing outcomes are applied in Strand 2. This further experimentation will support development and determine how usable or otherwise the developed approaches will be.
Further experimentation in Strand 2 should include the following steps for development throughout the analysis process:
-
Diversify social media data sources. Facebook and Instagram are more difficult to access and will be investigated further in Strand 2. Alternative sources could also include Tripadvisor or Facebook public pages, which may provide access to other demographics.
-
Prioritise learning a general classifier to work across events. This could be done by combining all human labelled annotations available from de Lira et al (2019) and the three test events. This is important given the limited training data available in this project. A small language model-based classifier, SetFit, already demonstrated improved attendance accuracy over the gradient boosting classifiers of de Lira et al (2019).
-
Explore utilising photos/images posted to support classification accuracy. The potential of this kind of processing has been enhanced in recent years by the advent of multi-modal LLMs such as LLaVA (Large Language and Vision Assistant)[footnote 9]. A simple integration may be to append social media posts with a textual description of the attached media (e.g. “photo of fireworks”), such that the classification LLM can take this information into account.
-
Prioritise utilising LLM-based approaches, given the limited training required. They can be improved through longer development of the input prompt for the LLM (known as prompt engineering). This can be carried out manually or through DSPy[footnote 10], a promising tool that can automatically enhance prompts using some labelled training data.
-
Consider applying learning to quantify across a range of events. This may give better estimates of prevalence. It was not possible to apply learning-to-quantify on the LLM classifier at this stage, as this is more challenging than with the conventional classifiers. It may be possible to reformulate the LLM classifier to give a posterior likelihood for each prediction.
8. Deployable sensing data
8.1 Breadth phase overview
Deployable sensing data is data collected by sensors that are deployed to a specific event or space in a bespoke manner to monitor attendance and audience size. Deployment is specific to the event, with collaboration with event and space managers essential to successful use. This type of sensing analysis to capture attendance can be categorised into two main groups:
-
Visual data: deployed to count crowd sizes in a specific place and point in time. This group primarily includes camera imagery (including CCTV footage or images), drone footage or satellite imagery
-
Signal data: deployed to count crowd sizes in a specific place over a range of time periods. This group includes radar data, wi-fi signal sensing and radio signal sensing as prominent data sources and approaches
While appearing to be a promising potential data source for accurately capturing the number of attendees in a range of locations, particularly by utilising open-source crowd-counting models in conjunction with event photography, deployment is likely to be a challenge. This owes much to the fact that these data sources rely on bespoke, localised and case specific set-up and deployment which requires close collaboration and support from event and space managers. Some of these data sources also come with ethical and legal concerns that will need to be closely monitored on a case-by-case basis, again in collaboration with event or space managers.
While crowd-counting approaches using imagery appear promising as a prioritised source for development, the Breadth phase evidence synthesis also highlighted other types of sensing data including monitoring wi-fi and radio signals that are not recommended for further exploration. This is because of their complex setup requirements, which creates limitations around deployment at scale alongside the concerns identified in the evidence synthesis around accuracy and reliability of the estimates produced.
8.2 Identified modelling approaches
8.2.1 Modelling with visual data
Visual data, such as drone images, is very commonly used to count large crowds manually, for example using the Jacobs crowd formula (Choi-Fitzpatrick and Juskauskas 2015). However, recent advances in computer vision provide an opportunity to automate this process and estimate crowd size across many events and images at once. To this end, machine learning models can be trained on annotated pictures of crowds to calculate crowd size in new pictures with a high degree of accuracy.
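The manual approach can be sketched in a few lines: the Jacobs crowd formula multiplies the occupied area by an assumed crowd density. The density figures below are approximate metric conversions of Jacobs' commonly quoted square-foot rules of thumb and are illustrative assumptions, not measured values.

```python
# A minimal sketch of the Jacobs crowd formula: attendance is estimated as
# occupied area multiplied by an assumed density. The densities below are
# approximate conversions of Jacobs' square-foot rules of thumb (e.g. a
# "dense" crowd at roughly one person per 4.5 sq ft) and are illustrative.

DENSITY_PER_M2 = {
    "light": 1.1,   # ~ one person per 10 sq ft
    "dense": 2.4,   # ~ one person per 4.5 sq ft
    "packed": 4.3,  # ~ one person per 2.5 sq ft
}

def jacobs_estimate(area_m2: float, density: str) -> int:
    """Estimate crowd size for a given occupied area and density class."""
    return round(area_m2 * DENSITY_PER_M2[density])

# e.g. a 2,000 square-metre space filled with a dense crowd
print(jacobs_estimate(2000, "dense"))  # 4800
```

In practice an analyst would segment an aerial image into regions of similar density and sum the per-region estimates, which is precisely the repetitive step that computer vision can automate.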
Convolutional Neural Networks (CNNs) learn to detect patterns and features in images by processing the image in chunks, in a way inspired by how the brain processes visual signals. To count objects, a CNN is first trained on a dataset annotated for the relevant objects, in this case people, learning to recognise the patterns and features associated with them. After training, the model can be applied to new images to predict a count. These models have dominated benchmark testing, with ShanghaiTech (Zhang et al 2021) and UCF-QNRF (Idrees et al 2018) two particularly popular open-source crowd counting datasets for training and testing crowd counting models. Some of the top-performing models trained and tested on these datasets include P2PNet (Song et al 2021) and GauNet (Cheng et al 2022), both open-source projects available for commercial use. Following testing, their performance at counting crowds in photos taken from within the crowd or from an elevated point of view is good enough that they may be usable as-is for this project's use case. Their open-source nature and proven track record mean these approaches should be prioritised for further development. They may be preferable to the signal data approaches (section 8.2.2), which carry more complex deployment considerations.
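Many of these CNN counters work by predicting a density map whose values sum to the estimated crowd size, and benchmark performance is typically reported as mean absolute error (MAE). The sketch below illustrates both ideas with invented numbers; it is not output from any of the models named above.

```python
# Illustrative only: a density-map crowd counter outputs a grid of per-pixel
# (or per-patch) densities, and the estimated count is the sum of the grid.
# Benchmark accuracy is then reported as mean absolute error over a test set.

def count_from_density_map(density_map):
    """Sum a predicted density map to obtain the estimated crowd count."""
    return sum(sum(row) for row in density_map)

def mean_absolute_error(predicted, actual):
    """MAE across a set of images: mean of |predicted - true| counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

density = [[0.0, 0.2, 0.1],
           [0.4, 0.9, 0.3],
           [0.0, 0.5, 0.1]]           # made-up model output for one image
print(round(count_from_density_map(density), 2))    # 2.5
print(mean_absolute_error([105, 298], [100, 310]))  # 8.5
```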
The field of machine learning for counting crowds in images is very active, with a wide variety of effective methodologies constantly being refined and new ones created, so continued horizon-scanning beyond the scope of this project will be important. For instance, AWCC-Net (Huang et al 2023), which promises improved performance in adverse weather and comparable performance to GauNet in other conditions, was made open source in October 2023. Transformer-based approaches, which use a model architecture similar to that of current Large Language Models such as GPT, have also attracted interest, including the CLTR model (Liang et al 2022). Their performance slightly trails CNN-based methods but is robust and close enough to warrant further exploration.
All the models referenced previously in this report have been trained on images taken from ground level or slightly elevated positions, not from high-altitude drone imagery. Consequently, they perform extremely poorly when applied to high-altitude drone imagery. While trained models and datasets for crowd counting using drone imagery exist, such as STNNet/DroneCrowd (Wen et al 2021), most are not available for commercial use and, as such, access to these resources is limited. DLR-ACD (Bahmanyar et al 2019) is the only drone imagery dataset that can be licensed for commercial use, and it is free to use for internal research purposes. Initial testing, retraining P2PNet on this data, obtained results comparable to those in Bahmanyar et al (2019), although the model remains less precise than with non-drone imagery. Further testing, including with other models such as GauNet, could improve accuracy further. Subject to availability, drone imagery could still be an important area for study in this project.
Like drone imagery, satellite imagery could capture an entire crowd at an event in a single image. These images could potentially be analysed with machine learning models similar to those trained to count crowds in more traditional photography. However, there are both technical and feasibility issues with a satellite imagery-based approach. Free-to-access satellite imagery from Copernicus’ Sentinel-2 (Copernicus 2024) is only updated every 5 days for each point on Earth, at a 10-metre resolution. Commercial products are updated more frequently, including Maxar’s SecureWatch (Maxar 2024), which is updated daily at a 30-centimetre resolution. Even so, these capture frequencies mean a satellite may not take an image at the time an event takes place. Moreover, the resolution of satellite imagery is much lower than that of drone imagery, which would yield worse results when analysing the image with a CNN. As such, using satellite imagery was de-prioritised for further testing in this study.
Overall, using computer vision for crowd counting is promising and should be prioritised for further testing in this study, in conjunction with use cases where the entire audience can be captured at the same time. Taking images of crowds to calculate attendance is a suitable method thanks to the accuracy of the top-performing open-source models. Drone imagery, which can capture the entire audience of a large open-air event at once, may also be usable, subject to the suitability of the drone footage available for testing.
8.2.2 Modelling with signal data
Deployable sensing methods include those that collect data through radio sensors, including Wi-Fi sensors, installed in the event space, with the resulting signals analysed to derive attendance. Analysis of signal strength and of the devices connected to a network can provide crowding information in both small and large-scale environments, although there are several limitations.
One method proposed in the literature is to employ ‘Wi-Fi sniffers’ to sense pedestrians in large areas by examining wireless network connectivity. ‘Wi-Fi sniffers’ are dedicated hardware that capture the number of devices making connection and probing requests to a given Wi-Fi network, without directly monitoring individual online usage or personally identifying individual users. Hao et al (2015) were able to survey an area of 4,000 square metres with 14 Wi-Fi sniffers, obtaining an acceptable mean absolute error of 10.5 for crowds in the low hundreds. However, this method only counts people carrying a Wi-Fi-enabled mobile device, so it will not count people without a phone, producing a biased count for events whose audiences have lower mobile phone penetration, such as those targeted at children or older people. Moreover, as Li et al (2020) describe, some people may carry more than one Wi-Fi-enabled device, while some transmissions may not be detected if the user passes through the area too quickly, which would further muddle the counts. Furthermore, as this method has only been tested with small, sparse crowds in the low hundreds, testing would be necessary to see how it scales to larger and extremely dense crowds.
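The counting step behind such sniffers reduces to de-duplicating observed device identifiers within the event window. A minimal sketch, with invented records and field names (no actual sniffing is performed; note also that modern phones randomise MAC addresses in probe requests, which complicates this in practice):

```python
# Illustrative sketch, not a real Wi-Fi sniffer: given probe-request records
# captured by deployed sensors, a naive device count is the number of unique
# MAC addresses seen within the event window. Record fields are assumptions.

from datetime import datetime

def unique_devices(records, start, end):
    """Count distinct MAC addresses observed between start and end."""
    return len({r["mac"] for r in records if start <= r["ts"] <= end})

records = [
    {"mac": "aa:bb:cc:00:00:01", "ts": datetime(2024, 6, 1, 14, 5)},
    {"mac": "aa:bb:cc:00:00:01", "ts": datetime(2024, 6, 1, 14, 40)},  # same phone, probed again
    {"mac": "aa:bb:cc:00:00:02", "ts": datetime(2024, 6, 1, 15, 10)},
    {"mac": "aa:bb:cc:00:00:03", "ts": datetime(2024, 6, 1, 18, 0)},   # after the event ended
]
event_start = datetime(2024, 6, 1, 14, 0)
event_end = datetime(2024, 6, 1, 16, 0)
print(unique_devices(records, event_start, event_end))  # 2
```

The biases described above map directly onto this sketch: attendees without a Wi-Fi device never appear in `records`, while one person with two devices appears twice.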
Sensor methodologies that do not require the audience to carry a mobile device also exist. An example is CrossCount (Khan and Ho 2021), which analyses channel state information, i.e. information on how Wi-Fi signals are transmitted from the network access point to connected devices, to count the number of people present in a room. This method has, it is understood, only been tested indoors with very small crowds, limiting its usefulness for this project’s use case. CrowdScan (2024) is a commercial offering that uses the received signal strength indicator of radio signals to count the number of people in an area. Their most recently published results (Denis et al 2021) indicate an error of around 300 people for a crowd of 3,200, with a 95th percentile error of 900. This suggests the technology is better suited to tracking crowd density than to calculating exact counts. Moreover, CrowdScan must install the sensors themselves and train the algorithm with manual counts, complicating deployment across multiple events.
Other commercial providers offer camera-based sensors to count footfall. However, to be able to measure footfall at an event accurately, the flow of people in and out of an event would have to be heavily restricted to ensure they all are passing through the installed sensors. As such, using these providers would only be adequate for events in indoor areas with clearly specified entry and exit areas.
Overall, all these sensor methods share the limitation of needing to be deployed in a bespoke manner for each event. This increases cost and complexity and limits deliverability, reducing their usefulness for estimating event attendance at scale. There are also accuracy concerns with these methods. As such, prioritising this modelling approach is not recommended.
8.3 Deployable Sensing Data: Evidence synthesis key findings
Possibilities:
Quality of data source deployment. The accuracy of any methods using this data is highly dependent on the quality of the deployment. Indeed, if the sensors are not deployed correctly, the captured data may be incomplete, which will lead to inaccurate counts. Ensuring these deployable sensing data sources have been set up correctly will therefore be extremely important.
Open-source models. Open-source models for crowd counting are very powerful and ready to use for large-scale deployments. Being able to retrain these models, for example tuning a model to count people in high-altitude drone images, is also a big advantage for this project. Building a model from scratch would take too long and potentially yield worse results, so using and retraining open-source models is recommended.
Accuracy of radio sensing methodologies. Methodologies that use radio signal sensors to count crowds have shown some promise in the literature for counting crowds of different sizes. However, their accuracy is not yet high enough, and some methods have not been tested on larger crowds.
Practicalities:
Responsibility of data capture. With other sources the challenge of capturing data lies with the data provider. With these methods the responsibility falls on data analysts and the event organisers. While this offers some element of control, it also adds more logistical complexity and could impact the deliverability of the related methods.
Complexity of deployment. The complexity of deployment varies widely depending on the type of data source used. For example, while conventional imagery and drone imagery would be analysed using similar machine learning models, capturing drone imagery may require a specialist to fly a drone, while conventional imagery may be taken by event staff. Sources based on sensing radio signals are more complex to set up than other deployable data sources, which negatively impacts their deliverability.
Breadth phase recommendations
- Radio-based methodologies are not accurate enough to estimate attendance and have ease-of-deployment issues. As such, they are not recommended for further exploration.
- Despite the complexity of obtaining crowd images, image-based approaches using machine learning models to count crowds should be prioritised, with open-source models showing promising accuracy in the literature examined to date.
9. Event and Space data
9.1 Breadth phase overview
Event and space data covers a broad collection of demographic, population and geographic datasets. These administrative datasets are ‘static’, meaning the data they hold is captured at a specific point in time relating to a specific event or location. For ‘event’ data, this can include information about attendance collected through event promotion and management platforms such as Meetup or Eventbrite. ‘Space’ data, on the other hand, refers to population-representative demographic and geographic datasets such as OpenStreetMap, which can be used for mapping and obtaining relevant points of interest for geospatial data, or the Census, which can be used alongside such geographic data to estimate the demographics of a space’s attendees.
While challenging to utilise in isolation to estimate attendance, event and space datasets can provide valuable background information about the context in which other data sources have been collected, including providing specific event/location boundaries or Points of Interest, or estimated demographics for a given area of interest where an event is being held.
As event and space data sources are ‘static’ (i.e. collected at a fixed point in time), they are not able to count attendance in isolation, as they cannot track a space’s attendees over time. However, they can be used with other data sources to improve depth or accuracy of attendance and engagement estimates.
Event-specific data could be used to obtain ground truths for ticketed events, as a data source to estimate attendance, and potentially to obtain demographics for sports events. However, due to problems relating to data access, none of the event-specific data sources are recommended for pursuit during this project. Several potential event data providers identified in the Breadth phase, such as Facebook Events, have discontinued their public APIs, making this data unavailable.
9.2 Identified modelling approaches
Combining with other novel datasets
Event and space data is unlikely to be useful for estimating attendance in isolation, given its static and often non-specific nature. Censuses, for example, provide very detailed demographic information on the population, but do not offer any information on attendance at specific events or spaces. They are likely, therefore, to be best deployed in conjunction or combination with the other novel data categories identified in this research programme, either to support estimating attendance or to provide further analysis of engagement or attendance. This has the potential to be particularly helpful where the primary novel dataset being used in analysis (mobile app, social media, etc.) does not include detailed demographic information. Depth phase testing in this project has highlighted the potential of combining ‘space’ data with mobile app data as an example of how this data can be utilised. Geographic information from OpenStreetMap, accessed through an API, has been used to isolate specific points of interest for analysis using mobile app location data.
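Retrieving points of interest programmatically from OpenStreetMap can be sketched as an Overpass QL query. The tag values and bounding box below are illustrative, and `fetch_pois` is a hypothetical helper that is defined but not called here, since it requires network access to the public Overpass endpoint:

```python
# Sketch of querying OpenStreetMap's Overpass API for points of interest.
# Only the query is built and printed; fetch_pois() shows how it could be
# sent with the standard library (network access required, not done here).

import json
import urllib.parse
import urllib.request

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def build_query(tag_key, tag_value, bbox):
    """Build an Overpass QL query for nodes carrying a given tag inside a
    bounding box given as (south, west, north, east)."""
    s, w, n, e = bbox
    return f'[out:json];node["{tag_key}"="{tag_value}"]({s},{w},{n},{e});out;'

def fetch_pois(query):
    """Hypothetical helper: POST the query to the Overpass endpoint."""
    data = urllib.parse.urlencode({"data": query}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=60) as resp:
        return json.load(resp)["elements"]

# e.g. sports centres in a box roughly covering central Glasgow
query = build_query("leisure", "sports_centre", (55.82, -4.32, 55.90, -4.20))
print(query)
```

Each returned element carries coordinates and tags, which is how lists of sport sites and their attributes (sport type, public or private access) can be compiled without manual searching.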
This project has also included testing of combining ‘space’ data with mobile app data to derive demographic trends and characteristics. Demographic information from the Index of Multiple Deprivation combined with a mobile app user’s estimated home location could be used to create demographic estimates, with full details set out in annex 2. While this combination of datasets shows potential, this is a highly experimental area of analysis that requires extensive future testing and research beyond the scope of this project to assure accuracy.
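The combination described above can be sketched as a simple join: each user's estimated home area is mapped to its Index of Multiple Deprivation decile, yielding an estimated deprivation profile of the audience. The area codes and deciles below are invented for illustration; a real analysis would use the published IMD lookup tables.

```python
# Hypothetical sketch of combining estimated home locations with IMD data.
# The LSOA codes and deciles here are invented; real deciles come from the
# published English IMD lookup (1 = most deprived, 10 = least deprived).

from collections import Counter

IMD_DECILE = {"E01000001": 3, "E01000002": 7, "E01000003": 7}  # LSOA -> decile

def deprivation_profile(home_lsoas):
    """Distribution of IMD deciles across users' estimated home areas."""
    deciles = [lsoa and IMD_DECILE[lsoa] for lsoa in home_lsoas if lsoa in IMD_DECILE]
    return dict(Counter(deciles))

# Four app users whose estimated home areas fall in three LSOAs
profile = deprivation_profile(["E01000001", "E01000002", "E01000003", "E01000003"])
print(profile)  # {3: 1, 7: 3}
```

As the report notes, such profiles are estimates of the areas attendees come from, not of the individuals themselves, and the approach remains experimental.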
Another useful type of ‘space’-based demographic data for gaining insights into event attendance is geo-demographic segmentation data. These data are derived from a variety of government and commercial sources and offer more detailed demographic information at a higher spatial resolution, typically postcode level. Two data sources widely used for detailed demographic segmentation are CACI’s ACORN and Experian’s Mosaic. These datasets reveal common demographic characteristics, providing insights similar to those from the IMD or SIMD but with more detailed segmentation.
‘Event’-based data also plays an important role in combination with primary novel datasets. Parkrun event data, for example, was used in conjunction with Strava data as a baseline to support better accuracy in the development of an activity data model for estimating attendance at local runs in Edinburgh. Attendance information available through event-planning and ticketing platforms, for example, could have the potential to fulfil a similar role, if available in the future.
Therefore, as demographic information is generally lacking in other data sources, the recommendation is to use event and space data in conjunction with other modelling approaches, either to obtain demographic information on the audiences of interest using space-based data, or to improve the accuracy of attendance estimates using event data.
9.3 Event and Space Data: Evidence synthesis key findings
Possibilities:
The value of censuses for demographics. Static data can be used to fill the demographic gaps present in other forms of data. Censuses provide very detailed demographic information at a very granular level that can be used to provide further insights on other data. As Scotland’s census was run the year after the censuses for England, Wales and Northern Ireland, it currently contains less information than the other censuses. It is still being released in a staggered form, but further data will be made available in 2025.
Finding points of interest in a programmatic way with OpenStreetMap. OpenStreetMap data can be used to discover relevant points of interest in a programmatic way. For example, its data can be queried through the Overpass API to create a list of local sport sites that are relevant for participation in sport and their associated information, like geographical coordinates, types of sport available at the location and whether access is private or public. Doing this manually for the entire UK would be extremely time-consuming. During initial testing, lists and geographic coordinates of tourist and sports sites for Bradford, Glasgow and Lewes have been compiled. Coordinates for individual points of interest, like Giant’s Causeway, are also retrievable through the Overpass API.
Practicalities:
Low availability of event-specific data sources. Many commercial event-specific data providers are increasingly unwilling to make event data accessible automatically via APIs. As such, it will not be possible to use event-specific data in the next phase.
The value and variability of open-source data. OpenStreetMap is open source and editable by anyone. This means that while new locations and changes to existing locations can be added to the dataset very quickly, data is sometimes recorded in different ways by different editors. For example, some museums in Bradford have been recorded as a single point coordinate, while others have had their entire area mapped out. This difference in geographic precision may make some crowd estimates less precise for areas that have not been mapped out, as an educated guess on the area covered by the point of interest would be required.
Breadth phase recommendations
- Census data can be used to provide demographic information on an event or space’s audience and is recommended for use in the next phase.
- Event-specific data is not readily available and, while it could prove useful for giving further context on events, most sources are too niche to be used in the depth phase.
Appendix 1: Terms and definitions
Definitions:
| Term | Definition |
|---|---|
| Adjusted Classify and Count (ACC) | Adjusts the classifier count based on the learned true positive and false positive rates. It integrates classification and counting to estimate the class quantities in a dataset. |
| Aggregated data | Data that has been collected and combined from multiple individuals in a population and then is used to create a statistical report that makes inferences about that population. |
| API | A set of functions and procedures allowing the creation of applications that access the features or data of an operating system, application, or other service. |
| Baseline | Statistical information collected at the beginning of a scientific study that is used to assess the change brought about by a particular intervention. |
| Classify and Count (CC) | The Classify and Count method is a straightforward quantification approach where models are trained to recognise classifications (i.e. types of the data in question) and then count their prevalence in a dataset. |
| Convolutional Neural Networks | A Convolutional Neural Network (CNN) is a type of deep learning neural network architecture commonly used in computer vision tasks such as image classification, object detection, and segmentation. CNNs are designed to automatically and adaptively learn spatial hierarchies of features from input images. |
| Correlation | A statistical measure that describes the relationship between two variables. It indicates how changes in one variable are associated with changes in another. |
| Crowdedness | The state of being filled or near capacity. |
| Data shift | The different distribution of data between train and test sets. |
| Deployable sensing technologies data | Any data that is collected by sensors and must be deployed to an event or space in a bespoke manner to monitor attendance and audience size. This type of sensing can be achieved through visual data, using cameras, or through Wi-Fi and radio signals. Sources include Wearables and Radio Frequency Identification tags (RFIDs). |
| Gradient Boosted (GB) Classifier | A combined machine learning model that trains each new model to address the errors from the previous models. |
| Granularity | The scale or level of detail in a set of data. |
| Ground truth | Information known to be real or true, provided by direct observation or measurement. |
| Large Language Model (LLM) | Large Language Models (LLMs) are a family of models that achieve better comprehension of text by contextualising individual words within a broader sentence. Due to their size and training methods they are able to perform a variety of natural language processing tasks without the need for further training. |
| Mean Absolute Percentage Error (MAPE) | A measure of accuracy for models that output values on a numerical or continuous scale. For each prediction it finds the absolute difference between the true value and the predicted value as a percentage of the true value, then takes the mean over all predictions in the dataset. The lower the value, the better the performance of the model. |
| Metadata | The data providing information about one or more aspects of the data; it is used to summarise basic information about data that can make tracking and working with specific data easier. |
| Mobile App data | Individual-level GPS data obtained from the usage of mobile applications on GPS-enabled devices. Sources include eSIMS, Wi-Fi connections and Strava API. |
| Multicollinearity | A statistical concept where several independent variables in a model are correlated. Two variables are considered perfectly collinear if their correlation coefficient is +/- 1.0. |
| Precision | A measure of accuracy for models that output a categorical label or classification. It shows out of how many points that the model has labelled as belonging to a given class truly belong to that class. The closer to 1 the better the model performance. |
| Quantification | A form of supervised machine learning where the aim is to quantify or learn the total number of a class in a dataset, rather than classifying or labelling each individual item in a dataset. |
| Raw counts | The number of data points originally generated by a system, device or operation, which has not been processed or changed in any way. |
| Raw data | The data originally generated by a system, device or operation, which has not been processed or changed in any way. |
| Real-time | The actual time during which a process or event occurs. |
| Recall | A measure of accuracy for models that output a categorical label or classification. It measures out of all the points that truly belong to a class, what proportion of them did the model label as belonging to that class, the closer to 1 the better the performance of the model. |
| Regression model | A regression model is a statistical model that estimates the relationship between one dependent variable and one or more independent variables using a line. |
| Root Mean Squared Error (RMSE) | A measure of accuracy for models that output values on a numerical or continuous scale. It is the square root of the mean squared difference between the predicted value and the true value across the entire dataset. The lower the RMSE, the better the model performance. |
| Sample weighting | Weighting is a statistical technique in which datasets are manipulated through calculations to bring them more in line with the population being studied. |
| SetFit | SetFit is a fine-tuning methodology for sentence transformer models for specific NLP tasks. Transformers are a type of machine learning model used for natural language processing; rather than considering each word in isolation, they take into account the relationships between the words in a sentence to extract greater context. Sentence transformers consider an entire sentence rather than individual words. |
| Social media data | Data from posts/behaviour of users on social media platforms (like Twitter/X). Sources from third parties or through social media companies. |
| Spatial coverage | A geographical area where data was collected, a place which is the subject of a collection, or a location which is the focus of an activity. |
| Temporal coverage | The time period during which data was collected or observations were made. |
| Transport data | Data which tracks the flows of people using private (e.g., cars, bikes, walking) and public (e.g., buses, trains, public cycle hire) means of transportation. Sources include parking data, traffic monitoring and passenger numbers on trains. |
| XGBoost | A computationally efficient implementation of gradient boosting models, a type of model made up of smaller, less accurate models used in sequence, each trained to predict the error of the previous model. |
Abbreviations:
| Term | Acronym |
|---|---|
| Adjusted Classify and Count | ACC |
| Application Programming Interface | API |
| Closed-circuit television | CCTV |
| Convolutional Neural Networks | CNNs |
| Data Protection Impact Assessment | DPIA |
| Department for Culture, Media and Sport | DCMS |
| General Data Protection Regulation | GDPR |
| Global Positioning System | GPS |
| Gradient Boosted (classifier) | GB |
| Independent and Identically Distributed | IID |
| Indices of Multiple Deprivation | IMD |
| Large Language Models | LLMs |
| Mean Absolute Percentage Error | MAPE |
| Natural Language Processing | NLP |
| Office for National Statistics | ONS |
| Points of Interest | POIs |
| Root Mean Squared Error | RMSE |
| Random Iterative Method Weighting | RIM Weighting |
| Software Development Kits | SDKs |
| Urban Big Data Centre | UBDC |
| User Attendance Classification | UAC |
Appendix 2: References
Alaiz-Rodríguez, R and Japkowicz, N. (2008) Assessing the impact of changing environments on classifier performance. Proceedings of the Canadian Society for Computational Studies of Intelligence, 21st Conference on Advances in Artificial Intelligence, Canadian AI ‘08, Springer-Verlag, Berlin, Heidelberg.
Ajao et al. (2015) A survey of location inference techniques on Twitter. Journal of Information Science, Volume 41, Issue 6, pages 855-864. https://doi.org/10.1177/0165551515602847.
Bahmanyar et al. (2019) Crowd Counting and Density Map Estimation in Aerial and Ground Imagery. BMVC Workshop on Object Detection and Recognition for Security Screening. https://www.dlr.de/eoc/en/desktopdefault.aspx/tabid-12760/22294_read-58354/
Cabinet Office. (2024) Privacy Notice - Social and Digital Media Analysis: https://www.gov.uk/government/publications/privacy-notice-social-and-digital-media-analysis/privacy-notice-social-and-digital-media-analysis
Cameron, R.W.F., et al. (2020) Where the wild things are! Do urban green spaces with greater avian biodiversity promote more positive emotions in humans? Urban Ecosystems, Volume 23 Issue 2, pages 301–317. https://doi.org/10.1007/s11252-020-00929-z.
Cheng, Z-G., et al. (2022) Rethinking Spatial Invariance of Convolutional Networks for Object Counting. arXiv. https://arxiv.org/abs/2206.05253
Copernicus. (2024) Sentinel-2 Global Mosaic. https://land.copernicus.eu/en/products/global-image-mosaic
Criado-Perez, C. (2020) Invisible women: exposing data bias in a world designed for men. London: Vintage.
CrowdScan. (2024) Website: crowdscan.be How does it work. https://www.crowdscan.be/how-does-it-work
Cui, N., et al. (2021) Using VGI and Social Media Data to Understand Urban Green Space: A Narrative Literature Review. ISPRS International Journal of Geo-Information, Volume 10 Issue 7, page 425. Available at: https://doi.org/10.3390/ijgi10070425.
Davies, A. (2017) Cynemon – Cycling Network Model for London. https://www.ucl.ac.uk/transport/sites/transport/files/Davies_slides.pdf
De Lira, V.M., et al. (2019) Event attendance classification in social media, Information Processing & Management, Volume 56, Issue 3. https://doi.org/10.1016/j.ipm.2018.11.001.
Denis, S., et al. (2021) Sensing Thousands of Visitors Using Radio Frequency. IEEE Systems Journal, Volume 15, Issue 4, pages 5090-5093. https://doi.org/10.1109/JSYST.2020.3019189
Deveaud, R., et al. (2015) Experiments with a Venue-Centric Model for Personalised and Time-Aware Venue Suggestion. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management (CIKM ‘15). Association for Computing Machinery, New York, NY, USA, 53–62. https://doi.org/10.1145/2806416.2806484
Devlin, J., et al. (2019) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics. https://aclanthology.org/N19-1423.pdf
Ebrahimi, M., et al. (2018) Twitter user geolocation by filtering of highly mentioned users. Journal of the Association for Information Science and Technology, Volume 69 Issue 7, pages 879-889.
Esuli, A., et al. (2023) Learning to Quantify. Springer. https://link.springer.com/book/10.1007/978-3-031-20467-8
Fabris, A., et al. (2023) Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach. Journal of Artificial Intelligence Research 76, 1117-1180. https://www.jair.org/index.php/jair/article/download/14033/26912/34038
Forman, G. (2006) Quantifying trends accurately despite classifier error and class imbalance. In ACM SIGKDD international conference on Knowledge discovery and data mining.
Gao, G., et al. (2020) CNN-based Density Estimation and Crowd Counting: A Survey. https://arxiv.org/pdf/2003.12783v1.pdf
Ghermandi, A. (2022) Geolocated social media data counts as a proxy for recreational visits in natural areas: A meta-analysis. Journal of Environmental Management, Volume 317, p. 115325. https://doi.org/10.1016/j.jenvman.2022.115325.
Ghermandi, A., Depietri, Y., & Sinclair, M. (2022) In the AI of the beholder: A comparative analysis of computer vision-assisted characterizations of human-nature interactions in urban green spaces. Landscape and Urban Planning, Volume 217, 104261 https://www.sciencedirect.com/science/article/pii/S0169204621002243
Ghermandi, A., et al. (2023) Social media data for environmental sustainability: A critical review of opportunities, threats, and ethical use. One Earth, Volume 6 Issue 3, pages 236–250. Available at: https://doi.org/10.1016/j.oneear.2023.02.008.
Ghermandi, A., and Sinclair, M. (2019) Passive crowdsourcing of social media in environmental research: A systematic map. Global Environmental Change, Volume 55, pages 36–47. Available at: https://doi.org/10.1016/j.gloenvcha.2019.02.003.
Gonzalez, J.D., et al. (2018) Learning to geolocalise tweets at a fine-grained level. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 1675-1678). https://dl.acm.org/doi/abs/10.1145/3269206.3269291
Guo, S., et al. (2019) Accessibility to urban parks for elderly residents: Perspectives from mobile phone data. Landscape and Urban Planning, Volume 191, 103642. Available at: https://doi.org/10.1016/j.landurbplan.2019.103642.
Hao, L., et al. (2015) On the Fine-Grained Crowd Analysis via Passive WiFi Sensing. TechRxiv. https://www.techrxiv.org/articles/preprint/On_the_Fine-Grained_Crowd_Analysis_via_Passive_WiFi_Sensing/23805942/1
Hays, R.D., et al. (2015) Use of Internet panels to conduct surveys. Behavior Research Methods, Volume 47, pages 685-690. Available at: https://doi.org/10.3758/s13428-015-0617-9
Heikinheimo, V., et al. (2020) Understanding the use of urban green spaces from user-generated geographic information. Landscape and Urban Planning, Volume 201, p. 103845. Available at: https://doi.org/10.1016/j.landurbplan.2020.103845.
Heo, S., Lim, C.C. and Bell, M.L. (2020) Relationships between Local Green Space and Human Mobility Patterns during COVID-19 for Maryland and California, USA. Sustainability, Volume 12, Issue 22, 9401. Available at: https://doi.org/10.3390/su12229401.
Homer, D. (2022) Blog post. https://www.theeventplannerexpo.com/what-metrics-you-should-be-analyzing-with-each-event/
Huang, Z-K., et al. (2023) Counting Crowds in Bad Weather. arXiv. https://arxiv.org/abs/2306.01209
Idrees, H., et al. (2018) Composition Loss for Counting, Density Map Estimation and Localization in Dense Crowds. Proceedings of the European Conference on Computer Vision (ECCV 2018). https://www.crcv.ucf.edu/papers/eccv2018/2324.pdf
Ilieva, R.T., and McPhearson, T. (2018) Social-media data for urban sustainability. Nature Sustainability, Volume 1, Issue 10, pages 553–565. Available at: https://doi.org/10.1038/s41893-018-0153-6.
Joshi, C., et al. (2023) A data science approach to estimate the use of natural spaces: a feasibility study. Available at: https://datasciencecampus.ons.gov.uk/projects/a-data-science-approach-to-estimate-the-use-of-natural-spaces-a-feasibility-study/
Choi-Fitzpatrick, A., and Juskauskas, T. (2015) Up in the Air: Applying the Jacobs Crowd Formula to Drone Imagery. Procedia Engineering, Volume 107, pages 273-281. https://doi.org/10.1016/j.proeng.2015.06.082
Molloy, J., et al. (2021) Observed impacts of the Covid-19 first wave on travel behaviour in Switzerland based on a large GPS panel. Transport Policy, Volume 104, pages 43-51.
Khan, D., and Ho, I. (2021) CrossCount: Efficient Device-free Crowd Counting by Leveraging Transfer Learning. IEEE Internet of Things Journal. https://www.researchgate.net/publication/360266749_CrossCount_Efficient_Device-free_Crowd_Counting_by_Leveraging_Transfer_Learning
Lee, K.-S., et al. (2021) Analysis of the Activity and Travel Patterns of the Elderly Using Mobile Phone-Based Hourly Locational Trajectory Data: Case Study of Gangnam, Korea. Sustainability, Volume 13, Issue 6. Available at: https://doi.org/10.3390/su13063025.
Li, Y., et al. (2020) A Case Study of Wi-Fi Sniffing Performance Evaluation, IEEE Access, Volume 8, pages 129224-129235. Available at: https://doi.org/10.1109/ACCESS.2020.3008533
Liang, D., et al. (2022) An End-to-End Transformer for Crowd Localization. arXiv. https://arxiv.org/abs/2202.13065
Liu, Y., et al. (2019) RoBERTa: A Robustly Optimized BERT Pretraining Approach. https://arxiv.org/abs/1907.11692
Loureiro, D., et al. (2022) TimeLMs: Diachronic Language Models from Twitter. arXiv preprint 2202.03829. https://arxiv.org/abs/2202.03829
Ma, X., et al. (2018) Predicting Future Visitors of Restaurants Using Big Data, ICMLC 2018, pages 269-274. Available at: https://doi.org/10.1109/ICMLC.2018.8526963
Madden, K., et al. (2023) Forecasting daily foot traffic in recreational trails using machine learning. Journal of Outdoor Recreation and Tourism, Volume 44. Available at: https://doi.org/10.1016/j.jort.2023.100701
Mancini, F., Coghill, G.M., and Lusseau, D. (2018) Using social media to quantify spatial and temporal dynamics of nature-based recreational activities. PLOS ONE, Volume 13, Issue 7. Available at: https://doi.org/10.1371/journal.pone.0200565.
Maxar. (2024) SecureWatch. https://www.maxar.com/products/securewatch
Mazoyer, B., et al. (2018) Real-time collection of reliable and representative tweets datasets related to news events. First International Workshop on Analysis of Broad Dynamic Topics over Social Media (BroDyn 2018) co-located with the 40th European Conference on Information Retrieval (ECIR 2018), Mar 2018, Grenoble, France https://centralesupelec.hal.science/hal-02321957/document
Mears, M., et al. (2021) Mapping urban greenspace use from mobile phone GPS data. PLOS ONE, Volume 16, Issue 7. Available at: https://doi.org/10.1371/journal.pone.0248622.
Merrill, N.H., et al. (2020) Using data derived from cellular phone locations to estimate visitation to natural areas: An application to water recreation in New England, USA. https://pmc.ncbi.nlm.nih.gov/articles/PMC7192446/
Moreo, A., et al. (2022) Tweet sentiment quantification: An experimental re-evaluation. PLOS ONE, September 2022. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0263449
Moreo, A. (2023) QuaPy: A Python Framework for Quantification. https://github.com/HLT-ISTI/QuaPy
Nguyen, D.Q., et al. (2020) BERTweet: A pre-trained language model for English Tweets. Proc. EMNLP 2020. https://aclanthology.org/2020.emnlp-demos.2.pdf
NewsGuard. (2023) How the UK Government Used NewsGuard’s Pulsar Integration To Detect Misinformation Narratives. https://www.newsguardtech.com/insights/how-the-uk-government-used-newsguards-pulsar-integration-to-detect-misinformation-narratives/
Ofcom. (2022) Adults’ Media Literacy.
Osborne, M., et al. (2012) Bieber no more: First Story Detection using Twitter and Wikipedia. In Proceedings of the SIGIR Workshop on Time-aware Information Access (TAIA). https://www.dcs.gla.ac.uk/~craigm/publications/osborneTAIA2012.pdf
Richards, D. R., & Friess, D. A. (2015) A rapid indicator of cultural ecosystem service usage at a fine spatial scale: Content analysis of social media photographs. Ecological Indicators, Volume 53, pages 187-195. https://www.sciencedirect.com/science/article/pii/S1470160X15000588
Rivas, R., & Hristidis, V. (2021) Effective social post classifiers on top of search interfaces. Data Mining and Knowledge Discovery, Volume 35, pages 1809–1829. https://doi.org/10.1007/s10618-021-00768-2
Schirpke, U., et al. (2023) Emerging technologies for assessing ecosystem services: A synthesis of opportunities and challenges. Ecosystem Services, Volume 63. Available at: https://doi.org/10.1016/j.ecoser.2023.101558.
Serra, L., and Zeinullin, M. (2022) Glasgow CCTV Object Detection Counts. UBDC Technical Notes and Working Papers Series. DOI: 10.5281/zenodo.7054623.
Sinclair, M., et al. (2020a) Using social media to estimate visitor provenance and patterns of recreation in Germany’s national parks. Journal of Environmental Management, Volume 263, p. 110418. Available at: https://doi.org/10.1016/j.jenvman.2020.110418.
Sinclair, M., et al. (2020b) Valuing nature-based recreation using a crowdsourced travel cost method: A comparison to onsite survey data and value transfer. Ecosystem Services, Volume 45, p. 101165. Available at: https://doi.org/10.1016/j.ecoser.2020.101165.
Sinclair, M., et al. (2021) Understanding the use of greenspace before and during the COVID-19 pandemic by using mobile phone app data. GIScience 2021 Short Paper Proceedings. 11th International Conference on Geographic Information Science, September 27-30, 2021, Poznań, Poland (Online). Available at: https://doi.org/10.25436/E2D59P.
Sinclair, M., et al. (2022) Valuing Recreation in Italy’s Protected Areas Using Spatial Big Data. Ecological Economics, Volume 200, p. 107526. Available at: https://doi.org/10.1016/j.ecolecon.2022.107526.
Sinclair, M., et al. (2023a) Assessing the socio-demographic representativeness of mobile phone application data. Applied Geography, Volume 158, p. 102997.
Sinclair, M., et al. (2023b) Estimating Greenspace Visitation Using Digital Footprints Data: A Collaboration with Glasgow City Council to Aid in Open Space Policy and Operations. GIScience Conference 2023. Leeds, UK.
Solon, G., et al. (2015) What Are We Weighting For? Journal of Human Resources, Volume 50, Issue 2, pages 301-316. Available at: https://doi.org/10.3368/jhr.50.2.301
Song, Q., et al. (2021) Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework. arXiv. https://arxiv.org/abs/2107.12746
Tang, H., et al. (2022) A multilayer recognition model for Twitter user geolocation. Wireless Networks, Volume 28, pages 1197-1202. https://doi.org/10.1007/s11276-018-01897-1
Tenkanen, H., et al. (2017) Instagram, Flickr, or Twitter: Assessing the usability of social media data for visitor monitoring in protected areas. Scientific reports, Volume 7, Issue 1. https://www.nature.com/articles/s41598-017-18007-4
Toivonen, T., et al. (2019) Social media data for conservation science: A methodological overview. Biological Conservation, Volume 233, pages 298–315. Available at: https://doi.org/10.1016/j.biocon.2019.01.023.
Tonga Uriarte, Y., et al. (2020) Exploring the relation between festivals and host cities on Twitter: a study on the impacts of Lucca Comics & Games. Information Technology and Tourism, Volume 22, pages 625–648. https://doi.org/10.1007/s40558-020-00185-z
Timokhin, S., et al. (2021) Predicting Venue Popularity Using Crowd-Sourced and Passive Sensor Data. Smart Cities, Volume 3, Issue 3, pages 818-841. https://doi.org/10.3390/smartcities3030042
Wang, F., and Chen, C. (2018) On data processing required to derive mobility patterns from passively-generated mobile phone data. Transportation Research Part C: Emerging Technologies, Volume 87, pages 58–74. Available at: https://doi.org/10.1016/j.trc.2017.12.003.
Wen, L., et al. (2021) Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark. arXiv. https://arxiv.org/abs/2105.02440v1
Wood, S.A., et al. (2013) Using social media to quantify nature-based tourism and recreation. Scientific Reports, Volume 3, Issue 1, p. 2976. Available at: https://doi.org/10.1038/srep02976.
Wood, S.A., et al. (2020) Next-generation visitation models using social media to estimate recreation on public lands. Scientific Reports, Volume 10, Issue 1, p. 15419. Available at: https://doi.org/10.1038/s41598-020-70829-x.
Zander, S., et al. (2023) Bias and precision of crowdsourced recreational activity data from Strava. Landscape and Urban Planning, Volume 232. https://www.sciencedirect.com/science/article/pii/S0169204623000051
Zhang, Y., et al. (2016) Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zhang_Single-Image_Crowd_Counting_CVPR_2016_paper.pdf
Zhang, L., et al. (2023) Interactive COVID-19 mobility impact and social distancing analysis platform. Transportation Research Record, Volume 2677, Issue 4, pages 168-180.
Appendix 3: Evidence Synthesis References
Alkouz, B., and Aghbari, Z. (2020) SNSJam: Road traffic analysis and prediction by fusing data from multiple social networks. Information Processing & Management, Volume 57, Issue 1, pages 1114-1128.
Aziz, M., et al. (2018) Automated solutions for crowd size estimation. Social Science Computer Review, Volume 20, Issue 10.
Bogaert, M., et al. (2016) The added value of Facebook friends data in event attendance prediction. Decision Support Systems, Volume 82, pages 26-34.
Bunse, M., et al. (2023) Regularization-based methods for ordinal quantification. Data Mining and Knowledge Discovery, Volume 38, pages 4076-4121.
Cecaj, A. (2020) Comparing deep learning and statistical methods in forecasting crowd distribution from aggregated mobile phone data. Applied Sciences, Volume 10, Issue 18.
Cheng, Z-Q., et al. (2022) Rethinking Spatial Invariance of Convolutional Networks for Object Counting. arXiv. https://arxiv.org/abs/2206.05253
Choi-Fitzpatrick, A., and Juskauskas, T. (2015) Up in the Air: Applying the Jacobs Crowd Formula to Drone Imagery. Procedia Engineering, Volume 107, pages 273-281. https://doi.org/10.1016/j.proeng.2015.06.082
Chouhan, K., et al. (2022) Sentiment analysis with Tweets behaviour in Twitter streaming API. Computer Systems Science & Engineering, Volume 45, Issue 2.
Collins Bartholomew. (2023) Mobile network coverage data. https://www.collinsbartholomew.com/mobile-network-coverage-map-data/.
Data Science Campus. (2023) Using open-source data to measure our engagement with the natural environment.
Davies, A. (2017) Cynemon – Cycling Network Model for London. https://www.ucl.ac.uk/transport/sites/transport/files/Davies_slides.pdf.
De Lira, V., et al. (2019) Event attendance classification in social media. Information Processing & Management, Volume 56, Issue 3. https://doi.org/10.1016/j.ipm.2018.11.001.
Diao, S., et al. (2023) Hashtag-guided low-resource Tweet classification. Proceedings of the ACM Web Conference 2023, May 1-5, Austin, Texas, USA. https://arxiv.org/pdf/2302.10143.
Dunaway, J., et al. (2018) News attention in a mobile era. Journal of Computer-Mediated Communication, Volume 23, Issue 2, pages 107-124.
Gao, G., et al. (2020) CNN-based Density Estimation and Crowd Counting: A Survey. https://arxiv.org/pdf/2003.12783v1.pdf
Geospatial Commission. (2023) UK geospatial strategy 2030. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1162795/2023-06-15_UK_Geospatial_Strategy_2023_.pdf.
GSMA. Device attribute data. https://www.gsma.com/solutions-and-impact/industry-services/device-services/gsma-device-attribute.
Hao, L., et al. (2015) On the Fine-Grained Crowd Analysis via Passive Wi-Fi Sensing. TechRxiv. https://www.techrxiv.org/articles/preprint/On_the_Fine-Grained_Crowd_Analysis_via_Passive_WiFi_Sensing/23805942/1
Hays, R.D., et al. (2015) Use of Internet panels to conduct surveys. Behavior Research Methods, Volume 47, pages 685-690. Available at: https://doi.org/10.3758/s13428-015-0617-9
Huang, Z-K., et al. (2023) Counting Crowds in Bad Weather. arXiv. https://arxiv.org/abs/2306.01209
INSEE. (2020) Partial return of population movements with the end of lockdown. INSEE Analysis, No. 54.
Instituto Nacional de Estadistica. (2023) Distribution of the expenditure made by foreign visitors on visits to Spain. https://www.ine.es/en/experimental/gasto_tarjetas/trimestral.htm.
Instituto Nacional de Estadistica. (2023) Experimental statistics – measurement of tourism using mobile phones. https://www.ine.es/experimental/turismo_moviles/experimental_turismo_moviles_interno.htm?L=1.
Joshi, C., et al. (2023) A data science approach to estimate the use of natural spaces: a feasibility study. Available at: https://datasciencecampus.ons.gov.uk/projects/a-data-science-approach-to-estimate-the-use-of-natural-spaces-a-feasibility-study/
Khan, D., and Ho, I. (2021) CrossCount: Efficient Device-free Crowd Counting by Leveraging Transfer Learning. IEEE Internet of Things Journal. https://www.researchgate.net/publication/360266749_CrossCount_Efficient_Device-free_Crowd_Counting_by_Leveraging_Transfer_Learning
Lan, T., et al. (2022) Research on the prediction system of event attendance in an event-based social network. Wireless Communications and Mobile Computing, Volume 22, Issue 1.
Liang, D. (2022) TransCrowd: weakly-supervised crowd counting with transformers. Science China Information Sciences, Volume 65.
Li, Y., et al. (2020) A Case Study of WiFi Sniffing Performance Evaluation. IEEE Access, Volume 8, pages 129224-129235. Available at: https://doi.org/10.1109/ACCESS.2020.3008533
Mamei, M., and Colonna, M. (2016) Estimating attendance from cellular network data. International Journal of Geographical Information Science, Volume 30, Issue 7, pages 1281-1301.
Merrill, N.H., et al. (2020) Using data derived from cellular phone locations to estimate visitation to natural areas: An application to water recreation in New England, USA. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7192446/
Moore, R., and Reeves, A. (2020) Defining racial and ethnic context within geolocation data. Political Science Research and Methods, Volume 8, pages 780-794.
Matsukuma, N., et al. (2017) Using people flow technologies with public transport. Hitachi Review, Volume 66, Issue 2, pages 61-65.
Nelson, T., et al. (2021) Generalised model for mapping bicycle ridership with crowdsourced data. Transportation Research Part C: Emerging Technologies, Volume 125.
Pereira, F., et al. (2015) Why so many people? Explaining nonhabitual transport overcrowding with internet data. IEEE Transactions on Intelligent Transportation Systems, Volume 16, Issue 3, pages 1370-1379.
Ptak, B., et al. (2022) On-board crowd counting and density estimation using low altitude unmanned aerial vehicles – looking beyond the benchmark. Remote Sensing, Volume 14, Issue 10.
Sala, L., et al. (2021) Generating demand responsive bus routes from social network data analysis. Transportation Research Part C: Emerging Technologies, Volume 128.
Sanchez, L., et al. (2014) SmartSantander: IoT experimentation over a smart city testbed. Computer Networks, Volume 61, pages 217-238.
Solon, G., et al. (2015) What Are We Weighting For? Journal of Human Resources, Volume 50, Issue 2, pages 301-316. Available at: https://doi.org/10.3368/jhr.50.2.301
Song, Q., et al. (2021) Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework. arXiv. https://arxiv.org/abs/2107.12746
Taylor, S., and Letham, B. (2018) Forecasting at scale. The American Statistician, Volume 72, Issue 1, pages 37-45.
Tsai, W-L., et al. (2023) Using cellular device location data to estimate visitation to public lands: Comparing device location data to U.S. National Park Service’s visitor use statistics. PLoS ONE, Volume 18, Issue 11.
Vivacity Blog. (2020) TfL using artificial intelligence to help fuel London’s cycling boom. https://vivacitylabs.com/tfl-using-artificial-intelligence-to-help-fuel-londons-cycling-boom-2/.
Vivacity Blog. (2023) Analysing festival footfall data with Salford City Council. https://vivacitylabs.com/festival-footfall-data/.
Weidmann, N. (2016) A closer look at reporting bias in conflict event data. American Journal of Political Science, Volume 60, Issue 1, pages 206-218.
Wen, L., et al. (2021) Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark. arXiv. https://arxiv.org/abs/2105.02440v1
Wood, S. (2020) Next generation visitation models using social media to estimate recreation on public lands. Scientific Reports, Volume 10.
Yuan, Y.M., et al. (2013) Estimating crowd density in an RF-based dynamic environment. IEEE Sensors Journal, Volume 13, Issue 10, pages 3837-3845.
Zhang, Y., et al. (2016) Single-Image Crowd Counting via Multi-Column Convolutional Neural Network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Zhang_Single-Image_Crowd_Counting_CVPR_2016_paper.pdf
- Twitter’s own statistics, see: https://developer.twitter.com/en/docs/tutorials/advanced-filtering-for-geo-data
- Geographical filtering of tweets is no longer viable, as few tweets now carry accurate location data.
- E.g. see https://later.com/blog/ultimate-guide-to-using-instagram-hashtags/
- At the time, Twitter search responses included all user-level metadata, including hometown/location. Search results no longer include this metadata; see https://developer.x.com/en/docs/twitter-api/tweets/search/quick-start/recent-search.
- Informally, Precision measures the preciseness of the items captured/predicted (i.e. is the captured content mostly about the event?), while Recall is the proportion of (known) relevant items that have been captured/predicted. These two measures are in tension: an approach that increases Recall will typically decrease Precision, as it tends to bring in more non-relevant items too. Other measures, such as F1, Accuracy and Balanced Accuracy, portray overall performance: F1 is the harmonic mean of Precision and Recall; Accuracy is the percentage of correct predictions; Balanced Accuracy is an adjustment of Accuracy for imbalanced datasets, where values > 0.5 indicate performance better than a random classifier. For classification, we focus mostly on Balanced Accuracy but refer to other measures where appropriate.
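The measures described in this note can be made concrete. A minimal, illustrative sketch (not code from the report) computing them from binary confusion-matrix counts:

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the footnote's metrics from confusion-matrix counts:
    true/false positives (tp/fp) and false/true negatives (fn/tn)."""
    precision = tp / (tp + fp)              # how precise are the captured items?
    recall = tp / (tp + fn)                 # share of relevant items captured
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    tnr = tn / (tn + fp)                    # true negative rate
    balanced_accuracy = (recall + tnr) / 2  # robust to class imbalance
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "balanced_accuracy": balanced_accuracy}
```

On a heavily imbalanced dataset (e.g. few event-related tweets among many), Accuracy can look high while Balanced Accuracy reveals near-random performance, which is why the latter is emphasised here.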
- In particular, the LLM classifier does not produce a confidence/posterior probability, which the quantification methods require. It is possible to ask the LLM for a single token and examine the logits of the tokens it considered generating (as used by Feng et al. (2004)), but this requires further code changes as well as empirical validation on learning-to-quantify datasets.
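To illustrate why a posterior is needed: quantification methods such as Probabilistic Classify & Count (PCC, one of the aggregative methods implemented in the QuaPy framework cited above) estimate class prevalence by averaging per-item posterior probabilities rather than counting hard labels. A minimal sketch (illustrative only, not the project's code):

```python
def pcc_prevalence(posteriors: list[float]) -> float:
    """Probabilistic Classify & Count: average the per-item posteriors
    P(positive | item) to estimate the positive-class prevalence."""
    return sum(posteriors) / len(posteriors)

def cc_prevalence(posteriors: list[float], threshold: float = 0.5) -> float:
    """Plain Classify & Count: threshold to hard labels, then count.
    This is all a hard-labelling LLM classifier can support."""
    return sum(p >= threshold for p in posteriors) / len(posteriors)
```

With confident posteriors the two estimators agree, but when the classifier is systematically over- or under-confident they diverge; PCC and its adjusted variants cannot be applied at all without per-item confidences, which is the limitation noted above.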
- https://www.farnboroughinternational.org/what-we-do/farnborough-airshow/
- Interestingly, this location appears in the data as both Edinburgh and City of Edinburgh; there is no such duplication for other cities such as Glasgow.