The Gambling Commission: Illegal Gambling Website Identification Tool
A tool for identifying and tracking the activity associated with illegal gambling websites.
1. Summary
1 - Name
Illegal Gambling Website Identification Tool
2 - Description
This tool identifies illegal gambling websites by running automated Google searches for a list of relevant search terms. Web traffic estimates are then obtained from Similarweb, a web intelligence platform. This provides insights into online illegal gambling activity and can be used to target our disruption activities for a more efficient use of resources.
3 - Website URL
https://www.gamblingcommission.gov.uk/statistics-and-research/publication/unlicensed-gambling-using-data-to-identify-unlicensed-operators-and-estimate
https://www.gamblingcommission.gov.uk/report/illegal-online-gambling-consumer-engagement-and-trends
4 - Contact email
ResearchStatistics@gamblingcommission.gov.uk
Tier 2 - Owner and Responsibility
1.1 - Organisation or department
The Gambling Commission
1.2 - Team
Data Innovation Hub
1.3 - Senior responsible owner
Tim Miller, Executive Director for Policy and Research
1.4 - Third party involvement
No
1.4.1 - Third party
N/A
1.4.2 - Companies House Number
N/A
1.4.3 - Third party role
N/A
1.4.4 - Procurement procedure type
N/A
1.4.5 - Third party data access terms
N/A
Tier 2 - Description and Rationale
2.1 - Detailed description
This tool automates the identification of illegal gambling websites and uses web traffic estimations from Similarweb to monitor activity associated with them.
A Google search API is used to obtain results for a list of search terms, designed using findings from our consumer research into the motivations for online illegal gambling. Relevant links are extracted from the search results, and estimates of the number of visits and the average visit duration are obtained for each website from Similarweb, via its API, on a monthly basis. These two metrics are combined into a single metric that estimates the total time spent on each site. Bootstrapping is performed to better represent the error associated with the web traffic estimates. The data is delivered through a Power BI dashboard. Our operational teams use this dashboard to take a targeted approach to their disruption activities, prioritising the websites with the greatest online activity.
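The combined metric and the bootstrapping step can be illustrated with a minimal Python sketch. It is not the Commission's implementation: it assumes that total time is the number of visits multiplied by the average visit duration, and that a simple percentile bootstrap over sites is used to put an interval around the overall total. The function names, the site names and all figures are hypothetical.

```python
import random


def total_time_seconds(visits, avg_visit_duration_seconds):
    """Assumed combined metric: visits multiplied by average visit duration."""
    return visits * avg_visit_duration_seconds


def bootstrap_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the sum of `values`.

    Sites are resampled with replacement and the total is recomputed for
    each resample; the interval is read off the sorted resampled totals.
    """
    rng = random.Random(seed)
    n = len(values)
    totals = sorted(sum(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = totals[int((alpha / 2) * n_resamples)]
    upper = totals[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical monthly figures: (site, visits, average visit duration in seconds).
sites = [
    ("example-a.test", 12000, 310.0),
    ("example-b.test", 4500, 95.0),
    ("example-c.test", 800, 620.0),
]

time_per_site = {name: total_time_seconds(v, d) for name, v, d in sites}
low, high = bootstrap_interval(list(time_per_site.values()))
```

Ranking `time_per_site` by value would give the kind of prioritised list described above, with `low` and `high` indicating how uncertain the overall total is under these assumptions.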
2.2 - Benefits
- Provide a better understanding of the trends in online illegal gambling over time
- Enable targeted disruption of online illegal gambling by prioritising illegal gambling websites with the greatest estimated web traffic
- Automate data collection that was previously done manually, freeing up resources to be spent on disruption
2.3 - Previous process
Prior to the development of this tool, illegal gambling websites were identified through desk research and web traffic estimates were obtained manually, one by one.
2.4 - Alternatives considered
The alternative to this tool was for illegal gambling websites to be identified manually through desk research and for the web traffic data to be obtained manually, one at a time from Similarweb, instead of using the API. Automating the process with this tool has led to a more efficient use of resources so more time can be spent on disrupting the identified illegal gambling websites.
Tier 2 - Deployment Context
3.1 - Integration into broader operational process
The output of this tool is a Microsoft Power BI report which allows our operational team to view all identified illegal gambling websites along with the estimated web traffic associated with each site. This is used by the operational team to prioritise disruption activities, to ensure we are taking action against the most prolific illegal gambling websites.
3.2 - Human review
The Power BI report is updated monthly and users access it via our internal Microsoft Power BI Service, through a browser.
3.3 - Frequency and scale of usage
The Power BI report is updated each month to help decide which illegal gambling sites to direct our resources at disrupting. Currently, an average of 100 sites are disrupted each month.
3.4 - Required training
The Power BI report is designed to be as simple and user-friendly as possible. Training on how to use the report is provided when needed. However, since Power BI is used extensively within our organisation, minimal training is needed to use the tool.
3.5 - Appeals and review
N/A
Tier 2 - Tool Specification
4.1.1 - System architecture
A Python script is run each month and executes the following:
1. Data ingestion - Google search API used to perform automated search of search terms. Illegal gambling affiliate pages are identified from the results and links to illegal gambling websites are extracted.
2. Web traffic analysis - Web traffic estimates are obtained using an API for the web intelligence platform, Similarweb. Figures are aggregated to obtain an estimation for engagement (time spent) on the identified illegal gambling websites.
3. Storage and reporting - Resulting data is written to Parquet files and delivered through a Microsoft Power BI report that is accessed by our operational teams.
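As an illustration of the link extraction in the data ingestion step, the sketch below collects outbound links from an affiliate page and reduces them to candidate domains. It is a simplified example using only the Python standard library; the `LinkCollector` class, the `external_domains` function, the sample HTML and the `.test` domains are all hypothetical, and the real script may identify and parse affiliate pages differently.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def strip_www(host):
    """Treat www.example.test and example.test as the same site."""
    return host[4:] if host.startswith("www.") else host


def external_domains(page_html, page_url):
    """Return the sorted, de-duplicated domains a page links out to.

    Relative links and links back to the page's own domain are ignored.
    """
    own_host = strip_www(urlparse(page_url).hostname or "")
    collector = LinkCollector()
    collector.feed(page_html)
    domains = set()
    for href in collector.hrefs:
        host = urlparse(href).hostname
        if host and strip_www(host) != own_host:
            domains.add(strip_www(host))
    return sorted(domains)


# Hypothetical affiliate page content.
sample_html = (
    '<a href="https://www.casino-one.test/join">Join</a>'
    '<a href="/about">About</a>'
    '<a href="https://affiliate.test/top10">Top 10</a>'
    '<a href="http://casino-two.test">Play</a>'
)
found = external_domains(sample_html, "https://affiliate.test/best-sites")
```

Reducing links to domains before the web traffic step means each site is looked up once per month, however many affiliate pages link to it.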
4.1.2 - System-level input
A list of search terms based on illegal gambling motivations.
4.1.3 - System-level output
Structured, tabular data for monthly web traffic estimations of identified illegal gambling websites, delivered through a Power BI report.
4.1.4 - Maintenance
Ongoing research informs the list of search terms used. Additional search terms are added as new consumer motivations are identified.
4.1.5 - Models
N/A
Tier 2 - Operational Data Specification
4.4.1 - Data sources
- List of search terms based on illegal gambling motivations
- Search results to those terms
- List of extracted illegal gambling website URLs
- Web traffic data from web intelligence platform, Similarweb
4.4.2 - Sensitive attributes
No sensitive data
4.4.3 - Data processing methods
N/A
4.4.4 - Data access and storage
The resulting web traffic data set is stored on our organisation's internal SharePoint and contains no sensitive data.
4.4.5 - Data sharing agreements
N/A
Tier 2 - Risks, Mitigations and Impact Assessments
5.1 - Impact assessments
N/A
5.2 - Risks and mitigations
False positives: A legal, licensed site could be flagged as illegal.
Mitigation: 1. Regularly updating the licensed website database so that approved, legal sites are always recognised and not mislabelled. 2. Human review by operational team members before any report, enforcement or takedown.
False negatives: The tool might fail to flag illegal websites.
Mitigation: 1. Regularly updating the keyword lists. 2. Continually updating and refining the methodology by monitoring trends and drawing on expert input.
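The false positive mitigation of checking candidates against the licensed website database could look something like the sketch below. This is an assumption about how such a check might work, not a description of the tool: the `normalise` and `split_candidates` functions and the example domains are hypothetical, and every remaining candidate would still go to human review before any action is taken.

```python
def normalise(domain):
    """Lower-case a domain and drop a leading 'www.' so comparisons are consistent."""
    domain = domain.strip().lower()
    return domain[4:] if domain.startswith("www.") else domain


def split_candidates(candidates, licensed_domains):
    """Separate candidate domains into those for human review and those excluded.

    A candidate is excluded when it matches a domain in the licensed list.
    """
    licensed = {normalise(d) for d in licensed_domains}
    for_review, excluded = [], []
    for candidate in candidates:
        domain = normalise(candidate)
        if domain in licensed:
            excluded.append(domain)
        else:
            for_review.append(domain)
    return for_review, excluded


# Hypothetical inputs.
licensed_list = ["licensed-operator.test"]
candidate_list = ["WWW.Licensed-Operator.test", "casino-one.test"]
for_review, excluded = split_candidates(candidate_list, licensed_list)
```

Normalising both lists before comparing them guards against a licensed site being mislabelled only because of capitalisation or a `www.` prefix.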