
Guidance on Crowd Capture to Extract Location Data from Archived Material

Published 12 November 2021

Acknowledgements

We would like to thank everyone involved in the project for their time and contributions. A special thank you goes to the organisations and individuals who took part in the interviews, providing their advice and experience on crowd capture projects.

Glossary of Terms

The following terms are provided for your reference throughout the report.

Archive: In this report, an archive is considered in the widest sense: any collection of data and/or information held either in physical form (for example, on paper) or in a digital format, where the latter may simply be a digital image of the physical object.

Crowd Capture: Crowd capture is the engagement of groups of people to ‘capture’ data from, for example, scanned materials such as maps, PDF documents, surveys, historic logs, books or photographs.

Crowdsourcing: Crowdsourcing is the process of engaging a wide group of volunteers to undertake a specific task on a project.

Digitise (or Vectorise): The process of converting information into the digital codes stored and processed by computers. In geographic applications, digitising usually means tracing map features into a computer using a digitising tablet, graphics tablet, mouse, or keyboard cursor. Source: OGC

Extracting: The extraction of values and/or information that informs understanding. An example could be extraction of map features, annotation and associated text or other information that describe a feature on a map. Alternatively, it could be the extraction of tabular data in a report that relates to a location. These processes may be undertaken manually or automatically.

Location: In this report, location refers to a known coordinate or place on the planet with which the archive material can be associated.

Scanning: Scanning describes the process by which a material such as a map is scanned and a digital/electronic copy created.

For a full geospatial glossary covering any of the terms, names or phrases listed in this report, please see the Geospatial Commission glossary.

1. Introduction

Purpose

This document provides guidance on crowd capture projects in relation to extracting location data from archive materials such as paper maps, reports and other old data files and formats converted to electronic images or PDFs.

This guidance specifically relates to the set-up and delivery of crowd capture projects that are run online, meaning that both the scanned data and the crowd capture activity are made available to the ‘crowd’ to undertake and deliver through an online platform. It addresses questions related to project set-up and cost, including time/staff resource requirements, narrative building and engaging a crowd.

This guidance does not cover working with a crowd of volunteers in person or in the field. It also relates specifically to the capture of data from archive materials, and does not cover the generation or capture of entirely new location data.

Aim

This document provides insight into the experience of, and lessons learned by, those who have delivered archive data capture projects using crowd capture techniques. In this way, those planning a crowd capture project can benefit from the experience gained on previous projects rather than ‘starting from scratch’. The document captures and shares the challenges, recommendations and experiences from completed and ongoing crowd capture projects.

Audience

Our intended audiences are organisations such as the Geospatial Commission, Geo6+ and Devolved Administrations that may be considering undertaking a crowd capture project as a way to extract data from archive materials. Some of this work may be equally applicable to private sector organisations wishing to start a crowdsourced project.

The primary focus in this work is geospatial data held by the Geo6 partner bodies: the British Geological Survey, Coal Authority, HM Land Registry, Ordnance Survey, UK Hydrographic Office and Valuation Office Agency. However, the guidance may be applicable to anyone considering undertaking a crowd capture project to extract location data from archive material.

Background

The Geospatial Commission (GC) was established to improve the quality of key, publicly held geospatial data and to make it easier to find, access and use. This work was carried out as part of the Geospatial Commission’s sponsored Data Improvement Programme, which aimed to improve access to better public sector geospatial data (Mission 2).

The first phase of the Archive Data Capture project identified crowd capture/sourcing as a potential methodology with which to capture heritage/unique location data held in archive (and other) digitised or scanned materials such as surveys stored as paper documents, geographic books and analogue photographs.

The results from that project highlighted that the capture of location data from archive materials is often a unique, complex and resource-intensive undertaking, and that crowd capture is a potential means of capturing location data from scanned archive materials within these constraints. In addition, the project identified that:

  • Crowd capture and sourcing of location data is less well documented than the crowd capture and sourcing of non-location data.
  • When undertaking location data capture projects, agencies frequently act independently and ‘start from scratch’, using different tools, processes and platforms.

The digitisation of archives would represent the opening up of major national data assets that could support many use cases (e.g. climate change monitoring, digital conveyancing) and enrich existing digital datasets (e.g. complete digital Coal Authority records and British Geological Survey borehole data).

The guidance developed under the second phase focuses on the crowd capture of location data from materials that have previously been digitised/scanned according to the relevant standards. For guidance on digitising archive materials, please refer to the Geospatial Commission “Best practice for extracting data from archives” document.

Structure of Report

The guidance in this report is structured principally over two chapters, as follows:

  • Chapter 2 provides a synopsis of the key recommendations for setting up crowd source projects, grouped according to the phase of the project
  • Chapter 3 provides detail and analysis supporting each of the recommendations along with how the content for the report was obtained.

In addition:

  • Appendix A details current crowd source projects that were consulted for this work
  • Appendix B provides a pointer to the key literature related to the project.

2. Recommendations for Crowd Source Projects

This chapter summarises the key considerations relating to the set-up, running and delivery of a project using crowd capture to extract data from scanned archive materials, as identified through our research and interviews. The recommendations are grouped according to the project phases: set-up, delivery and post-project. Details for each of these items are covered in Chapter 3.

While the considerations are wide ranging and extensive, they will not be fully comprehensive and are intended as a starting point to help agencies and individuals decide whether the use of crowd capture techniques is appropriate for them.

Project Set Up

  • Title & Narrative: When setting up a project to extract location data from archive materials, put thought and consideration into a clear and engaging project title that will help to generate the attention of interested volunteers. A clear and concise narrative, detailing the history of the data and the impact the extracted data will have, will help to attract a “crowd”.

  • Resources: Set aside dedicated resources to set up a crowd capture project in terms of staff time on project scoping, project set up, and on the development, maintenance or redevelopment of proprietary software required to run a project. The cost of these required resources should be carefully assessed and managed before starting and during a project.

  • IPR related to Input and Output Data: Check the IPR of input data and scanned data, and understand if and how it can be used in a crowd capture project. Check the IPR of the output data and acknowledge the platform and volunteers used to help create it appropriately.

  • Closed Projects: Consider if a closed crowd capture project is the right approach for your data and organisation. Open by default is the recommended approach for crowdsource projects, as this will ensure maximum return on investment; however, for sensitive data it may be prudent to consider other methods of data collection and/or anonymisation.

  • Gamification: When designing a crowd capture project, consider how a simple gamified task, or aspects of competition can be used to help engage a crowd.

  • Quality Assurance & Quality Control (QA/QC): Consider how many volunteers should extract each piece of data, what method will be used to help verify extracted data, how to manage errors in the original dataset and what data post processing will be required for extracted data.

  • Platforms: Consider the advantages and disadvantages of different platforms, for different types of data and tasks, and the associated costs involved in the project set up when using proprietary systems.

  • Crowd Capture & Automated Techniques: Consider using crowd capture to extract data from datasets that other techniques may struggle with, and as a means to quality control the results of automated extraction techniques.

Project Delivery

  • Project Promotion, Advertisement & Achieving a Crowd: Use of an existing “crowd” and/or targeted project promotion aimed at an audience with an interest or specialism in the field of research related to the crowd capture project will increase engagement.

  • Batched Tasks: When working with large datasets on a crowd capture project, release material to the crowd in smaller prioritised batches. Continuous management of what to release and when, based on user activity and interest, will be key to the success of a crowdsource project.

  • Archive Preparation: Quality scans are crucial to the success of a crowd capture project. Refer to Extracting Data from Archives: Best Practice Guide for support on scanning and digitising archive materials in preparation for an extraction project.

  • Volunteer Engagement & Project Feedback: Regular and responsive engagement with volunteers throughout the project, in addition to feedback on project progress is important. Adequate resources should be planned and allocated to ensure this happens.

  • Volunteer Appreciation and Acknowledgement: Consider ways to show appreciation to volunteers donating time to work on crowd capture projects, and acknowledge their contributions appropriately.

Post Project

  • Post Project Data Processing: A crowd capture project may not produce a perfect dataset which can be immediately utilised and shared. Some data processing on the extracted data may be required and skilled individuals (data managers, location data specialists) may be required to undertake this work.

  • Reuse of Data: Data extracted as part of a crowd capture exercise can be reused to support the extraction of datasets in other, separate projects. Consideration should be given to the licensing terms (OGL, CCL etc.).

  • Wider Agency and Public Engagement: A crowd capture exercise provides a way for agencies and individuals to interact with your project and organisation in new ways which can lead to additional benefits such as longer term engagement, membership and new partnerships and collaborations.

3. Detailed Findings for Crowd Source Projects

Between February and April 2021, the project team identified, approached and interviewed individuals who had run, or were currently running, crowd capture projects with the specific purpose of extracting location data from archive materials. These projects originated from the Geo6 organisations and other public agencies and universities. The name of each project, its associated organisation, platform, type of archive data, aim, project length and anticipated result is detailed in Appendix A.

Title and Narrative

Finding: A catchy and engaging title helps to advertise and draw attention to a project designed to extract data from archive materials. Our research identified that two projects, ‘Rainfall Rescue’ and ‘Don’t lose your way’, had names designed to help generate a sense of urgency and a call to action. Additionally, the narrative behind a crowd capture project is essential to help generate interest and engage potential volunteers to spend their time and work on a project. A clear and engaging narrative, which shares the history of the data and what the potential impact of the extracted digital data will be, is key to engaging volunteers on projects. Our research indicates that many people have a curiosity and interest in history and ‘old maps’, and this leads to an enthusiasm to engage on projects to capture data from archive materials that have a clear narrative about how extracted data will be used.

What this means: When setting up a project to extract location data from archive materials, put thought and consideration into a clear and engaging project title that will help to generate the attention of interested volunteers. In addition, a clear and concise narrative detailing the history of the data and the impact extracted data will have, may help generate interest and act as a hook to help generate a ‘crowd’ to work on a project.

Resources

Finding: Although often viewed as ‘free’ because they rely on the time and input of unpaid volunteers, crowd capture projects are time-intensive to set up and deliver, with associated costs. This point is not in relation to the scanning or preparation of archive materials to use on a project, but to the set-up, advertisement and testing of a project before it goes live (upfront cost). Our interviewees had staff working either full time or nearly full time on set-up, and this time ranged from weeks to months.

Additional costs were associated with the software platforms used to deliver the crowd capture project. When using free, open source software, such as the Zooniverse, time was required by project staff to understand how to use the platform and set up a project on it. On projects which used bespoke, proprietary software, costs were incurred when engaging with those developing the platform, on the development of the platform and/or on maintaining and redeveloping existing software.

What this means: Resources are required to set up a crowd capture project in terms of staff time on project scoping and project set up, and on the development, maintenance or redevelopment of proprietary software required to run a project. The cost of these required resources should be carefully assessed and managed before starting and during a project.

IPR related to Input and Output Data

Finding: The Intellectual Property Rights (IPR) include the terms of use attached to a dataset. It is important to check the IPR of the archive dataset that is being used as part of the project, and of the digital copy of the data used in a crowdsourcing project. It may be necessary to agree the use of a dataset, and/or a scan, with the original owners. In addition, the IPR of the output data should be clearly understood and communicated, with the platform and volunteers acknowledged appropriately.

What this means: Check the IPR of input data and scanned data, and understand if and how it can be used in a crowd capture project. Check the IPR of the output data and acknowledge the platform and volunteers used to help create it appropriately.

Closed Projects

Finding: While many crowd capture projects are made publicly available and widely publicised to help draw a crowd, crowd capture projects can also be set up as closed projects. This means that only those specifically invited to join a project can see the data and participate in the crowd capture exercise. This option may work for projects working with sensitive data, for organisations during a time when staff cannot undertake their normal work (such as during a pandemic), or simply for testing a project before it goes live. The obvious downside is that your “crowd” is a lot smaller than for a public project, and therefore data will take longer to capture.

What this means: Consider if a ‘closed’ crowd capture project is the right approach for your data and organisation. Open by default is the recommended approach for crowdsource projects as this will ensure maximum return on investment, however for sensitive data it may be prudent to consider other methods of data collection.

Opportunity

Finding: Interruptions to planned project work, for example during a pandemic, have provided opportunities for crowd capture work. Firstly, staff who cannot undertake their normal work can be diverted to set up, run and deliver a crowd capture project. Secondly, volunteers who cannot undertake their normal work can volunteer and undertake crowd capture work.

What this means: If archive data has been prepared and is in a sharable format, consider what opportunities exist to run a crowd capture project, potentially at a time when other project work has been interrupted.

Gamification

Finding: Volunteers responded well to tasks that had been ‘gamified’. For example, simple tasks where volunteers were requested to identify differences between old and new maps and/or draw on a map were shown to be popular. In addition, volunteers responded well to instant feedback in the form of a completed area being greyed out, or a pin on a map changing colour to indicate the status of a task. Introducing competition, in the form of leader boards, was sometimes popular. One project reported that those at the top of the leader boards became quite competitive, and this helped to drive engagement on the project and the delivery of more tasks. However, it was highlighted that the competition aspect should be used carefully so as not to alienate less active volunteers.

What this means: When designing a crowd capture project, consider how a simple gamified task, or aspects of competition, can be used to help engage a crowd.

Quality Assurance & Quality Control (QA/QC)

Finding: A number of points came up in relation to Quality Assurance and Quality Control:

  • Decide how many volunteers should extract and agree on a piece of data before it is accepted as complete and accurate.
  • If a total is present in the original dataset, for example a column total in a table that can be used to validate extracted data, fewer volunteers may be required to look at each piece of data.
  • If volunteers disagree on an extracted piece of data, that data can be flagged for checking and put back into the crowd capture exercise.
  • Errors in the original dataset will likely be picked up by volunteers.
  • The crowd capture exercise may introduce inconsistencies in the data that will need to be amended once the exercise is complete. For example, if a volunteer has been given a square block in which to digitise line features, those features will end at the block edge and will need to be joined up as part of a follow-up QA/QC exercise.

Overall, those we interviewed were confident in the quality of the data collected as part of a crowd capture exercise. One project indicated that disagreements between volunteers on a piece of captured data occurred in less than half a percent of cases. This was attributed to the volunteers’ genuine interest in a field of research and desire to do the task correctly. One issue was raised when a volunteer entered random letters or zeros, but this was easily picked up with QA measures such as those highlighted above, and seemed to be a rare occurrence.

What this means: Consider how many volunteers should extract each piece of data, what method will be used to help verify extracted data, how to manage errors in the original dataset and what data post processing will be required for extracted data.
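To make the QA/QC considerations above more concrete, the sketch below illustrates one possible consensus-and-validation step in Python. It is a minimal, illustrative example only: the field names, the agreement threshold and the idea of checking a transcribed column against a printed total are assumptions based on the findings above, not a description of any particular platform’s workflow.

```python
from collections import Counter

MIN_AGREEMENT = 3  # assumption: each cell is transcribed by at least three volunteers


def consensus(values, min_agreement=MIN_AGREEMENT):
    """Return the agreed value for one cell, or None if volunteers disagree."""
    counts = Counter(v.strip() for v in values if v and v.strip())
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count >= min_agreement else None


def validate_column(cell_transcriptions, printed_total=None):
    """cell_transcriptions: one list of volunteer entries per cell in a column.

    Returns (accepted_values, flagged_indices). Cells without consensus, and
    columns whose sum does not match a printed total, are flagged so they can
    be put back into the crowd capture exercise for re-checking.
    """
    accepted, flagged = [], []
    for i, entries in enumerate(cell_transcriptions):
        value = consensus(entries)
        if value is None:
            flagged.append(i)
        accepted.append(value)

    # If a printed column total exists in the original table, use it as an
    # extra check on the whole column.
    if printed_total is not None and not flagged:
        try:
            if abs(sum(float(v) for v in accepted) - printed_total) > 0.01:
                flagged = list(range(len(accepted)))
        except ValueError:
            flagged = list(range(len(accepted)))
    return accepted, flagged


# Example: three volunteers per cell, with a printed monthly total of 6.0
cells = [["1.2", "1.2", "1.2"], ["2.3", "2.3", "2.8"], ["2.5", "2.5", "2.5"]]
values, to_recheck = validate_column(cells, printed_total=6.0)
print(values, to_recheck)  # ['1.2', None, '2.5'] [1]
```

In this sketch the second cell has no consensus and is flagged for re-release; where a printed total is available, fewer volunteers per cell may be sufficient because the column-level check provides additional assurance.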

Platforms

Finding: Those we spoke to had used either an open source platform or a bespoke platform to undertake a crowd capture project; however, other crowd capture and citizen science platforms are available. There are a few free, customisable, open source platforms available, some of which include extensive guidance. On most it is straightforward to upload scanned images and set up a project, including specifying the questions for a volunteer to answer and how many volunteers should answer each question. Identified drawbacks of open source software include the wide range of options and the time required to become familiar with a platform and all its possibilities. The size (MB) of images that may be used could also be limited.

Bespoke platforms for crowd capture offer the ability to create software that is specifically tailored to your task and crowd capture exercise. However, bespoke platforms come with a cost in terms of development and maintenance. Maintaining bespoke software was identified as a particular challenge when funds were limited.

In general, from the pool of projects we spoke to, researchers extracting data from a scan of a table of data (tied to a geographic location) used open source software. Those extracting or digitising geographic data (points, lines, polygons or annotations) from charts or maps used bespoke software to enable them to work with these datasets and set up a workflow and task for volunteers.

What this means: Consider the advantages and disadvantages of different platforms, for different types of data and tasks, and the associated costs involved in the project set up when using proprietary systems.

Crowd Capture & Automated Techniques

Finding: When considering undertaking a crowd capture project, many organisations had considered whether automated techniques, such as artificial intelligence and machine learning, could have been used to extract the data from the archive materials. Crowd capture was found to be a good option when working with data that automated techniques would struggle with, for example handwritten borehole logs. Additionally, crowd capture was highlighted as a potential process through which to check data created as a result of automated data extraction techniques.

What this means: Consider using crowd capture to extract data from datasets that other techniques may struggle with, and as a means to quality control the results of automated extraction techniques.

Project Promotion, Advertisement & Achieving a Crowd

Finding: Achieving a ‘crowd’ of interested volunteers, willing to spend their own time undertaking data extraction work, is key to the success of a crowd capture project.

Citizen science platforms have a ‘ready-made’ crowd; some have over 2 million volunteers signed up. Within this crowd, individuals have a range of interests: some may join an archive data project, others may not. Whatever platform is chosen, a key component of generating a crowd to undertake your work is to use as many means as possible to reach out to willing and interested volunteers.

A wide range of outlets have been, and can be, used to advertise and promote a project to potential volunteers. Media coverage with a good story about the data, which includes ‘why’ it is important to extract the data and ‘how’ it will be used, targeted at those with an interest in the field, will help to build a crowd of interested volunteers. One excellent example of successful promotion is the advertisement of the GB1900 project in the ‘Who Do You Think You Are?’ magazine ‘Transcription Tuesday’ event.

What this means: Use of an existing crowd and/or targeted advertisement and project promotion aimed at an audience with an interest or specialism in the field of research related to the crowd capture project will increase engagement.

Batched Tasks

Finding: Crowd capture projects frequently involve asking volunteers to extract many thousands of data points across geographic space and/or time. Projects rely on the continued involvement of volunteers to extract data from material until a project is complete. Releasing defined, useful groupings of material from a project can help ensure that useful data are generated. For example, releasing a batch of data for a specific geographic area, which when complete could be considered ‘useful’, will help to ensure that usable data result from a project even if the full archive set is never completed.

What this means: When working with large datasets on a Crowd Capture project, release material to the Crowd in smaller prioritised batches. Continuous management of what to release and when, based on user activity and interest, will be key to the success of a crowdsource project.
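A minimal sketch of how prioritised, batched release might be managed is given below, assuming each scanned item has been tagged with a geographic area and a priority by the project team. The item structure and field names are illustrative assumptions, not taken from any of the projects interviewed.

```python
from collections import defaultdict

# Illustrative items: each scanned sheet is tagged with an area and a priority
# assigned by the project team (1 = release first).
items = [
    {"id": "sheet_001", "area": "Cumbria", "priority": 1},
    {"id": "sheet_002", "area": "Cumbria", "priority": 1},
    {"id": "sheet_003", "area": "Kent", "priority": 2},
    {"id": "sheet_004", "area": "Kent", "priority": 2},
]


def batches_by_area(items):
    """Group items into per-area batches and yield them in priority order.

    Releasing one complete area at a time means the project produces a usable
    dataset for that area even if later batches are never finished.
    """
    grouped = defaultdict(list)
    for item in items:
        grouped[(item["priority"], item["area"])].append(item)
    for (priority, area), batch in sorted(grouped.items()):
        yield area, batch


for area, batch in batches_by_area(items):
    print(f"Release batch for {area}: {[i['id'] for i in batch]}")
```

In practice the priority ordering would be revisited during the project, based on volunteer activity and interest, rather than fixed up front.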

Archive Preparation

Finding: Digital scans of a hard copy resource need to be of sufficient quality to help ensure the success of a crowd capture project. A poor digital scan can be difficult for volunteers to work with. For example, if a scan of a table is skewed, software may incorrectly indicate the column or row from which a volunteer should transcribe data. Alternatively, if a scan is blurry, or if a coloured original has been scanned in black and white, volunteers may not be able to read the required data, or may not read it correctly. Even where a scan appears straightforward to extract data from, issues such as these can make it difficult for volunteers to transcribe from and work with.

What this means: Quality management of scans is crucial to the success of a crowd capture project. Users are referred to Extracting Data from Archives: Best Practice Guide for support on scanning and digitising archive materials in preparation for an extraction project.

Volunteer Engagement & Project Feedback

Finding: Our research consistently highlighted the importance of volunteer engagement throughout the lifecycle of a project. Regular and responsive feedback was important to maintain crowd engagement and therefore retain volunteers. The majority of projects did not ‘run themselves’ and there were regular enquiries from the volunteers. Enquiries ranged from questions about the original data, the data itself, errors in the archive materials and data reuse, to questions related to the task at hand.

Despite its highlighted importance, the amount of volunteer engagement varied between projects. Some projects were surprised that after an initial flurry, there were then fewer enquiries than expected and that the nature of the enquiries focused more on the data rather than the task. Others highlighted that they had underestimated how much work it would take to run a crowd capture project and to provide communications to users and answer questions and issues as they came up.

In addition to engagement, providing feedback and updates on the progress of the project was important to maintain a crowd’s enthusiasm and interest in a project. Project progress has been demonstrated in a number of ways: for example, by greying an area out or changing the colour of a pin on a map when data collection is complete, by providing a simple graph demonstrating how much data has been extracted so far, or by giving a snapshot of how newly extracted data is being used to meet the original aims of the project. Engagement with volunteers and the provision of project feedback require an allocated and responsive staff resource.

What this means: Regular and responsive engagement with volunteers throughout the project, in addition to feedback on project progress is important. Adequate resources should be planned and allocated to ensure this happens.

Volunteer Appreciation and Acknowledgement

Finding: Project teams used different ways to demonstrate appreciation and acknowledge the time and efforts of the Volunteers. Initiatives ranged from emails sent from the Director of an organisation, to free t-shirts, to simple thank yous on project chat boards.

What this means: Consider ways to show appreciation to volunteers donating time to work on crowd capture projects.

Post Project Data Processing

Finding: Those we interviewed were satisfied with the outputs of crowd capture projects. However, the importance of, and need for, post project data processing was highlighted. For example, the outputs received from some platforms were described as ‘very raw’. This means that skilled individuals, such as data managers and data specialists, were required to handle the output received before it could be used more widely. Extracted and digitised location data needed to be checked (for example, in case a volunteer had digitised an incorrect feature) and possibly joined together and cleaned before further use and sharing.

What this means: A crowd capture project may not produce a perfect dataset which can be immediately utilised and shared. Some data processing on the extracted data may be required and skilled individuals (data managers, location data specialists) may be required to undertake this work.
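As a concrete example of this kind of post-processing, line features digitised tile by tile can be merged back into continuous features. The sketch below uses the open source shapely library to join touching segments; it is a simplified illustration that assumes segments from adjacent tiles share end coordinates exactly (in practice, snapping to a tolerance is usually needed first).

```python
from shapely.geometry import LineString, MultiLineString
from shapely.ops import linemerge

# Illustrative segments of one path, digitised in two adjacent tiles.
# In a real project these would come from the crowd capture platform's export.
segments = [
    LineString([(0.0, 0.0), (1.0, 1.0)]),  # digitised in tile A
    LineString([(1.0, 1.0), (2.0, 1.5)]),  # digitised in tile B, continues the path
]

# Merge touching segments back into a single continuous line feature.
merged = linemerge(MultiLineString(segments))
print(merged)  # LINESTRING (0 0, 1 1, 2 1.5)
```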

Reuse of Data

Finding: Datasets generated as part of a crowd capture exercise may be reused in later crowd capture projects. For example, the GB1900 Dataset was used in the recent Ramblers Association ‘Don’t lose your way’ project.

What this means: Data extracted as part of a crowd capture exercise can be reused to support the extraction of datasets in other, separate projects. Consideration should be given to the licensing terms (OGL, CCL etc.).

Wider Agency and Public Engagement

Finding: In some cases crowd capture projects achieved aims outwith the collection of the data. Although a crowd capture exercise may have evolved to solve a specific issue around data capture, it frequently proved a good ‘hook’ for wider organisational engagement with members of the public and with other organisations and agencies. For example, projects promoted a greater awareness of an organisation and provided a means to maintain people’s involvement in the future, for example through membership or further volunteering opportunities.

Additional benefits included a greater public understanding of science, improved contacts with the community and the development of new partnerships and collaborations. In essence a crowd capture project provided a route to engage with a new audience and a way to gain the involvement of people over the longer term.

What this means: A crowd capture exercise provides a way for agencies and individuals to interact with your project and organisation in new ways, which can lead to additional benefits such as longer term engagement, membership, and new partnerships and collaborations.

Appendix A - Archive Data Crowd Capture Projects

Between February and April 2021, the project team identified, approached and interviewed individuals who had run, or were currently running, crowd capture projects with the specific purpose of extracting location data from archive materials. These projects originated from the Geo6 organisations and other public agencies and universities. The name of each project, its associated organisation, platform, type of archive data, aim, project length and anticipated result is detailed in Table 1.

The interviews covered all aspects of running a crowd capture project to extract location data from archive materials. The questions covered a range of topics, including the project background, project aims, the types of archive materials used, the platforms used, the ease of setting up a project, how to generate a crowd of volunteers, how to engage with volunteers, and the actual or expected project results. The key considerations, learnings and recommendations resulting from the interviews are detailed in Chapter 3 of this guidance note.

Rainfall Rescue
  • Agency: University of Reading
  • Platform: Zooniverse
  • Archive data: Scans of tabulated data
  • Aim: For volunteers to extract data points from historic rainfall records
  • Project length: A few weeks
  • Result: 5.28 million measurements were transcribed by volunteers from 66,000 pages of scanned archive material

Don’t Lose Your Way
  • Agency: Ramblers Association
  • Platform: Proprietary
  • Archive data: Web mapping service, showing scans of historic Ordnance Survey maps
  • Aim: For volunteers to digitise walking paths in England and Wales
  • Project length: A few weeks
  • Result: A vector dataset of walking paths

UK Tides
  • Agency: NOC/BODC
  • Platform: Zooniverse
  • Archive data: Scanned tide gauge ledgers
  • Aim: For volunteers to extract data from historic tide gauge ledgers from the North West of England
  • Project length: Months
  • Result: Ongoing

Big Borehole Dig
  • Agency: British Geological Survey
  • Platform: Proprietary
  • Archive data: 1.4 million borehole scans
  • Aim: Volunteers digitise handwritten borehole records into industry standard file formats such as AGS
  • Project length: Ongoing – years
  • Result: Ongoing

GB1900
  • Agency: Portsmouth University
  • Platform: Proprietary
  • Archive data: Web mapping service, showing scans of historic maps, from the National Library of Scotland
  • Aim: Crowdsourcing project to transcribe all text strings from the second edition six inch to the mile County Series maps published 1888–1914
  • Project length: Two years
  • Result: 2.55 million geo-located text strings

Table 1: Crowd capture projects explored under the project

Appendix B - Summary of Relevant Literature

Extensive literature exists on the subjects of citizen science and crowdsourcing. Fewer papers exist on the subject of the crowd capture of data from materials such as archive data. This short literature review does not aim to cover the broader literature on citizen science and crowdsourcing. This pointer to relevant literature aims to share the range of papers that we found in relation to crowd capture techniques for extracting data and location data from archive materials. Other papers may exist which were not identified as part of our research.

Southall, Aucott, Fleet, Pert and Stoner (2017) describe the background to the GB1900 project and how “this has been achieved through large-scale crowd-sourcing, using Zooniverse-based software. The article describes the project’s history, the software system and the transcription process”. Additionally, “the article describes publicity methods, volunteer characteristics and motivation: it is argued that while ‘citizen science’ projects appeal to a general desire to advance knowledge, map-based projects can appeal to more locally-focused individual interests: finding meaning in maps and places.” This conclusion tallies with our research that crowd capture projects involving historic location data are popular and can draw a crowd of interested volunteers.

Aucott and Southall (2019) detail “the final datasets, and how they were created” in the GB1900 project, which “used crowd-sourcing to transcribe all text from the second edition County Series six inch to one mile maps of Great Britain, published between 1888 and 1914, a total of c. 2.55m. geo-located text strings.” This paper highlights how crowd capture projects can be used to produce significant location datasets.

An excellent study of volunteer motivation is included in Aucott, Southall and Ekinsmyth (2019): “This paper describes the project’s interaction with online volunteers and then presents their experience, as recorded through the online system itself, six in-depth interviews and 162 responses to an online questionnaire. We find that, unlike volunteers in physical science ‘citizen science’ projects, they were motivated by personal interest in the maps, in places that held meaning for them, and in how places had changed. These conclusions enable us to offer suggestions for volunteer recruitment and retention in similar future projects”. Recommendations to engage volunteers on similar projects include “the tasks asked of volunteers … should always be satisfying even in short initial sessions”, that “‘gamification’ works” but should be applied with care so as not to leave some volunteers behind, that “volunteers … benefited more directly through an engagement with particular places which had meaning for them”, that “communication with (the contributors) was very important” and that “it (was) important to, not only develop … communication channels, but also develop continuing relationships with our contributors. This requires real commitments on both sides”. This paper again shines a light on the popularity of crowd capture projects using historic maps, and on the importance of volunteer engagement throughout the duration of a crowd capture project.

In a news article for the Guardian, Belknap (2016) discusses the use of citizen science in the humanities and the way it is “opening up the vast archives of history to the public”. Belknap highlights that the opening up of archives leads to “questions as to who gets to participate in researching history, and what it means to be an expert”. This article highlights how crowd capture can be used to increase public understanding of science. The paper by Mika, Veer and Rinaldo (2017) “surveys the landscape of current, successful, and innovative crowdsourcing platforms for obtaining full text transcriptions and structured datasets hidden in manuscript items in the Biodiversity Heritage Library”. This paper contains a useful summary of available platforms, and some of the advantages and disadvantages of these platforms when used in crowd capture projects.

While this pointer to relevant literature is short, it is intended as a starting point should an individual or organisation wish to explore any of the topics in greater depth prior to starting a crowd capture exercise involving location data.

Appendix C: References

Ali, A.L. and Schmid, F. (2014) “Data quality assurance for volunteered geographic information”, International Conference on Geographic Information Science, pp. 126–141. Springer, Cham.

Aucott, P. and Southall, H. (2019) “Locating Past Places in Britain: Creating and Evaluating the GB1900 Gazetteer”, International Journal of Humanities and Arts Computing, 13: 69–94.

Aucott, P., Southall, H. and Ekinsmyth, C. (2019) “Citizen science through old maps: Volunteer motivations in the GB1900 gazetteer-building project”, Historical Methods: A Journal of Quantitative and Interdisciplinary History, 52: 150–163.

Basiri, A., Haklay, M., Foody, G. and Mooney, P. (2019) “Crowdsourced geospatial data quality: challenges and future directions”, International Journal of Geographical Information Science, 33(8): 1588–1593. DOI: 10.1080/13658816.2019.1593422

Belknap, G. (2016) “People power: how citizen science could change historical research”, The Guardian: https://www.theguardian.com/science/the-h-word/2016/apr/26/how-citizen-science-could-change-historical-research-crowdsourcing

Kearney, N. and Wallis, E. (2015) “Transcribing between the lines: crowd-sourcing historic data collection”, MWA2015: Museums and the Web Asia 2015. https://mwa2015.museumsandtheweb.com/paper/transcribing-between-the-lines-crowd-sourcing-historic-data-collection/

Mika, K., Veer, J. and Rinaldo, C. (2017) “Crowdsourcing Natural History Archives: Tools for Extracting Transcriptions and Data”, Biodiversity Informatics, 12. DOI: 10.17161/bi.v12i0.6646

Southall, H., Aucott, P., Fleet, C., Pert, T. and Stoner, M. (2017) “GB1900: Engaging the Public in Very Large Scale Gazetteer Construction from the Ordnance Survey ‘County Series’ 1:10,560 Mapping of Great Britain”, Journal of Map & Geography Libraries, 13(1): 7–28.