Guidance

Record key information about Essential Shared Data Assets

Updated 26 April 2024

Use metadata to describe Essential Shared Data Assets (ESDAs), making it easier to catalogue, validate, reuse and share your data.

In this context and for the purposes of specific ESDA guidance, a data asset is a container that holds one or more datasets. Datasets are individual, structured files containing data that are organised within data assets.

If you are responsible for data assets that are shared with another public sector organisation to deliver an essential purpose or process (i.e. Essential Shared Data Assets), you should include information about your data using an agreed metadata standard.

This information is called metadata and the records that contain this information may be referred to as attributes or meta(data) elements. By doing this, you will:

  • make your data searchable and easier for users to find
  • make it easier for the data to be catalogued and validated
  • ensure your data is accessible and reusable - your data is often reused even when you do not expect it to be

You should use the Data Catalogue Vocabulary (DCAT) to describe the metadata for your ESDAs if you create, maintain and share datasets with other organisations. 

A dataset is defined in DCAT as: “A collection of data, published or curated by a single agent, and available for access or download in one or more representations.”

This guidance also applies to distributions, which are defined in DCAT as: “A specific representation of a dataset. A dataset might be available in multiple serialisations that may differ in various ways, including natural language, media-type or format, schematic organisation, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above).”

If you’re making your data open and the data asset you create will be published as spreadsheets, CSV files or other data in tabular form, you should also refer to the guide on publishing your tabular data. All CSV files should comply with the Tabular data standard.

Who should use this guidance

This guidance is part of a collection on open standards to assist those already working with metadata. You should adopt it when submitting your ESDAs for inclusion in the Government Data Marketplace.

You must follow this guidance if the data assets being shared have been designated as Essential Shared Data Assets (ESDAs), which refer to data collected, used and maintained to deliver public services (and any other purposes or processes defined in the underlying guidance on ESDAs). As best practice, use this guidance if you create any other type of data assets, unless the use of other metadata standards is more appropriate.

For example, if you are creating, maintaining or managing metadata for geospatial data (data that references a location on the surface of the Earth), you should instead use the GEMINI metadata for spatial datasets, including those covered by the INSPIRE regulations. You can also refer to the open standards profiles on ‘Exchange of location point’ and ‘Identifying property and street information’ for more details.

Using metadata in government

By following this guidance, you will use a consistent metadata vocabulary to describe Essential Shared Data Assets and improve data interoperability across government. This guidance is related to other Open Standards such as schema.org and Dublin Core, both recommended for government use.

Where to record and store your ESDA metadata

When recording metadata, it’s important to store it linked to, or with, the underlying data it’s describing. You can do this by storing metadata:

  • within the dataset itself, or
  • in a separate file, such as a readme file, keeping a record showing the link between data and metadata, and
  • in a Metadata Catalogue (if your organisation does not currently have one you should consider creating one to support your ESDA submissions)
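
The second option above, a separate file linked to the data, could look something like the following minimal Python sketch. The `.metadata.json` naming convention, the `describes` key and the `write_sidecar_metadata` helper are assumptions made for illustration; they are not part of this guidance or of DCAT.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar_metadata(data_path: Path, metadata: dict) -> Path:
    """Write metadata to '<data file name>.metadata.json' next to the data file.

    The file naming and the 'describes' key are illustrative assumptions.
    """
    sidecar_path = data_path.with_name(data_path.name + ".metadata.json")
    # Store the name of the data file so the link between data and metadata is recorded.
    record = {"describes": data_path.name, **metadata}
    sidecar_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar_path


# Usage: create a small CSV file in a temporary folder and describe it.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "occupancy.csv"
    data_file.write_text("date,occupancy\n2020-07-14,42\n", encoding="utf-8")
    sidecar = write_sidecar_metadata(data_file, {"title": "Office occupancy"})
    print(sidecar.name)
```

Keeping the link inside the metadata file itself means the relationship survives if both files are moved together to another folder.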

When publishing your data, you will need to consider:

  • how the metadata may be harvested by data portals, such as the Government Data Marketplace, 
  • how easy it will be for users to discover the data assets you intend to publish, and
  • how it will enable both humans and machines to interpret the metadata, for example by avoiding the use of acronyms and domain jargon in the title

Read our guidance on ‘Publishing tabular data’ to understand more about how you publish metadata.

Making your ESDA metadata machine readable and accessible

To make metadata machine readable and accessible, you must format your metadata in a specific way.

When recording the metadata for your ESDAs, use plain English: make it specific, informative, clear and to the point, and follow the ‘Writing for GOV.UK’ guide. For example, do not use jargon, and make sure you define technical terms and expand acronyms. Try to avoid using symbols that users or machines might misinterpret.

If you provide usage guidelines within the metadata and include links to content stored elsewhere, such as URLs to ancillary documentation or web pages, make sure users outside your organisation can access them. Otherwise, remove the links and provide that information alongside your dataset.

It is important that all Mandatory attributes are completed when publishing Essential Shared Data Assets. For Recommended or Optional metadata attributes, if you do not have the information you need to record, you can still add the attribute, but enter “unknown” or “not applicable” as relevant, in preference to null values.
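
As a minimal sketch of this rule in Python: the check below rejects a record with an empty Mandatory attribute and fills empty Optional attributes with “unknown” instead of leaving null values. The `MANDATORY` and `OPTIONAL` lists and the `prepare_record` helper are assumptions chosen for illustration only; the authoritative lists are in the UK Cross-Government Metadata Exchange Model.

```python
# Illustrative attribute lists only (assumed, not the official lists).
MANDATORY = ("title", "description", "creator")
OPTIONAL = ("alternativeTitle", "temporalCoverage", "conformsTo")


def prepare_record(record: dict) -> dict:
    """Return a copy of the record that is safe to publish.

    Raises ValueError if any Mandatory attribute is missing or empty.
    Empty Optional attributes are set to "unknown" rather than left null.
    """
    missing = [name for name in MANDATORY if not record.get(name)]
    if missing:
        raise ValueError("Missing mandatory attributes: " + ", ".join(missing))

    prepared = dict(record)
    for name in OPTIONAL:
        if prepared.get(name) in (None, ""):
            prepared[name] = "unknown"
    return prepared


record = prepare_record(
    {
        "title": "Office occupancy",
        "description": "Daily count of staff present in the office.",
        "creator": "Cabinet Office",
        "conformsTo": None,
    }
)
print(record["conformsTo"])
```

Whether “unknown” or “not applicable” is the right value depends on the attribute, so a real process would probably let the person recording the metadata choose between them.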

Metadata you should record

When submitting your ESDAs, you should record all Mandatory information that will help others:

  • be informed on where and when your data was collected - use ‘creator’ and ‘dateCreated’ to record who created the data and the date they created it
  • find the data you’ve saved on a shared network, and identify whether it’s the data they need - use ‘title’, ‘description’ and ‘identifier’ to describe your data
  • identify the version of the data you’ve collected - use ‘expires’ and ‘supersededBy’ so users know which version of your data to use
  • understand the period and standards your data relates to - use ‘temporalCoverage’ to indicate the time period to which your data applies, and ‘conformsTo’ to tell users whether your file conforms to a specific standard or schema
  • use the data you’ve collected appropriately - state the ‘accessRights’ and ‘securityClassification’ so users do not share sensitive data inappropriately, and state the ‘license’ that applies to your data assets to help users understand their rights to use the data you’ve collected

Below are further examples of metadata attributes that need to be included. Note, however, that this is not a comprehensive list. You should follow the specific guidance for Essential Shared Data Assets and use the UK Cross-Government Metadata Exchange Model applicable to these. In the Metadata Exchange Model, these attributes are referred to as ‘properties’ and each contains a definition and usage notes, as well as some specific examples.

Recording time and dates in your metadata

Using ‘Created’

You should record the date when you create a dataset to help users of the dataset know whether it is valid and relevant to them. You must record any dates using the ISO 8601 standard, which is an Open Standard selected for use by the government.

For example, created:“2002-10-02”

You must capture the exact time a dataset is collected when you’re collecting more than one version of a dataset a day. This means listing the date and time elements in descending order of size (years, months, days, hours, minutes, seconds, milliseconds and microseconds). You should provide the right level of accuracy for your dataset.

For example, if you publish your dataset once a year, it might be enough to provide a date down to the day, for example, 2020-07-14. If you publish multiple times a day, it is better to include information down to the second, for example, 2020-07-14T12:57:03Z. Note that in the ISO 8601 date and time format standard, ‘Z’ specifically means UTC (often known as GMT in the UK). Make it clear when a time is not in British Summer Time, even if the date falls in summer. The example above is a timestamp for data published at 12:57 UTC, which is shortly before 2pm BST, on 14 July 2020.
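
The date and time formats described here can be produced with Python’s standard library, as in the sketch below. It reuses the example timestamp from this section; the variable names are illustrative, and British Summer Time is represented as a fixed offset of one hour ahead of UTC so the example does not depend on a time zone database being installed.

```python
from datetime import date, datetime, timedelta, timezone

# A date down to the day, suitable for data published once a year.
published_day = date(2020, 7, 14).isoformat()

# A UTC timestamp down to the second, suitable for data published several times a day.
published_at = datetime(2020, 7, 14, 12, 57, 3, tzinfo=timezone.utc)
utc_stamp = published_at.strftime("%Y-%m-%dT%H:%M:%SZ")

# The same instant expressed in British Summer Time (UTC+01:00).
bst = timezone(timedelta(hours=1), "BST")
bst_stamp = published_at.astimezone(bst).isoformat()

print(published_day)  # 2020-07-14
print(utc_stamp)      # 2020-07-14T12:57:03Z
print(bst_stamp)      # 2020-07-14T13:57:03+01:00
```

Both timestamps identify the same moment. Writing the offset explicitly (‘Z’ or ‘+01:00’) is what removes the ambiguity between UTC and local time.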

Record the provenance of your data

Using ‘creator’

You should record who created a dataset so users can communicate with the creator and understand if the data is relevant to them. For example, a data analyst may want to find out how reliable a dataset is before undertaking any analysis.

Record the name of the organisation, taken from the list of values associated with this attribute, for example, “Cabinet Office”.

Help users find, use and identify your dataset

Using ‘title’

You must include the name of your dataset so users can find and identify the right dataset.

You should try to ensure the name captures information that will help users determine whether the dataset meets their needs, for example by capturing the topic and specific information about place and geography.

For example, title:“Government Digital Services London Office staff building occupancy”.

To keep titles short yet meaningful, you could use ‘alternativeTitle’ or ‘description’ to state whether the dataset relates to “All London Offices” or a specific location, for example “Whitechapel”.

Using ‘description’

You must provide a description so that there is a rich, human readable explanation of the data asset, in addition to the title, so that users of your data can find out if it’s relevant to them.

The descriptions of your data should only describe the type of data collected and should not include warnings about how to use the data - any warnings should be explained within usage notes or by reference to other properties such as ‘accessRights’.

Using ‘identifier’

You should uniquely identify your dataset so that users of your data know exactly which source they’re using.

You should identify your data asset by:

  • using the identification system your organisation is using (in cases where organisations have a system in place)
  • using an opaque identifier you’ve created - this should be random numbers rather than sequential or semi-sequential numbers to avoid meaning being implied

Using a meaningless identifier avoids the misunderstanding that comes with applying meaning to identifiers. For example, meaning can change over time, whereas a meaningless identifier can remain genuinely constant.

For example, identifier:“362857580”

You can ensure this meaningless identifier stays unique by keeping a catalogue of all datasets with their identifiers.
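
One possible way to create such an identifier is sketched below in Python. It draws a random nine-digit number and checks it against a catalogue of identifiers already in use. The `new_identifier` helper, the nine-digit length and the in-memory set used as the catalogue are assumptions for illustration; your organisation’s own identification system takes precedence where one exists.

```python
import secrets


def new_identifier(catalogue: set[str], digits: int = 9) -> str:
    """Return a random numeric identifier that is not already in the catalogue.

    The catalogue here is an in-memory set (an assumption for this sketch);
    in practice it would be your record of all datasets and their identifiers.
    """
    lowest = 10 ** (digits - 1)  # smallest number with the requested digit count
    while True:
        # Random rather than sequential, so no meaning is implied by the value.
        candidate = str(lowest + secrets.randbelow(9 * lowest))
        if candidate not in catalogue:
            catalogue.add(candidate)
            return candidate


catalogue = {"362857580"}
identifier = new_identifier(catalogue)
print(identifier)
```

Registering the identifier in the catalogue at the moment it is issued is what keeps it unique across datasets.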

Using ‘mediaType’

In a distribution, you should record the file format or encoding method of the data asset being described, since many datasets are (or can be) published in multiple formats (mediaTypes). For example: 

  • CSV: text/csv
  • Excel (.xlsx): application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  • Geopackage: application/geopackage+sqlite3
  • HTML: text/html
  • PDF: application/pdf
  • Word (.docx): application/vnd.openxmlformats-officedocument.wordprocessingml.document

Use the media type which is most relevant to your dataset so that, for example, browsers can decide how to present the underlying data. The media type is taken from the list of values defined by IANA (IANA-MEDIA-TYPES).
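
As an illustration, the media types listed above can be held in a small lookup table keyed by file extension, as in this Python sketch. The `media_type_for` helper is hypothetical. The `.csv`, `.gpkg`, `.html` and `.pdf` extensions are assumed here as the usual extensions for those formats; only `.xlsx` and `.docx` are named in this guidance.

```python
from pathlib import Path

# Media types from the list above, keyed by file extension.
# The .csv, .gpkg, .html and .pdf extensions are assumptions for this sketch.
MEDIA_TYPES = {
    ".csv": "text/csv",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".gpkg": "application/geopackage+sqlite3",
    ".html": "text/html",
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def media_type_for(filename: str) -> str:
    """Return the media type to record for a distribution file name."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        # Fail loudly rather than guess, so the value can be looked up in the IANA list.
        raise ValueError(f"No media type recorded for extension {suffix!r}")
    return MEDIA_TYPES[suffix]


print(media_type_for("occupancy.csv"))   # text/csv
print(media_type_for("Occupancy.XLSX"))
```

A fixed table is used instead of guessing from the operating system so that the recorded value always matches one of the listed IANA media types.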

Make sure your data is used appropriately

Using ‘licence’

For protected data such as personal, sensitive or commercial data, you should record information that will help users of the data understand its terms and conditions.

You may want to include the relevant data-sharing agreement, legal regulation or certification. This could be a memorandum of understanding (MOU) or Data Protection Impact Assessment.

NOTE: The open standards vocabularies for schema.org, Dublin Core and DCAT spell the noun ‘licence’ using the American spelling ‘license’. 

For example, licence:“Memorandum of Understanding between the Charity Commission for England and Wales and the Office for Students”

When publishing open data, you should label the data you’ve collected with its licence for use. In many cases within government, this will be the Open Government Licence (OGL). You should also link to the licence file to explain what the licence means and how others can use your code and content.

For example, licence: “https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/”

Using ‘accessRights’

You should record the sensitivity of your data so it’s not shared or published in ways it should not be.

You should provide information about who should be able to access the data you’ve collected, and any restrictions including:

  • whether it’s Open, Commercial or Internal
  • the handling caveat for the data
  • the security classification of data

For example, use “Internal” for data with restricted access.

Additional Information

Refer to the full set of attributes in the UK Cross-Government Metadata Exchange Model, which has more comprehensive information and usage notes for all the metadata requirements that apply to ESDAs.