Recording metadata to describe critical data assets


Question 1

1. Who this guidance is for

Accepted Answer

This guidance is part of a collection on open standards to assist anyone in government already working with metadata. You should follow this guidance when identifying and recording critical data assets. Using metadata to describe government data assets will make it easier to catalogue, validate, reuse and share your data.

You can also use this guidance if you create any other type of data assets, unless using other metadata standards is more appropriate. For example, if you’re creating, maintaining or managing metadata for geospatial data (which references data to a location on the surface of the Earth), you should instead use the GEMINI metadata for spatial data sets, including those covered by the INSPIRE regulations. You can also refer to the open standards profiles on Exchange of location point and Identifying property and street information for more details.

Question 2

2. What critical data assets and metadata are

Accepted Answer

A data asset is a container that holds one or more data sets. Data sets are individual, structured files containing data that are organised within data assets.

Critical data assets are data assets that are shared with another public sector organisation to deliver an essential purpose or process. If you’re responsible for critical data assets, you should record information about this data. This information is called metadata and the records that contain this information may be referred to as attributes or meta(data) elements.

By doing this, you’ll:

make your data searchable and easier for users to find
make it easier for the data to be catalogued and validated
ensure your data is accessible and reusable – your data is often reused even when you do not expect it to be

You should use the Data Catalogue Vocabulary (DCAT) to describe the metadata for your critical data assets if you create, maintain and share data sets with other organisations.

A data set is defined in DCAT as ‘a collection of data, published or curated by a single agent, and available for access or download in one or more representations’.

This guidance also applies to distributions which are defined in DCAT as ‘a specific representation of a data set. A data set might be available in multiple serialisations that may differ in various ways, including natural language, media type or format, schematic organisation, temporal and spatial resolution, level of detail or profiles (which might specify any or all of the above)’.

If the data asset you create will be published in the form of spreadsheets, CSV files or other data in tabular form, you should also refer to the guide on publishing your tabular data if you’re making your data open. Make sure all CSV files comply with the Tabular data standard.

Question 3

3. Where to record and store your metadata

Accepted Answer

When recording metadata, it’s important to store it linked to, or with, the underlying data it’s describing. You can do this by storing metadata either:

within the data set itself
in a separate file, such as a readme file, with a record showing the link between the data and its metadata

You should also store the metadata in a metadata catalogue. If your organisation does not currently have one you should consider creating one to support your metadata submissions.
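The separate-file option could be sketched as follows. This is a minimal illustration only: the `.metadata.json` file name suffix and the `describes` key are hypothetical conventions chosen for this sketch, not part of this guidance, and the data file contents are made up.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(data_path: Path, metadata: dict) -> Path:
    """Write metadata to a JSON file stored next to the data file it describes."""
    # Hypothetical naming convention: <data file name>.metadata.json
    sidecar = data_path.with_name(data_path.name + ".metadata.json")
    # Hypothetical 'describes' key records the link back to the data file.
    payload = dict(metadata, describes=data_path.name)
    sidecar.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sidecar


# Demonstration in a temporary folder, using made-up file contents.
with tempfile.TemporaryDirectory() as folder:
    data_file = Path(folder) / "occupancy.csv"
    data_file.write_text("date,occupancy\n", encoding="utf-8")
    sidecar = write_sidecar(data_file, {"identifier": "362857580"})
    sidecar_name = sidecar.name
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))
```

Keeping the data file name inside the metadata file means the link between the two survives even if the files are later copied elsewhere together.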

When publishing your data, you will need to consider:

how the metadata may be harvested by data portals
how easy it will be for users to discover the data assets you intend to publish
how it will enable both humans and machines to interpret the metadata, for example by avoiding the use of acronyms and domain jargon in the title

Read the guidance on publishing tabular data to understand more about how you publish metadata.

Question 4

4. Making your metadata machine-readable and accessible

Accepted Answer

To make metadata machine-readable and accessible, you must format your metadata in a specific way.

When recording the metadata for your data assets, make sure you use plain English by making it specific, informative, clear and to the point, and follow the writing for GOV.UK guide. For example, do not use jargon, and make sure you define technical terms and expand acronyms. Try to avoid using symbols that users or machines might misinterpret.

If you’re providing usage guidelines within the metadata information and you include links to content stored elsewhere – for example, by inserting URLs to ancillary documentation or web pages – make sure these are accessible to users outside your organisation. If they’re not, remove them and provide that information alongside your data set.

It’s important that you complete all mandatory attributes when publishing critical data assets. For recommended or optional metadata attributes, if you do not have the information you need to record, you can still add the metadata, but enter ‘unknown’ or ‘not applicable’ where relevant, in preference to null values.
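As a rough sketch of the last point, the function below replaces absent or empty optional attributes with a placeholder. Which attributes count as optional is an assumption made here for illustration, and `fill_missing_optional` is a hypothetical helper name.

```python
# Assumption for this sketch: these attributes are treated as optional.
OPTIONAL_ATTRIBUTES = ["alternativeTitle", "temporalCoverage", "conformsTo"]


def fill_missing_optional(record, optional=OPTIONAL_ATTRIBUTES, placeholder="unknown"):
    """Return a copy of record with absent or empty optional attributes set to a placeholder."""
    filled = dict(record)
    for name in optional:
        value = filled.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            filled[name] = placeholder
    return filled


record = {"title": "Office building occupancy", "alternativeTitle": None, "conformsTo": ""}
completed = fill_missing_optional(record)
print(completed["alternativeTitle"])  # unknown
```

Pass `placeholder="not applicable"` where the attribute does not apply to the data asset at all, rather than being unknown.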

Question 5

5. Metadata you should record

Accepted Answer

When identifying your critical data assets, you should record all mandatory information that will help others:

be informed on where and when your data was collected – use ‘creator’ and ‘dateCreated’ to record who created the data and the date they created it
find the data you’ve saved on a shared network, and identify whether it’s the data they need – use ‘title’, ‘description’ and ‘identifier’ to describe your data
identify the version of the data you’ve collected – use ‘expires’ and ‘supersededBy’ so users know which version of your data to use
understand the period and standards your data relates to – use ‘temporalCoverage’ to indicate the time period to which your data applies, and ‘conformsTo’ to tell users whether your file conforms to a specific standard or schema
use the data you’ve collected appropriately – ensure you’ve stated the ‘accessRights’ and ‘securityClassification’ to make sure users do not share sensitive data in ways it should not be, and also state the ‘licence’ that applies to your data assets to help users understand their rights to use the data you’ve collected

The rest of this guidance has more detail and examples of the metadata and attributes that you need to include in your data assets.
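The attributes above could be held in a simple record and checked before publication, as in the sketch below. The attribute names come from the list above, but treating this particular subset as required is an assumption for illustration, and the description and access rights values are invented.

```python
# Assumption for this sketch: this subset of attributes is treated as required.
REQUIRED_ATTRIBUTES = (
    "creator", "dateCreated", "title", "description", "identifier",
    "accessRights", "securityClassification", "licence",
)


def missing_attributes(record, required=REQUIRED_ATTRIBUTES):
    """Return the required attribute names that are absent or empty in record."""
    return [name for name in required if not str(record.get(name) or "").strip()]


record = {
    "creator": "Cabinet Office",
    "dateCreated": "2020-07-14",
    "title": "Government Digital Services London Office staff building occupancy",
    "description": "Illustrative description of the data collected.",  # invented value
    "identifier": "362857580",
    "accessRights": "open",  # invented value
    "licence": "https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/",
}

print(missing_attributes(record))  # ['securityClassification']
```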

Question 6

6. Recording time and dates in your metadata

Accepted Answer

Using ‘created’

You should record the date when you create a data set to help users of the data set know whether it’s valid and relevant to them. You must record any dates using the ISO 8601 standard, which is an Open Standard selected for use by the government.

For example, created:2002-10-02.

You must capture the exact time a data set is collected when you’re collecting more than one version of a data set a day. This means listing the date and time elements in descending order of size (years, months, days, hours, minutes, seconds, milliseconds and microseconds). You should provide the right level of accuracy for your data set.

For example, if you publish your data set once a year, it might be enough to provide a date down to the day, such as 2020-07-14. If you publish multiple times a day, it’s better to include information down to the second, such as 2020-07-14T12:57:03Z. Note that in the ISO 8601 Date and time format standard, ‘Z’ specifically means UTC (often known as GMT in the UK). Make it clear when a time is not in British Summer Time, even though the date is in July, as in the example above, which indicates a time stamp for data published shortly before 2pm (BST) on 14 July 2020.
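A minimal sketch of producing both levels of precision with Python’s standard library is shown below. The function names are hypothetical, and British Summer Time is modelled as a fixed UTC+1 offset to keep the example short. A production system would more likely use a time zone database.

```python
from datetime import date, datetime, timedelta, timezone

# British Summer Time modelled as a fixed UTC+1 offset (a simplification).
BST = timezone(timedelta(hours=1), "BST")


def created_to_the_day(day: date) -> str:
    """Format a date as ISO 8601 YYYY-MM-DD."""
    return day.isoformat()


def created_to_the_second(moment: datetime) -> str:
    """Convert an aware datetime to UTC and format it to the second with a 'Z' suffix."""
    if moment.tzinfo is None:
        raise ValueError("moment must carry time zone information")
    utc = moment.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


local = datetime(2020, 7, 14, 13, 57, 3, tzinfo=BST)
print(created_to_the_day(local.date()))  # 2020-07-14
print(created_to_the_second(local))      # 2020-07-14T12:57:03Z
```

Converting to UTC before formatting avoids ambiguity about whether a summer time stamp is in BST or GMT.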

Question 7

7. Record the provenance of your data

Accepted Answer

Using ‘creator’

You should record who created a data set so users can communicate with the creator and understand if the data is relevant to them. For example, a data analyst may want to find out how reliable a data set is before undertaking any analysis.

Record the name of the organisation derived from the list of values associated with this attribute, for example, ‘Cabinet Office’.

Question 8

8. Help users find, use and identify your data set

Accepted Answer

Using ‘title’

You must include the name of your data set so users can find and identify the right data set.

You should try to ensure the name captures information that will help users determine whether the data set meets their needs, for example by capturing the topic and specific information about place and geography.

For example, title:Government Digital Services London Office staff building occupancy.

In order to keep titles short yet meaningful, you could describe further using ‘alternativeTitle’ or ‘description’. For example, you might state whether the data set relates to ‘All London Offices’ or a specific location such as ‘Whitechapel’.

Using ‘description’

You must provide a description so that there’s a rich, human-readable explanation of the data asset, in addition to the title, to help users of your data understand if it’s relevant to them.

The description of your data should only describe the type of data collected and should not include warnings about how to use the data. Explain any warnings through usage notes or by reference to other properties such as ‘accessRights’.

Using ‘identifier’

You should uniquely identify your data set so that users of your data know exactly which source they’re using.

You should identify your data asset by:

using the identification system your organisation is using (in cases where organisations have a system in place)
using an opaque identifier you’ve created – this should be random numbers rather than sequential or semi-sequential numbers to avoid meaning being implied

Using a meaningless identifier avoids the misunderstandings that arise when identifiers carry meaning. For example, a meaning encoded in an identifier can change over time, whereas a meaningless identifier can stay genuinely constant.

For example, identifier:362857580.

You can ensure this meaningless identifier stays unique by keeping a catalogue of all data sets with their identifiers.
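One way to generate an opaque identifier and keep it unique is sketched below. The nine-digit length mirrors the example above but is otherwise an assumption, `new_identifier` is a hypothetical helper, and the in-memory set stands in for whatever catalogue your organisation actually keeps.

```python
import secrets


def new_identifier(catalogue: set, digits: int = 9) -> str:
    """Draw random numeric identifiers until one is not already in the catalogue."""
    while True:
        # Random rather than sequential, so no meaning can be read into the value.
        candidate = str(secrets.randbelow(10 ** digits)).zfill(digits)
        if candidate not in catalogue:
            catalogue.add(candidate)
            return candidate


catalogue = {"362857580"}  # identifiers already issued
first = new_identifier(catalogue)
second = new_identifier(catalogue)
```

Because every issued identifier is added to the catalogue, a later call cannot return a value that is already in use.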

Using ‘mediaType’

In a distribution, you should record the file format or encoding method of the data asset being described, since many data sets are (or can be) published in multiple formats (mediaTypes).

For example:

CSV: text/csv
Excel (.xlsx): application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Geopackage: application/geopackage+sqlite3
HTML: text/html
PDF: application/pdf
Word (.docx): application/vnd.openxmlformats-officedocument.wordprocessingml.document

Use the media type that’s most relevant to your data set – for example, so that browsers can decide how to present the underlying data. The media type used is derived from a list of values as defined by Internet Assigned Numbers Authority (IANA) Media Types.
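The media types listed above could be looked up from a file name as in the following sketch. The mapping from file extensions to media types is an assumption added for illustration (the list above names formats, not extensions), and `media_type_for` is a hypothetical helper.

```python
from pathlib import Path

# Media types from the list above, keyed by an assumed file extension.
MEDIA_TYPES = {
    ".csv": "text/csv",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".gpkg": "application/geopackage+sqlite3",
    ".html": "text/html",
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def media_type_for(filename: str):
    """Return the media type for a file name, or None if the extension is not listed."""
    return MEDIA_TYPES.get(Path(filename).suffix.lower())


print(media_type_for("occupancy.csv"))  # text/csv
print(media_type_for("report.PDF"))     # application/pdf
```

Returning None for an unlisted extension prompts a manual check against the IANA list rather than guessing a value.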

Question 9

9. Make sure your data is used appropriately

Accepted Answer

Using ‘licence’

For protected data such as personal, sensitive or commercial data, you should record information that will help users of the data understand its terms and conditions.

You may want to include the relevant data sharing agreement, legal regulation or certification. This could be a memorandum of understanding (MOU) or Data Protection Impact Assessment (DPIA).

NOTE: The open standards vocabularies for schema.org, Dublin Core and DCAT spell the noun ‘licence’ using the American spelling ‘license’.

For example, licence:Memorandum of Understanding between the Charity Commission for England and Wales and the Office for Students.

When publishing open data, you should label the data you’ve collected with its licence for use. In many cases within government, this will be the Open Government Licence (OGL). You should also link to the licence file to explain what the licence means and how others can use your code and content.

For example, licence: https://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/.

Using ‘accessRights’

You should record the sensitivity of your data so it’s not shared or published in ways it should not be.

You should provide information about who should be able to access the data you’ve collected, and any restrictions including:

whether it’s open (publicly available), commercial or internal (has restricted access)
the handling caveat for the data
the security classification of data
