Guidance

Search engine optimisation (SEO) for data publishers: Best practice guide

Published 28 January 2020

Introduction

This best practice guide provides advice for data publishers wanting to improve the findability of their metadata (and therefore data) through search engines. The recommendations are based on research carried out by the Geo6[footnote 1] on behalf of the Geospatial Commission for the Data Discoverability project. While the project was focused on geospatial data, the principles set out below can be applied to any kind of data and data publisher.

Note that these recommendations are based exclusively on search engine optimisation (SEO) best practice and do not take into account other factors that you may need to consider (such as compliance with metadata standards, industry norms or organisational culture). You should assess the risks and benefits of each recommendation in your own context.

Best practice guide

1. Fill out all metadata fields on data portals

Why it’s important

The more relevant information that exists about a webpage, the easier it is for search engines to understand what the page is about. This means that the search engine can rank the page more appropriately in search results. Where information such as a title or abstract is missing, the search engine has less information to work with and will be less likely to rank it highly.

What it means

  • Make sure all the metadata fields available to you contain accurate, relevant information

  • No field should be left empty
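
A check for empty fields can be automated before a record is published. The following is a minimal sketch in Python, assuming your metadata is available as a dictionary; the function name, field names and example record are hypothetical and not taken from any particular portal.

```python
def find_empty_fields(record, required_fields):
    """Return the names of required fields that are missing or blank."""
    empty = []
    for field in required_fields:
        value = record.get(field)
        # Treat None, empty strings and whitespace-only strings as empty.
        if value is None or not str(value).strip():
            empty.append(field)
    return empty


# Hypothetical metadata record; the field names are illustrative only.
record = {
    "title": "Coal mining subsidence claims",
    "abstract": "",
    "licence": None,
    "publisher": "Example Agency",
}

print(find_empty_fields(record, ["title", "abstract", "licence", "publisher"]))
# prints ['abstract', 'licence']
```

A check like this only confirms that a field contains something. Whether the content is accurate and relevant still needs a human review.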

2. Keep page titles no longer than 50-60 characters

Why it’s important

Titles longer than this will be cut short in search engine results pages. A few examples of this are shown below.

Examples of page titles cut short in search engine results pages

This could mean that key information about your page is not shown. If users do not understand what your page is about from search results they will be less likely to click through.

What it means

  • Keep titles short but make the most of the characters available by using keywords that help users find your data

  • Front-load the title with the most important keywords to maximise their impact on SEO

  • Carry out A/B tests and use tools such as Google Trends to identify which keywords have the biggest impact on the number of people finding and using your data
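
The length limit can be checked with a few lines of code. This is a rough sketch in Python that uses the upper end of the 50-60 character range as an assumed limit. The function names and the example title are hypothetical, and the exact point at which a search engine cuts a title may differ from a simple character count.

```python
MAX_TITLE_LENGTH = 60  # upper end of the 50-60 character range in this guide


def title_is_within_limit(title, limit=MAX_TITLE_LENGTH):
    """Return True if the title is no longer than `limit` characters."""
    return len(title) <= limit


def preview_title(title, limit=MAX_TITLE_LENGTH):
    """Return the title as it might appear if cut at `limit` characters."""
    if len(title) <= limit:
        return title
    # Cut at the limit and mark the cut with an ellipsis; the real cut
    # point used by a search engine may differ.
    return title[:limit].rstrip() + "..."


# Hypothetical dataset title used only for illustration.
title = "Recorded flood outlines for England showing the extent of known historic flooding"
print(title_is_within_limit(title))
print(preview_title(title))
```

Printing the preview shows which keywords would be lost, which helps when deciding what to move to the front of the title.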

3. Optimise the content of abstracts

Why it’s important

As outlined above, the more relevant information that exists about a webpage, the easier it is for search engines to understand the content and rank it appropriately. If you include confusing or inaccurate information it can negatively affect the page’s position in search results. Search engines may interpret duplicated words or content within the page as an artificial attempt to improve the SEO, and could penalise the page as a result.

Examples of metadata page content not optimised for search engines

What it means

  • Enhance all textual fields with suitable keywords, front-loading them with the most important ones

  • Be as informative as possible but only include information relevant to the page and avoid repeating keywords

  • Pay particular attention to the first 120 characters of any abstract as this will appear as a content preview in search results (alongside the title)

  • Carry out A/B tests and use tools such as Google Trends to identify which keywords have the biggest impact on the number of people finding and using your data
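
Two of the points above lend themselves to a quick automated check: what the first 120 characters of an abstract look like, and whether any word is repeated often. The sketch below is one possible approach in Python. The function names, the minimum word length and the repetition threshold are assumptions chosen for illustration, not values from the research behind this guide.

```python
import re
from collections import Counter

PREVIEW_LENGTH = 120  # figure quoted in this guide


def abstract_preview(abstract, length=PREVIEW_LENGTH):
    """Return the first `length` characters after collapsing whitespace."""
    text = " ".join(abstract.split())
    return text[:length]


def repeated_words(abstract, min_length=5, threshold=3):
    """Return words of at least `min_length` letters used `threshold` or more times."""
    words = re.findall(r"[a-z]+", abstract.lower())
    counts = Counter(word for word in words if len(word) >= min_length)
    return {word: count for word, count in counts.items() if count >= threshold}


# Hypothetical abstract used only for illustration.
abstract = "Flood flood FLOOD map of river flood zones."
print(abstract_preview(abstract))
print(repeated_words(abstract))
# prints {'flood': 4}
```

A repeated word is not always a problem, so treat the output as a prompt to re-read the abstract rather than as a rule.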

4. Do not include lists of keywords

Why it’s important

Search engines are very good at recognising artificial attempts to improve SEO and will penalise pages that list random keywords. To be effective, keywords must form part of the core content of a page and should be embedded in full sentences.

Examples of a metadata page that includes a list of keywords

What it means

  • Make sure all page content is written in natural, full sentences

  • Never use nonsensical text or word strings

5. Check whether you can influence the URL

Why it’s important

Research shows that pages with long URLs are more likely to appear low down in Google search results. To avoid this, URLs should ideally be no longer than 50-60 characters.

What it means

  • You may not have any control over the length of a URL on your metadata pages, but should check whether you can influence any part of it – for example, it might be generated using the dataset title. If this is the case, you should aim to keep it short and optimise it using relevant keywords

  • Avoid low-value words such as conjunctions and prepositions where possible

Example of URL generated using the dataset title
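
If you publish through a system where the URL is built from the dataset title, it can help to preview the result before choosing a title. The following Python sketch shows one way a title might be turned into a short URL segment. The list of low-value words, the length limit and the function name are assumptions for illustration; your portal may generate URLs differently.

```python
import re

# Illustrative list of low-value words; extend it to suit your own titles.
LOW_VALUE_WORDS = {
    "a", "an", "and", "at", "by", "for", "from",
    "in", "of", "on", "or", "the", "to", "with",
}
MAX_SLUG_LENGTH = 60  # assumed limit for the title-derived part of the URL


def slug_from_title(title, max_length=MAX_SLUG_LENGTH):
    """Build a short, hyphen-separated URL segment from a dataset title."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    kept = [word for word in words if word not in LOW_VALUE_WORDS]
    slug = ""
    for word in kept:
        candidate = word if not slug else slug + "-" + word
        # Stop before the slug would exceed the length limit.
        if len(candidate) > max_length:
            break
        slug = candidate
    return slug


print(slug_from_title("Boreholes and Wells of the United Kingdom"))
# prints boreholes-wells-united-kingdom
```

Note that the limit here applies only to the part derived from the title. The domain and path that precede it also count towards the overall URL length.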

6. Avoid special characters where these are not displayed correctly

Why it’s important

Search engines penalise webpages containing information that does not make sense. If you have published metadata on a portal that does not recognise your special characters (such as copyright or trademark symbols) this will have a negative impact on the page’s ranking.

Examples of unrecognised special characters

What it means

  • Check how special characters are displayed on websites where you publish your metadata

  • If special characters are displayed incorrectly you should remove them

  • If you still need to convey the sense of the character (for example, to convey copyright or trademarking) you should do this by explaining it within the text
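
Both steps can be sketched in code. The Python example below looks for the Unicode replacement character, which is one common sign that a character has not been displayed correctly, and swaps a small set of symbols for plain text. The symbol list, the replacement wording and the function names are assumptions for illustration, and other kinds of display problem will not be caught by this check.

```python
# Symbols mapped to plain-text explanations; the wording is illustrative.
SYMBOL_TEXT = {
    "\u00a9": "(copyright)",             # copyright sign
    "\u00ae": "(registered trademark)",  # registered sign
    "\u2122": "(trademark)",             # trade mark sign
}

REPLACEMENT_CHARACTER = "\ufffd"  # shown when a character could not be decoded


def has_display_problem(text):
    """Return True if the text contains the Unicode replacement character."""
    return REPLACEMENT_CHARACTER in text


def spell_out_symbols(text):
    """Replace each known symbol with its plain-text explanation."""
    for symbol, words in SYMBOL_TEXT.items():
        text = text.replace(symbol, " " + words)
    # Collapse any doubled spaces introduced by the replacement.
    return " ".join(text.split())


print(spell_out_symbols("Example Product\u2122 dataset"))
# prints Example Product (trademark) dataset
```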

7. Keep the same URL if your data is updated

Why it’s important

Pages gain authority with search engines over time. If you create a new page, it will take time for it to build up that authority with the search engine and it will usually be ranked lower in search results than a similar page that has been available for longer.

What it means

  • If you have metadata that is regularly updated (for example, if your data is refreshed every few months) you should reuse the same page (and URL) whenever possible

  • Avoid creating a new page each time

8. Remove out-of-date pages

Why it’s important

The more data that is published, the harder it is for users to filter out the noise and find the source they really want. Search engines view recently updated pages more positively so pages that are out of date, or not actively managed, are less likely to be seen by users. If they are not seen by users they do not add value, but simply add to the long tail of results.

What it means

  • Design processes for managing the metadata you publish, ensuring it is regularly reviewed and removed if it is no longer useful or relevant to users

  • If possible, set up a permanent redirect for any pages removed so users are directed to other relevant content and do not encounter broken links

  • Where older metadata must be maintained (for example, where it refers to a snapshot in time or is part of a series), include a link to where users can find the most recent version of the data
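
A permanent redirect is normally an HTTP 301 response that points to the replacement page. The sketch below shows the underlying logic in Python, assuming you keep a simple mapping from removed pages to their replacements. The paths and the function name are hypothetical, and in practice redirects are usually configured in the web server or publishing platform rather than written by hand.

```python
from http import HTTPStatus

# Hypothetical mapping from removed pages to the most relevant live page.
REDIRECTS = {
    "/dataset/boreholes-2015": "/dataset/boreholes",
    "/dataset/boreholes-2016": "/dataset/boreholes",
}


def resolve(path, live_paths):
    """Return (status code, target path) for a requested path."""
    if path in REDIRECTS:
        # 301 tells browsers and search engines the move is permanent.
        return HTTPStatus.MOVED_PERMANENTLY, REDIRECTS[path]
    if path in live_paths:
        return HTTPStatus.OK, path
    return HTTPStatus.NOT_FOUND, None


status, target = resolve("/dataset/boreholes-2015", {"/dataset/boreholes"})
print(int(status), target)
# prints 301 /dataset/boreholes
```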

9. Avoid duplication of metadata

Why it’s important

Duplication is another issue that makes it harder for users to find the source that best meets their needs. There are two main types of duplication:

1. Similar metadata available from multiple publishers

Example of similar metadata available from multiple suppliers

2. Identical metadata from the same publisher, available in more than one place

Identical metadata from the same publisher, available in more than one place

Source: Data.gov.uk

Identical metadata from the same publisher, available in more than one place

Source: BGS Website

What it means

  • Avoid unnecessarily duplicating metadata for data owned by other publishers

  • Avoid duplicating your own metadata across multiple portals, as one high-quality metadata record will be ranked more highly by search engines than several similar records on different platforms
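
If you hold a list of your published records, a simple text comparison can flag likely duplicates for review. The following Python sketch compares record titles using the standard library `difflib` module. The record identifiers, the example titles and the 0.9 threshold are assumptions for illustration, and comparing titles alone will miss duplicates that are titled differently.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Return a ratio between 0 and 1 describing how alike two strings are."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_duplicates(records, threshold=0.9):
    """Return pairs of record ids whose titles are at least `threshold` similar."""
    pairs = []
    for (id_a, title_a), (id_b, title_b) in combinations(records.items(), 2):
        if similarity(title_a, title_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs


# Hypothetical records keyed by an illustrative identifier.
records = {
    "rec-1": "UK Borehole Index",
    "rec-2": "UK borehole index",
    "rec-3": "Tidal predictions",
}
print(likely_duplicates(records))
# prints [('rec-1', 'rec-2')]
```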

10. Use tools and tests to understand users

Why it’s important

While there are many common principles for SEO, the needs of your organisation's target audience will be more nuanced than this guide can cover. What works for one organisation may be less effective for another, so you must understand your users to know which levers to pull.

What it means

  • Use tools such as Google Analytics and Google Search Console (or equivalents) to understand the behaviour of your users and make evidence-based decisions about how to reach your target audience

  • Use techniques such as A/B tests, user research interviews, user surveys and observations to inform your actions - card sorting exercises may be a good way to test whether users understand the terminology and keywords you are using

Example of data available via Google Analytics
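
When you run an A/B test, for example comparing two versions of a page title, you need a way to judge whether a difference in click-through rate is more than chance. This guide does not prescribe a method. The Python sketch below uses a two-proportion z-test, which is one common approach; the function name and the figures are hypothetical.

```python
import math


def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Return the z statistic for the difference between two click-through rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pool both versions to estimate the rate if there were no real difference.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (rate_b - rate_a) / standard_error


# Hypothetical results: version A had 40 clicks from 1,000 views,
# version B had 60 clicks from 1,000 views.
z = two_proportion_z(40, 1000, 60, 1000)
print(round(z, 2))
# prints 2.05
```

As a rule of thumb, an absolute z value above about 1.96 corresponds to a difference that is significant at the 5% level in a two-sided test. Small samples make the result unreliable, so collect enough views before drawing conclusions.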

11. Use tools to identify the best keywords for your subject

Why it’s important

To choose the best keywords for the SEO of your webpages you need to understand:

  • which keywords are most popular among your target audience

  • how competitive those keywords are (i.e. how many other web pages are also trying to rank highly using those words)

What it means

  • Use research tools such as Google Trends to check search volumes for different keywords

  • If appropriate, use other tools (usually commercial) to identify the highest ranking synonyms for your keywords and understand how difficult it is to rank for your preferred terms

Example of how Google Trends can help inform which keywords to use

12. Apply these recommendations to all your pages

Why it’s important

Applying the SEO techniques above to one webpage should have a positive impact on its findability. However, it will be negatively impacted by pages on the same site that have a low SEO value. This is because search engines assess all the pages on a website domain together, as a single entity.

What it means

  • Make sure all pages under your control are optimised for search engines using the techniques above

  • If there are pages with poor SEO on the same website that are not owned by you, consider raising this with the organisation responsible or sharing this guide

  1. The Geo6 consists of British Geological Survey, The Coal Authority, HM Land Registry, Ordnance Survey, UK Hydrographic Office and the Valuation Office Agency.