© Crown copyright 2022
This publication is licensed under the terms of the Open Government Licence v3.0 except where otherwise stated. To view this licence, visit nationalarchives.gov.uk/doc/open-government-licence/version/3 or write to the Information Policy Team, The National Archives, Kew, London TW9 4DU, or email: firstname.lastname@example.org.
Where we have identified any third party copyright information you will need to obtain permission from the copyright holders concerned.
This publication is available at https://www.gov.uk/government/publications/open-standards-for-government/cross-platform-character-encoding-profile
1. Summary of the standard’s use for government
Unicode is based on the ASCII character set, but expands ASCII to include characters for most written languages.
UTF-8:
- is one of the encoding forms for Unicode
- encodes all Unicode characters without changing the encoding of ASCII characters
This makes UTF-8 flexible for a wide range of uses. For example, the default character encoding in HTML5 is UTF-8.
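The ASCII compatibility described above can be checked directly. The following is a minimal sketch in Python, not part of the standard itself; the sample strings are illustrative assumptions.

```python
# ASCII-only text produces identical bytes under ASCII and UTF-8.
ascii_text = "Open standards"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Characters outside ASCII take 2 to 4 bytes each in UTF-8.
# Escapes are used so the sample does not depend on this file's own encoding.
samples = {
    "LATIN CAPITAL LETTER A": "A",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE": "\U0001F600",
}
byte_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

for name, length in byte_lengths.items():
    print(f"{name}: {length} byte(s)")
```

Because ASCII text is already valid UTF-8, existing ASCII-only files can be read as UTF-8 without conversion.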
2. How this standard meets user needs
Users of this standard include:
- publishers of government data
- data scientists
- data analysts
UTF-8 is an international standard. By using it you can read, write, store and exchange text that remains stable over time and across different systems.
You will also:
- prevent accidental or unanticipated corruption of text as it transfers between systems
- save operational costs by making it easier to find and fix errors in the text
- have text in different languages represented accurately as it moves between systems
- keep file sizes smaller for text that is mostly ASCII, because UTF-8 stores each ASCII character in 1 byte
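To illustrate the corruption and file size points above, here is a short Python sketch. The sample sentence and the comparison with Latin-1 and UTF-16 are assumptions chosen for illustration, not examples from the standard.

```python
# Sample text with two non-ASCII characters: "Café costs £5".
original = "Caf\u00e9 costs \u00a35"

utf8_bytes = original.encode("utf-8")

# A system that wrongly assumes Latin-1 turns each non-ASCII character
# into two unrelated characters (often called mojibake).
misread = utf8_bytes.decode("latin-1")

# A system that reads the same bytes as UTF-8 recovers the text exactly.
round_trip = utf8_bytes.decode("utf-8")

# For mostly-ASCII text, UTF-8 is much smaller than UTF-16.
utf8_size = len(utf8_bytes)
utf16_size = len(original.encode("utf-16-le"))

print(misread)
print(round_trip == original)
print(utf8_size, utf16_size)
```

The misread string shows why both the sender and the receiver need to agree on UTF-8: the bytes are unchanged, but the wrong interpretation corrupts the text.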
3. How to use the standard
To use UTF-8 you need to:
- save text in UTF-8 encoding to apply it to your content
- declare the character encoding, for example W3C has guidance on declaring character encodings in HTML
- check your server has the correct HTTP declarations so that they do not override your encoding
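The three steps above can be sketched in Python as follows. This is an illustrative sketch only: the function names `save_as_utf8`, `charset_from_content_type` and `header_overrides_utf8`, the sample page and the header values are all hypothetical and are not part of the standard or of any W3C guidance.

```python
import tempfile
from email.message import Message
from pathlib import Path
from typing import Optional

# Step 2: the page declares its own encoding with a meta element.
HTML_PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Example</title>
</head>
<body><p>Caf\u00e9 \u00a35</p></body>
</html>
"""


def save_as_utf8(path: Path, text: str) -> None:
    """Step 1: save text as UTF-8, passing the encoding explicitly
    because the platform default may be something else."""
    path.write_text(text, encoding="utf-8")


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset of a Content-Type header, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def header_overrides_utf8(header_value: str) -> bool:
    """Step 3: report whether an HTTP Content-Type header declares a
    charset other than UTF-8, which would override the page's declaration."""
    charset = charset_from_content_type(header_value)
    return charset is not None and charset != "utf-8"


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    save_as_utf8(page, HTML_PAGE)
    # Reading back with the same encoding returns the text unchanged.
    round_trip_ok = page.read_text(encoding="utf-8") == HTML_PAGE

print(round_trip_ok)
print(header_overrides_utf8("text/html; charset=ISO-8859-1"))
```

In practice you would pass the Content-Type value that your own server returns to `header_overrides_utf8`; how you fetch that header depends on your hosting setup.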
Read the W3.org article on migrating to Unicode for more information.