Guidance

Encoding characters

Updated 9 August 2022

Use UTF-8, an encoding form for the Unicode character set, in government digital services and technology.

1. Summary of the standard’s use for government

Unicode is based on the ASCII character set, but extends it to include characters for most written languages.

UTF-8:

  • is one of the encoding forms for Unicode
  • encodes all Unicode characters without changing how ASCII characters are encoded

This makes UTF-8 flexible for a wide range of uses. For example, the default character encoding in HTML5 is UTF-8.
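
The relationship between ASCII and UTF-8 can be illustrated with a minimal Python sketch. The sample strings and variable names are assumptions chosen for illustration and are not part of the standard. It shows that ASCII text produces identical bytes in both encodings, while characters outside ASCII use between 2 and 4 bytes each.

```python
# ASCII-only text encodes to exactly the same bytes in ASCII and UTF-8.
ascii_text = "GOV.UK"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_bytes)  # True

# Characters outside ASCII are encoded as multi-byte sequences.
samples = ["A", "£", "€", "😀"]
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(byte_lengths)  # {'A': 1, '£': 2, '€': 3, '😀': 4}
```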

The government chooses standards using the open standards approval process, and the Open Standards Board gives final approval. Read more about the approval process for cross-platform character encoding.

2. How this standard meets user needs

Users of this standard include:

  • publishers of government data
  • data scientists
  • data analysts
  • developers

UTF-8 is an international standard. By using it you can read, write, store and exchange text that remains stable over time and across different systems.

You will also:

  • prevent accidental or unanticipated corruption of text as it transfers between systems
  • save operational costs by making it easier to find and fix errors in the text
  • have text in different languages represented accurately as it moves between systems
  • keep file sizes smaller
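
The corruption and file size points can be sketched in Python. The sample string and the choice of Windows-1252 as the mismatched encoding are assumptions made for illustration. The sketch shows what happens when a receiving system decodes UTF-8 bytes with a different encoding, and compares the encoded size of the same text in UTF-8, UTF-16 and UTF-32.

```python
original = "Café £5"
utf8_bytes = original.encode("utf-8")

# Sender and receiver agree on UTF-8: the text survives unchanged.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == original)  # True

# Receiver wrongly assumes Windows-1252: the text is silently corrupted.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # CafÃ© Â£5

# Encoded size of the same text (little-endian forms, so no byte order mark).
sizes = {
    "utf-8": len(original.encode("utf-8")),
    "utf-16": len(original.encode("utf-16-le")),
    "utf-32": len(original.encode("utf-32-le")),
}
print(sizes)  # {'utf-8': 9, 'utf-16': 14, 'utf-32': 28}
```

The size advantage shown here applies to text that is mostly ASCII. For some scripts, such as Chinese or Japanese, UTF-16 can produce smaller output than UTF-8.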

3. How to use the standard

To use UTF-8 you need to:

Read the W3.org article on migrating to Unicode for more information.
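
As an illustration only, one common practice when adopting UTF-8 is to state the encoding explicitly whenever text is read or written, instead of relying on a platform default. The Python sketch below assumes a temporary file and a hypothetical helper named is_valid_utf8. Neither comes from this guidance or from the W3.org article.

```python
import tempfile
from pathlib import Path

text = "Cyngor Sir Ynys Môn: £10"

# Name the encoding explicitly when writing and reading text files.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"
    path.write_text(text, encoding="utf-8")
    read_back = path.read_text(encoding="utf-8")

print(read_back == text)  # True


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("£".encode("utf-8")))    # True
print(is_valid_utf8("£".encode("latin-1")))  # False
```

A check like this could be run on incoming data to catch content that was saved in a legacy encoding before it is stored or passed on.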