Guidance

Using CSV file format

Consistent and standards-based comma-separated values (CSV) files help make data more effective and easier to share across government.

The comma-separated values (CSV) file format is widely used in government for sharing and receiving data. This guidance is for government employees who publish data. It shares the standard and best practice for creating consistent and interoperable CSV files.

The government recommends the RFC 4180 open standard for CSV files. You can read how to use this standard in the Tabular data standard profile.

RFC 4180 aims to reduce common problems and help with data management. The standard and this guidance focus on the manual process and do not refer to using automated tools, libraries and languages. You can use some of the principles in this document in automated environments.

Using CSV files to meet user needs

When creating a CSV file you should consider the needs of government users such as statisticians, data collection agencies, developers and data engineers.

These users need a consistent CSV file format so they can:

  • make data sets easier to share
  • aggregate data and publish it as part of the UK’s policy on open data, consistent with the principles of the open data charter
  • share data in the open for citizens and government workers
  • make it easier to compare data from different data sets

You should also consider whether your CSV will be read by machines or by humans, and understand how this affects the accessibility needs of your file.

The main benefits of using CSV include the ability to:

  • improve data sharing between government organisations
  • import and export CSV files to most data and analytic solutions
  • provide forward and backwards compatibility and adaptability to changing applications and requirements
  • open smaller files in a simple text editor as CSV uses a plain text format
  • improve data security - any program or malicious code in the CSV will not run when you open the file in a text editor

When not to use CSV files

It may not be appropriate to use the CSV file format if you:

  • have large files that you query several times a day - it may be more suitable to use a different format that offers compression capabilities
  • need greater flexibility to encode additional information in the file beyond the raw data
  • need to use JSON with REST APIs - you could, however, provide a ‘Download as CSV’ option for non-technical users

It is important to note that exporting a spreadsheet to CSV format will:

  • remove any formatting such as font styles, links, borders and formulas
  • only export the current tab - single CSV does not support multiple spreadsheet tabs without reorganising your data

Setting up your CSV file

Before you produce a CSV file, make sure the data is about one subject area. Your columns should relate to each other and not mix unrelated information. For example, you should not put employee records and budget figures in the same CSV file.

When manually setting up a CSV file, or setting up an automated process, you should:

  • include a header row to provide basic information about the file so users or machines do not need to make assumptions about column names
  • use only one header row - if you have multiple header rows, combine or merge them into a single header row
  • use one data type per column, for example do not mix numeric data and text in the same column unless you have a valid reason such as marking a value as ‘null’
  • make your column headers as simple as possible and avoid special characters, for example, ‘time period (mins)’ could become ‘time_period_mins’ or ‘time_period’
  • use commas to separate fields, especially when publishing data - do not use pipes (also known as the vertical bar symbol ‘|’)
  • avoid using line feed (‘\n’) and carriage return (‘\r’) characters as they can cause data import issues
  • save the file in UTF-8 - this is an international standard that supports characters in most scripts and minimises errors in the import and export of files
  • use only one value per cell - do not list multiple values (arrays) in a single cell, for example “value1, value2, value3”

You should also apply metadata to your CSV files. You can learn more by reading the:

Managing large CSV files

The definition of a large CSV file will depend on the department and tool using the file. As a general guide, a CSV file size of a few hundred MB is large for Excel while a CSV file size of a few GB is large for an API or a data management tool. Keep this in mind when choosing which tools or applications to use.

When using large CSV files in Excel, you must make sure the whole file loads properly. If you are using older versions of Excel (97-2003) you should be aware that they have a low row limit of around 65,000 rows. Microsoft has information about Excel specifications and limits.

When CSV files become large, data suppliers might compress them into a ZIP file. If you use a ZIP file, you should add metadata to describe your CSV and add instructions on how to open the ZIP file without causing errors to the contents of the file.

Validating your CSV files

To make sure your CSV files remain accurate and consistent, you should:

  • check the data does not change when opening the file in common applications - for example, some spreadsheet tools remove leading zeros for values such as ‘0008’
  • check the application opening your CSV files recognises the column formats and their content, such as time and date
  • check the characters in the CSV columns to make sure they display correctly
  • provide guidance on any application specific problems you’re aware of
  • make sure your CSV files still open correctly if you export them from an application
  • make sure your chosen tool displays all the rows and columns

Moving or storing data in CSV file format

When creating data sets, you should consider whether it is likely your data set will need to be transferred to or stored in different places. Using a standardised data structure makes moving data easier and helps keep it intact.

For example, the National Archives is legally required to store a range of data sets from government departments. It is easier for users to search and process these data sets if their format is standardised.

Working with multiple database tables

You should not compress multiple database tables into one CSV file. Instead, you could use a format such as CSV on the Web to express the relationships between several separate CSV files, or use a more appropriate file format.

There may be instances where 2 tables hold very similar data, such as first name and last name. In cases like this, it might make sense to merge them into one CSV.

Published 12 March 2021