Developing lexicographic sorting: An example for Urdu


Collation, or lexicographic sorting, is essential for multilingual computing. This paper presents the challenges faced in developing a collation sequence for a language. It discusses both the theoretical linguistic considerations and the practical standardization and encoding considerations that must be addressed for languages for which relevant standards or solutions have not yet been defined. The paper also defines the development process by detailing the procedure followed for Urdu, the national language of Pakistan, which is spoken by more than 100 million people worldwide. The paper is aimed at organizations that develop and use collation standards and at the localization industry, rather than at theoretical issues.


Published 1 January 2007