It turns out that uppercasing a character is a complicated business. Once you go beyond the basic ASCII character set, the rules for uppercasing and lowercasing a character depend on the locale in which the application is running.

As a demo application, I am attempting to uppercase the letter 'i' (with a dot) and the letter 'ı' (without a dot). Now, in en_US, 'i' (with a dot) uppercases to 'I', and 'ı' (without a dot) isn't part of the alphabet (but still uppercases to 'I').

But, if I switch to Turkish (tr_TR.UTF-8), 'i' (with a dot) must uppercase to 'İ' (also with a dot) and 'ı' (without a dot) must uppercase to 'I' (also without a dot). Lowercase should reverse these operations.

iİıI --> İİII  (tr_TR.UTF-8)
iİıI --> IİII  (en_US.UTF-8)

Now, I can do this perfectly in C. How can I do it in Haskell? All of the searches that I do point me directly to Data.Char.toUpper, which is not locale-aware. I haven't found any functions that are locale-aware in any way.

Here's a code sample in C, which I run on my Linux machine.

#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>
#include <wctype.h>
#include <string.h>
#include <errno.h>

/* Wide strings "ßßSS" and "iİıI", each with a terminating 0. */
wchar_t latin_small_sharp_s[5] = {0x00df, 0x00df, 0x0053, 0x0053, 0};
wchar_t turkish_is[5] = {0x0069, 0x0130, 0x0131, 0x0049, 0};

/* Not used below. */
char multibyte_turkish_is[7] = {0x69, 0x01, 0x30, 0x01, 0x31, 0x49, 0};

/* Uppercase str (len wide characters, terminator included) under the given
   LC_CTYPE locale, then print the original and the result. */
void print_in_locale (const char *locale, const wchar_t *str, const size_t len) {
  wchar_t *dest;
  size_t i;

  if (!setlocale(LC_CTYPE, locale)) {
    fprintf(stderr, "Locale %s failed with error: %s\n", locale, strerror(errno));
    return;
  }

  dest = calloc(len * 2, sizeof(wchar_t));
  if (!dest) {
    return;
  }

  for (i = 0; i < len; i++) {
    dest[i] = towupper(str[i]);
  }
  printf("%ls, %ls\n", str, dest);
  free(dest);
}

int main (void) {
  print_in_locale("de_DE.utf8", latin_small_sharp_s, 5);
  print_in_locale("tr_TR.utf8", turkish_is, 5);
  print_in_locale("de_DE.utf8", turkish_is, 5);
  return 0;
}
If you save it as "locale_test.c", you can compile and run it from the command line with:

gcc -o locale_test locale_test.c && ./locale_test
Savanni D'Gerinel
  • Did you use Turkish only as an example or do you develop a piece of software targeting Turkey? – Cetin Sert Sep 22 '12 at 00:52
  • Example. I was working on software that we're going to release multinationally when I started running into this, and while talking about it on G+ I got a lot of friends, including those who aren't techies, interested in the problem. I had thought that over the weekend I would develop a piece of software that demonstrated a lot of this, but never got the chance. – Savanni D'Gerinel Sep 24 '12 at 14:58

1 Answer


Use the Data.Text.ICU.toUpper function from the text-icu package.

toUpper :: LocaleName -> Text -> Text

Uppercase the characters in a string.

Casing is locale dependent and context sensitive. The result may be longer or shorter than the original.
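
To make that concrete, here is a minimal sketch of how the question's Turkish example might look with text-icu. It assumes the text and text-icu packages (and the ICU C libraries they bind to) are installed; the helper caseIn and the name turkishIs are made up for illustration and are not part of the library. If ICU applies the Turkish casing rules as expected, the uppercase results should correspond to the two mappings listed in the question.

```haskell
module Main (main) where

import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.ICU (LocaleName (Locale), toLower, toUpper)
import System.IO (hSetEncoding, stdout, utf8)

-- The four "i" variants from the question: i, İ (U+0130), ı (U+0131), I.
turkishIs :: Text
turkishIs = T.pack "i\x0130\x0131I"

-- Hypothetical helper: upper- and lowercase a Text under the ICU locale
-- with the given name (for example "tr_TR" or "en_US").
caseIn :: String -> Text -> (Text, Text)
caseIn name t = (toUpper loc t, toLower loc t)
  where
    loc = Locale name

main :: IO ()
main = do
  -- Force UTF-8 output so the non-ASCII letters print regardless of the
  -- process locale (assumes the terminal itself displays UTF-8).
  hSetEncoding stdout utf8
  mapM_ report ["tr_TR", "en_US"]
  where
    report name = do
      let (upper, lower) = caseIn name turkishIs
      putStr (name ++ ": ")
      -- Prints: original | uppercased | lowercased
      TIO.putStrLn (T.intercalate (T.pack " | ") [turkishIs, upper, lower])
```

Note that, unlike the C version, nothing here calls setlocale: the locale is passed explicitly to each call, so the same program can case text for several locales side by side.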

Abhinav Sarkar
  • That was exactly it! It looks like for most Unicode support, I don't need anything beyond the Prelude's putStrLn, Data.Text.ICU (for locale-dependent uppercasing and lowercasing), and Data.Text (for building Unicode strings). Possibly also the Unicode codec functions to switch between UTF-8 and the internal representation. – Savanni D'Gerinel Sep 24 '12 at 15:09