Windows – Is it possible to set “locale” of a Windows application to UTF-8

code-page locale windows

We know there is an application called AppLocale that can change the code page of non-Unicode applications to solve text display problems.

But there is a program whose text should be displayed as UTF-8; instead, Windows interprets it using the native code page, which makes the text unreadable. It seems odd: AppLocale lists almost every country and region, but offers no UTF-8 option. I think it is a bug, because the developers probably use English and never tested how non-English text is displayed. I don't think the vendor will fix it, so I want to fix it myself.

Is it possible to make non-Unicode output use UTF-8 with software like AppLocale? Is non-Unicode output interpreted using the native code page by default? How can I set the native code page to UTF-8?

Best Answer

  • Previously it was not possible because

    Microsoft claimed a UTF-8 locale might break some functions (a possible example is _mbsrev) as they were written to assume multibyte encodings used no more than 2 bytes per character, thus until now code pages with more bytes such as GB 18030 (cp54936) and UTF-8 could not be set as the locale.

    https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
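    To illustrate why a "no more than 2 bytes per character" assumption breaks with UTF-8, here is a minimal C sketch of a character-aware string reverse, in the spirit of what _mbsrev does for double-byte code pages. It is not Microsoft's implementation; the helper names utf8_seq_len and utf8_reverse are hypothetical, and the sketch assumes mostly well-formed UTF-8 input. Characters such as € (3 bytes) and 😀 (4 bytes) must be moved as whole sequences, which code expecting at most a lead byte plus one trail byte would split.

```c
#include <string.h>

/* Number of bytes in the UTF-8 sequence that starts with lead byte b.
   Invalid lead bytes return 1 so malformed input is copied byte by byte. */
static size_t utf8_seq_len(unsigned char b) {
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx: 2-byte sequence */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx: 3-byte sequence, e.g. the euro sign */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx: 4-byte sequence, e.g. emoji */
    return 1;
}

/* Reverse the characters (not the bytes) of a UTF-8 string into out,
   which must have room for at least strlen(in) + 1 bytes. */
static void utf8_reverse(const char *in, char *out) {
    size_t n = strlen(in);
    size_t i = 0;
    out[n] = '\0';
    while (i < n) {
        size_t len = utf8_seq_len((unsigned char)in[i]);
        if (i + len > n) {
            len = n - i;  /* truncated sequence at the end of the string */
        }
        /* Copy the whole sequence to the mirrored position in out. */
        memcpy(out + (n - i - len), in + i, len);
        i += len;
    }
}
```

    For example, reversing "a€😀" with utf8_reverse yields "😀€a" with every sequence intact, whereas reversing the raw bytes would start the result with a continuation byte and produce invalid UTF-8.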

    However, since Windows 10 Insider build 17035 there has been a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox for setting the locale code page to UTF-8.

    [Screenshot: the "Beta: Use Unicode UTF-8 for worldwide language support" checkbox in the Region Settings dialog]


    That said, the support is still buggy at this point.


    Update:

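    Microsoft has also added ways for programs to use UTF-8 without the user setting the beta flag above. When compiling with MSVC, the /utf-8 or /execution-charset:utf-8 option makes the compiler store narrow string literals as UTF-8. To make the process itself use UTF-8 as its ANSI code page, set the ActiveCodePage property in the appxmanifest (packaged apps) or in the application manifest (unpackaged apps).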
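    The effect of /utf-8 (or /execution-charset:utf-8) on narrow string literals can be checked with a small C sketch, assuming a C99 compiler. The names euro and euro_byte_len are hypothetical. The literal uses the universal character name \u20AC (EURO SIGN), so the result does not depend on the encoding the source file is saved in: with a UTF-8 execution character set it occupies 3 bytes, while MSVC's default execution character set on a Western-European Windows-1252 system stores it as the single byte 0x80. GCC and Clang use UTF-8 by default.

```c
#include <stddef.h>

/* EURO SIGN as a universal character name; the compiler's execution
   character set decides which bytes end up in the binary:
   UTF-8 -> E2 82 AC (3 bytes), Windows-1252 -> 0x80 (1 byte). */
static const char euro[] = "\u20AC";

/* Bytes used by the literal, excluding the terminating NUL. */
static size_t euro_byte_len(void) {
    return sizeof euro - 1;
}
```

    Note that this only changes how literals are encoded at compile time; text the program reads or converts at run time still goes through whatever code page the process is using.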

    You can also use a UTF-8 locale on older Windows versions by linking with the appropriate C runtime:

    Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use "UTF-8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".utf8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.

    ...

    To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.

    UTF-8 Support
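    A minimal sketch of the setlocale approach from the quoted documentation, assuming a Universal C Runtime build on Windows 10 build 17134 or later (or one of the deployment options described above). The helper names enable_utf8_crt and count_chars are hypothetical, and the "C.UTF-8" locale name used for non-Windows builds is an assumption added only so the sketch also compiles elsewhere. After a successful switch, mbstowcs counts "a€" as 2 characters instead of decoding the three bytes of € as separate ANSI characters.

```c
#include <locale.h>
#include <stdlib.h>

/* ".utf8" is the UCRT spelling from the quoted documentation.
   "C.UTF-8" is an assumed fallback for non-Windows C libraries. */
#ifdef _WIN32
#define UTF8_LOCALE ".utf8"
#else
#define UTF8_LOCALE "C.UTF-8"
#endif

/* Switch the C runtime to UTF-8 for char strings.
   Returns 1 on success, 0 if the runtime rejected the locale
   (for example, an older CRT without UTF-8 support). */
static int enable_utf8_crt(void) {
    return setlocale(LC_ALL, UTF8_LOCALE) != NULL;
}

/* Number of characters in s under the current locale's multibyte
   encoding, or (size_t)-1 if s contains an invalid sequence. */
static size_t count_chars(const char *s) {
    return mbstowcs(NULL, s, 0);
}
```

    Calling enable_utf8_crt() once at startup, before any other locale-sensitive CRT calls, keeps the behavior of char-based functions consistent for the rest of the program.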
