Mail Archives: cygwin/2009/08/21/03:49:55
On Aug 20 21:43, Andy Koppe wrote:
> One fairly important character encoding not yet supported by Cygwin
> 1.7 is KOI8. Well, two actually, because there are slightly different
> versions for Russian and Ukrainian: KOI8-R and KOI8-U, aka Windows
> codepages 20866 and 21866. Apparently they're de-facto standards for
> Unix machines in the former Soviet Union. (Windows uses
> CP1251, whereas ISO-8859-5 (Cyrillic) never caught on.)
>
> Cygwin's Midnight Commander actually uses KOI8 if the locale is set to
> "ru" or "uk", even if a charset is specified explicitly, e.g.
> "ru.CP1251". Hence you get gibberish where a helpful hint in the
> user's language should be. (Of course that's primarily a shortcoming
> in mc.)
>
> Anyway, to help support them, the attached patch adds the KOI8
> charsets to newlib's Unicode conversion and ctype tables. I took the
> conversion tables from iconv and adapted the ctype tables from the
> CP1251 version. Since KOI8 has printable characters in the C1 range
> from 0x80 to 0x9F, it seems easiest to treat them as Windows
> codepages.
>
> To complete support, "KOI8-R" and "KOI8-U" would need to be recognised
> in _setlocale_r and mapped to codepages 20866 and 21866.
I'd suggest adding the missing code to loadlocale() (the internally
used charset should be set to "CP20866"/"CP21866", but it seems you know
this already) and sending the entire patch, together with a ChangeLog
entry, to the newlib list. If you could base it on my pending proposal
to make the charset case-insensitive
(http://sourceware.org/ml/newlib/2009/msg00840.html), that would be great.
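To illustrate the mapping discussed above, here is a rough sketch of how the user-visible names could be translated to the internal codepage names. It is not the actual loadlocale() code: the helper name koi8_to_internal_charset and its signature are made up for this example, and the case-insensitive comparison assumes the pending proposal is in place.

```c
#include <stddef.h>
#include <stdio.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper, NOT part of newlib: map a user-visible KOI8
   charset name to the internal "CPnnnnn" charset name.
   Returns 1 and fills OUT on a match, 0 otherwise.  */
static int
koi8_to_internal_charset (const char *name, char *out, size_t outlen)
{
  unsigned cp;

  if (strcasecmp (name, "KOI8-R") == 0)
    cp = 20866;                 /* Russian */
  else if (strcasecmp (name, "KOI8-U") == 0)
    cp = 21866;                 /* Ukrainian */
  else
    return 0;                   /* not a KOI8 charset name */
  snprintf (out, outlen, "CP%u", cp);
  return 1;
}
```

With a helper like this, "koi8-r" and "KOI8-R" would both end up as "CP20866", so the rest of the code only ever sees the codepage form.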
This patch also requires a minor patch to Cygwin, which can be applied
as obvious after the newlib change has gone in:
Index: strfuncs.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/strfuncs.cc,v
retrieving revision 1.33
diff -u -p -r1.33 strfuncs.cc
--- strfuncs.cc 30 Jun 2009 21:18:43 -0000 1.33
+++ strfuncs.cc 21 Aug 2009 07:48:19 -0000
@@ -339,6 +339,8 @@ __set_charset_from_codepage (UINT cp, ch
     case 1256:
     case 1257:
     case 1258:
+    case 20866:
+    case 21866:
       __small_sprintf (charset, "CP%u", cp);
       return __cp_mbtowc;
     case 28591:
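
For readers without the Cygwin sources at hand, the effect of the two added case labels can be sketched in a standalone form. This is a simplified stand-in, not the real __set_charset_from_codepage: the function name charset_from_codepage is hypothetical, snprintf replaces Cygwin's __small_sprintf, only the cases visible in the hunk are listed, and an int is returned instead of the conversion function pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for the patched switch.
   Returns 1 and writes "CP<cp>" to CHARSET if CP is one of the
   codepages listed here, 0 otherwise.  */
static int
charset_from_codepage (unsigned cp, char *charset, size_t len)
{
  switch (cp)
    {
    case 1256:
    case 1257:
    case 1258:
    case 20866:                 /* KOI8-R, added by the patch */
    case 21866:                 /* KOI8-U, added by the patch */
      snprintf (charset, len, "CP%u", cp);
      return 1;
    default:
      return 0;                 /* other codepages omitted in this sketch */
    }
}
```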
Thanks,
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat