Mail Archives: cygwin/2009/08/21/03:49:55

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Fri, 21 Aug 2009 09:49:27 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: KOI8
Message-ID: <20090821074927.GD32408@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <416096c60908201343g6134c93ao3f4646f6e3fc0dfe AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <416096c60908201343g6134c93ao3f4646f6e3fc0dfe@mail.gmail.com>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Aug 20 21:43, Andy Koppe wrote:
> One fairly important character encoding not yet supported by Cygwin
> 1.7 is KOI8. Well, two actually, because there are slightly different
> versions for Russian and Ukrainian: KOI8-R and KOI8-U, aka Windows
> codepages 20866 and 21866. Apparently they're de-facto standards for
> Unix machines and the  in the former Soviet Union. (Windows uses
> CP1251, whereas ISO-8859-5 (Cyrillic) never caught on.)
> 
> Cygwin's Midnight Commander actually uses KOI8 if the locale is set to
> "ru" or "uk", even if a charset is specified explicitly, e.g.
> "ru.CP1251". Hence you get gibberish where a helpful hint in the
> user's language should be. (Of course that's primarily a shortcoming
> in mc.)
> 
> Anyway, to help support them, the attached patch adds the KOI8
> charsets to newlib's Unicode conversion and ctype tables. I took the
> conversion tables from iconv and adapted the ctype tables from the
> CP1251 version. Since KOI8 has printable characters in the C1 range
> from 0x80 to 0x9F, it seems easiest to treat them as Windows
> codepages.
> 
> To complete support, "KOI8-R" and "KOI8-U" would need to be recognised
> in _setlocale_r and mapped to codepages 20866 and 21866.

I'd suggest adding the missing code to loadlocale() (the internally
used charset should be set to "CP20866"/"CP21866", but it seems you know
this already) and sending the entire patch, together with a ChangeLog
entry, to the newlib list.  If you could base it on my pending proposal
to make the charset case insensitive
(http://sourceware.org/ml/newlib/2009/msg00840.html), that would be great.
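
To illustrate the mapping meant here, a minimal sketch follows.  It is
not the actual loadlocale() code: the helper names charset_equal and
koi8_internal_charset are made up for this example, and the
case-insensitive comparison only assumes what the pending proposal
aims for.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper (not part of newlib): compare two ASCII charset
   names, ignoring case.  Returns 1 on a match, 0 otherwise.  */
static int
charset_equal (const char *a, const char *b)
{
  while (*a && *b)
    {
      if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
        return 0;
      ++a;
      ++b;
    }
  /* Both strings must end at the same position.  */
  return *a == *b;
}

/* Hypothetical helper: map a user-visible KOI8 charset name to the
   internally used codepage charset name, or return NULL if the name
   is not one of the two KOI8 variants.  */
static const char *
koi8_internal_charset (const char *name)
{
  if (charset_equal (name, "KOI8-R"))
    return "CP20866";
  if (charset_equal (name, "KOI8-U"))
    return "CP21866";
  return NULL;
}
```

With this sketch, both "KOI8-R" and "koi8-r" yield "CP20866", while an
unrelated name such as "CP1251" yields NULL and would be left to the
existing code paths.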

This patch also requires a minor patch to Cygwin, which can be applied
as obvious after the newlib change has gone in:

Index: strfuncs.cc
===================================================================
RCS file: /cvs/src/src/winsup/cygwin/strfuncs.cc,v
retrieving revision 1.33
diff -u -p -r1.33 strfuncs.cc
--- strfuncs.cc	30 Jun 2009 21:18:43 -0000	1.33
+++ strfuncs.cc	21 Aug 2009 07:48:19 -0000
@@ -339,6 +339,8 @@ __set_charset_from_codepage (UINT cp, ch
     case 1256:
     case 1257:
     case 1258:
+    case 20866:
+    case 21866:
       __small_sprintf (charset, "CP%u", cp);
       return __cp_mbtowc;
     case 28591:
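
The effect of the two added case labels can be sketched in isolation as
follows.  This is a simplified stand-in, not the real
__set_charset_from_codepage: the function name
charset_name_from_codepage is made up, standard snprintf replaces the
Cygwin-internal __small_sprintf, and only the codepages visible in the
diff context are handled.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the patched switch.  For a codepage that
   is treated as a plain Windows codepage, write "CP<number>" into
   charset and return 1; for anything else return 0 and leave charset
   untouched.  */
static int
charset_name_from_codepage (unsigned int cp, char *charset, size_t size)
{
  switch (cp)
    {
    case 1256:
    case 1257:
    case 1258:
    case 20866:	/* KOI8-R */
    case 21866:	/* KOI8-U */
      snprintf (charset, size, "CP%u", cp);
      return 1;
    default:
      return 0;
    }
}
```

Called with 20866 the sketch fills the buffer with "CP20866", which is
the internal charset name the newlib side is expected to use for
KOI8-R.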


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
