Mail Archives: cygwin/2011/02/02/15:28:31

In-Reply-To: <20110202163516.GI2675@calimero.vinschen.de>
References: <20110202122102 DOT GD2675 AT calimero DOT vinschen DOT de> <201102021229 DOT 04623 DOT bruno AT clisp DOT org> <201102021702 DOT 57387 DOT bruno AT clisp DOT org> <20110202162801 DOT GH2675 AT calimero DOT vinschen DOT de> <20110202163516 DOT GI2675 AT calimero DOT vinschen DOT de>
Date: Wed, 2 Feb 2011 20:28:17 +0000
Message-ID: <AANLkTinVjgOWPar+8prQA2aE4FphJcm-Y1oq3c1D_wta@mail.gmail.com>
Subject: Re: 16-bit wchar_t on Windows and Cygwin
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com, bug-gnulib AT gnu DOT org, bug-coreutils AT gnu DOT org

On 2 February 2011 16:35, Corinna Vinschen wrote:
> On Feb 2 17:28, Corinna Vinschen wrote:
>> On Feb 2 17:02, Bruno Haible wrote:
>> > But if you say that the application should convert UTF-16 surrogates
>> > to UTF-32 before calling iswalpha: That's certainly a requirement
>> > for Cygwin 1.7.x applications that want to support the entire Unicode
>> > character set. But it's outside of POSIX, and many GNU programs will
>> > not want to include this added complexity. Just try to apply this
>> > suggestion to gnulib's quotearg.c, then estimate the time someone
>> > would need to apply it also to regcomp.c, strftime.c, mbscasestr.c,
>> > coreutils/src/wc.c, and so on.
>>
>> Cygwin's regcomp is taken from FreeBSD and is UTF-16 capable, including
>> surrogate handling. It only required two changes in the code.
>
> Btw., I would sure be glad if Cygwin used a wchar_t of 4 bytes as
> well. The problem is that this requires too many changes at once to
> work right, and it would introduce a lot of backward compatibility
> problems which would have to be handled.

Cygwin 1.7 might have been a good point for that change, because the
lack of proper locale and charset support in previous versions meant
that backward compatibility was much less of a concern than it is now.
But it's a difficult change indeed, and it's not entirely clear that
it's worthwhile. I guess 64-bit Cygwin (if or when it happens) might
be the next opportunity.

> If only the ones who decided that wchar_t in Cygwin should have the
> same size as WCHAR_T in the underlying Windows had thought twice
> about the implications...

Windows Unicode support was introduced with Windows NT in 1993,
whereas Unicode was only extended beyond 16 bits with version 2.0 in
1996. Cygwin was first released the year before. If the Unicode
extension was a consideration at all (which I'd doubt), wchar_t !=
WCHAR probably seemed far more daunting than having to deal with
surrogates at some point down the line.
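
To illustrate what "dealing with surrogates" means with a 16-bit
wchar_t, here is a minimal sketch in C. It assumes the input is an
array of 16-bit UTF-16 code units; the helper name utf16_decode and the
choice to map unpaired surrogates to U+FFFD are hypothetical and not
taken from Cygwin, gnulib or coreutils.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Cygwin or gnulib API): decode one code
   point from a buffer of UTF-16 code units.
   Returns the number of units consumed (0 if len is 0, otherwise 1 or 2)
   and stores the decoded code point in *cp.  Unpaired surrogates are
   reported as U+FFFD; that policy is an assumption of this sketch. */
static size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *cp)
{
    if (len == 0)
        return 0;

    uint16_t u = s[0];

    if (u >= 0xD800 && u <= 0xDBFF) {
        /* High surrogate: needs a low surrogate right after it. */
        if (len >= 2 && s[1] >= 0xDC00 && s[1] <= 0xDFFF) {
            *cp = 0x10000
                + (((uint32_t)(u - 0xD800) << 10)
                   | (uint32_t)(s[1] - 0xDC00));
            return 2;
        }
        *cp = 0xFFFD;
        return 1;
    }

    if (u >= 0xDC00 && u <= 0xDFFF) {
        /* Low surrogate without a preceding high surrogate. */
        *cp = 0xFFFD;
        return 1;
    }

    /* Ordinary BMP code unit: it is the code point itself. */
    *cp = u;
    return 1;
}
```

A caller would loop over the buffer, advance by the returned count, and
hand each decoded value to a classification function that accepts full
code points, which is roughly the conversion step Bruno describes above.
Every loop that walks a wide string one wchar_t at a time would need
this kind of change.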

Andy

