Date: Thu, 3 Feb 2011 10:41:42 +0100
From: Corinna Vinschen <vinschen@redhat.com>
To: cygwin@cygwin.com, bug-gnulib@gnu.org
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Message-ID: <20110203094142.GU2675@calimero.vinschen.de>
Reply-To: cygwin@cygwin.com, bug-gnulib@gnu.org
Mail-Followup-To: cygwin@cygwin.com, bug-gnulib@gnu.org
References: <201101310304.42975.bruno@clisp.org> <201102030003.46763.bruno@clisp.org> <4D49E68C.2030509@redhat.com> <201102030112.53179.bruno@clisp.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <201102030112.53179.bruno@clisp.org>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Feb  3 01:12, Bruno Haible wrote:
> Hi Eric,
> 
> > I was asking:
> > 
> > should wwchar_t (or xwchar_t, but not xchar_t) be 2-bytes on cygwin, but
> > unlike the POSIX definition of wchar_t being always 1 character per
> > unit, the new type is explicitly documented as being multi-unit on some
> > platforms but with sane semantics
> > 
> > or should it always be 4-bytes, where conversion from wchar_t to
> > wwchar_t requires some efforts, and where the new type must be used
> > everywhere (which means wrapping a lot of APIs), but where you can once
> > again assume POSIX semantics of 1 character per unit, simplifying life
> > of callers at the expense of converting to the new type
> 
> In the first case we wouldn't need a new type.
> 
> The plan is the second alternative. The goal is *not* to have to extend
> each of quotearg.c, regcomp.c, mbchar.h, wc.c, etc. to handle UTF-16
> explicitly with #ifdefs, more variables, and more logic.
> 
> > if it works out, should we also add wwchar_t natively into cygwin? 
> 
> More and more Unix platforms offer only UTF-8 locales. One can predict
> that in 10 years, all Unix platforms will offer only UTF-8 locales. At this
> point wchar_t will be UCS-4 on all these platforms (except AIX).
> 
> The mbrtoc32 function from the C1X API that you pointed to will then be
> equivalent to mbrtowwc.
> 
> So, you can view 'wwchar_t' as a temporary measure that will bridge the
> gap between the ANSI C Amd. 1 API and the C1X API.

Maybe I'm just dense, but isn't wwchar_t equivalent to wint_t on all
platforms?  On UCS-4 platforms sizeof(wint_t) == sizeof(wchar_t) == 4
because there's no reason to make it bigger.  On UCS-2 and UTF-16
platforms sizeof(wint_t) == 4 because it must be able to hold WEOF as
well.  So, why not just use the wint_t type for the time being?
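
To make the size argument concrete, here is a minimal C sketch.  The
helper names (combine_surrogates, wint_holds_all_code_points) are made
up for illustration and are not part of gnulib, Cygwin, or any standard
API.  It assumes only <wchar.h> and <stdint.h>, and it checks the range
of wint_t at run time rather than taking the width for granted, since
that width is platform-dependent.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into a single
   Unicode code point.  `hi` must be a high surrogate (0xD800..0xDBFF)
   and `lo` a low surrogate (0xDC00..0xDFFF). */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800u && hi <= 0xDBFFu);
    assert(lo >= 0xDC00u && lo <= 0xDFFFu);
    return 0x10000u
           + ((uint32_t)(hi - 0xD800u) << 10)
           + (uint32_t)(lo - 0xDC00u);
}

/* Hypothetical helper: returns 1 if wint_t on this platform can
   represent every Unicode code point (up to U+10FFFF), 0 otherwise.
   WINT_MAX comes from <stdint.h>. */
static int wint_holds_all_code_points(void)
{
    return (uintmax_t)WINT_MAX >= 0x10FFFFu;
}

/* Hypothetical helper: store a combined code point in a wint_t.
   Only meaningful when wint_holds_all_code_points() returns 1. */
static wint_t pair_to_wint(uint16_t hi, uint16_t lo)
{
    return (wint_t)combine_surrogates(hi, lo);
}
```

If wint_holds_all_code_points() returns 1, a value such as
pair_to_wint(0xD83D, 0xDE00) holds U+1F600 in one unit, which is the
"1 character per unit" property discussed above.  If it returns 0, the
wint_t idea does not work on that platform and a separate 32-bit type
would still be needed.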


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
