Date: Thu, 3 Feb 2011 10:41:42 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com, bug-gnulib AT gnu DOT org
Subject: Re: 16-bit wchar_t on Windows and Cygwin
Message-ID: <20110203094142.GU2675@calimero.vinschen.de>

On Feb  3 01:12, Bruno Haible wrote:
> Hi Eric,
>
> > I was asking:
> >
> > should wwchar_t (or xwchar_t, but not xchar_t) be 2-bytes on cygwin, but
> > unlike the POSIX definition of wchar_t being always 1 character per
> > unit, the new type is explicitly documented as being multi-unit on some
> > platforms but with sane semantics
> >
> > or should it always be 4-bytes, where conversion from wchar_t to
> > wwchar_t requires some efforts, and where the new type must be used
> > everywhere (which means wrapping a lot of APIs), but where you can once
> > again assume POSIX semantics of 1 character per unit, simplifying life
> > of callers at the expense of converting to the new type
>
> In the first case we wouldn't need a new type.
>
> The plan is the second alternative. The goal is *not* to have to extend
> each of quotearg.c, regcomp.c, mbchar.h, wc.c, etc.
> to handle UTF-16 explicitly with #ifdefs, more variables, and more logic.
>
> > if it works out, should we also add wwchar_t natively into cygwin?
>
> More and more Unix platforms offer only UTF-8 locales. One can predict
> that in 10 years, all Unix platforms will offer only UTF-8 locales. At this
> point wchar_t will be UCS-4 on all these platforms (except AIX).
>
> The mbrtoc32 function from the C1X API that you pointed to will then be
> equivalent to mbrtowwc.
>
> So, you can view 'wwchar_t' as a temporary measure that will bridge the
> gap between the ANSI C Amd. 1 API and the C1X API.

Maybe I'm just dense, but isn't wwchar_t equivalent to wint_t on all
platforms?  On UCS-4 platforms sizeof(wint_t) == sizeof(wchar_t) == 4
because there's no reason to make it bigger.  On UCS-2 and UTF-16
platforms sizeof(wint_t) == 4 because it must be able to hold EOF as
well.

So, why not just use the wint_t type for the time being?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
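
The size comparison in the reply above can be checked on a given platform
with a short C sketch. The helper names (wint_holds_all_code_points,
wchar_holds_all_code_points, report_wide_types) are hypothetical and not
part of gnulib or Cygwin; only the standard <wchar.h> and <stdint.h>
definitions are assumed. The C standard leaves both sizes
implementation-defined, so the result should be checked per platform
rather than assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <wchar.h>

/* Largest Unicode code point; a one-unit-per-character type must hold it. */
#define MAX_CODE_POINT 0x10FFFFUL

/* Returns 1 if wint_t can represent every Unicode code point, else 0. */
int wint_holds_all_code_points(void)
{
    return (uintmax_t)WINT_MAX >= MAX_CODE_POINT;
}

/* Returns 1 if wchar_t alone can do so (the UCS-4 case), else 0. */
int wchar_holds_all_code_points(void)
{
    return (uintmax_t)WCHAR_MAX >= MAX_CODE_POINT;
}

/* Prints the sizes discussed in the thread for the current platform. */
void report_wide_types(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    fprintf(out, "sizeof(wint_t)  = %zu\n", sizeof(wint_t));
    fprintf(out, "wchar_t holds all code points: %s\n",
            wchar_holds_all_code_points() ? "yes" : "no");
    fprintf(out, "wint_t  holds all code points: %s\n",
            wint_holds_all_code_points() ? "yes" : "no");
}
```

Calling report_wide_types(stdout) from main shows whether wint_t is wide
enough to stand in for the proposed wwchar_t on the platform at hand.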