Message-ID: | <528c7bd3-e39a-5b7a-5819-5a6b4e3c71c5@SystematicSw.ab.ca> |
Date: | Sat, 27 Nov 2021 00:24:02 -0700 |
Subject: | Re: raise(-1) has stopped returning an error recently |
To: | cygwin AT cygwin DOT com |
References: | <YZsoj6UvpF6pcbtt AT slk1 DOT local DOT net> |
<YZtwMZ1LUbx+b5+s AT calimero DOT vinschen DOT de> | |
<YZuVy5+nbzPtiqdw AT calimero DOT vinschen DOT de> <YZyl69ODRcBVnMed AT slk1 DOT local DOT net> | |
<YZy5bRsZuulb6FUV AT calimero DOT vinschen DOT de> | |
<42c9bb90-dd78-edfa-99ff-f65f7e000956 AT SystematicSw DOT ab DOT ca> | |
<YZ1tAfzwlW8C84z4 AT slk1 DOT local DOT net> <YZ4FGpEDDar45HC7 AT calimero DOT vinschen DOT de> | |
<643c1cb7-9b18-25cf-62b0-8085c8fab137 AT Shaw DOT ca> | |
<YZ+HkgPIwmCuTcJr AT calimero DOT vinschen DOT de> | |
From: | Brian Inglis <Brian DOT Inglis AT SystematicSw DOT ab DOT ca> |
Organization: | Systematic Software |
In-Reply-To: | <YZ+HkgPIwmCuTcJr@calimero.vinschen.de> |
On 2021-11-25 05:54, Corinna Vinschen via Cygwin wrote:
> On Nov 24 11:01, Brian Inglis via Cygwin wrote:
>> On 2021-11-24 02:25, Corinna Vinschen via Cygwin wrote:
>>>> On Tue, Nov 23, 2021 at 11:18:25AM -0700, Brian Inglis wrote:
>>>>> Do Cygwin and/or Windows support surrogate pairs in UTF-8?
>>>
>>> You mean UTF-16. UTF-8 doesn't know surrogate pairs, UTF-16 does.
>>> Originally there was UCS-2, 16 bits, with only 65536 code points.
>>> However, Unicode left the BMP already with version 2.0 in 1996, so
>>> UTF-16 and surrogate pairs became necessary. Windows as well as Cygwin
>>> support them.
>>
>> How does Cygwin support UTF-16 locales with surrogate pairs?
>
> UTF-16 locales? There's no such thing. UTF-16 is just the 16 bit
> representation for Unicode, and as such, is independent of the locale.
> On the user side, Cygwin only supports UTF-8 as Unicode representation.
> Internally you can then convert them to wchar_t which is UTF-16.
>
>> Are they the "native" locales inherited from Windows if others are not
>> specified e.g. UTF-8, some OEM SBCS or MBCS?
>
> Just try `locale -av' and you'll see all supported locales and their
> respective default codeset. All of them can be used with .utf8
> specifier to use UTF-8 instead of the default codeset. Some of them
> use UTF-8 as default codeset anyway, e. g., fa_IR or yo_NG.
>
>>>> There are 3 tests in surrogate-pair and only the 3rd one failed. So I guess
>>>> surrogate pairs in UTF-8 "mostly work".
>>>
>>> UTF-16. The surrogate stuff is evil at times. Have a look at the
>>> __utf8_wctomb function in
>>> https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/wctomb_r.c
>>> Lone surrogate halfs in an input stream are a problem, for instance.
>>
>> Thus the confusion with grep surrogate pair tests which appear to be running
>> under a UTF-8 locale: see attached surrogate pair extract from cygport
>> --debug grep.cygport check.
>
> An STC in plain C might be helpful.
I think I might finally have got the point of the test, not knowing much
about legacy UTF-16 UCS encoding nor surrogate pairs.

From what I can see:

𐐅 U+010405 f0 90 90 85 DESERET CAPITAL LETTER LONG OO

fails to match itself, presumably others do also.
Presumably this is converted internally on some platforms, including
Cygwin, to a UTF-16 surrogate pair, and a grep comparison fails, although
a bash comparison succeeds.

$ printf '\U10405\n' | iconv -f utf-8 -t utf-16be | xxd -g2
00000000: d801 dc05 000a
$ printf '\U10405\n' > t
$ grep -f t t; echo $?
1
$ oo=`printf '\U10405\n'`; [ $oo = $oo ] && echo same || echo diff
same

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
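
For reference, the arithmetic that takes the UTF-8 bytes f0 90 90 85 to the UTF-16 pair d801 dc05 shown by iconv above can be sketched in plain C. This is only an illustration of the standard encoding rules, not the code path grep or newlib actually uses; the function names decode_utf8_4 and to_surrogates are made up for this sketch, and the decoder deliberately handles only the 4-byte UTF-8 form.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 4-byte UTF-8 sequence into a code point.
 * Returns 0 on malformed input. Only the 4-byte form is handled here. */
uint32_t decode_utf8_4(const unsigned char *s)
{
    int i;
    uint32_t cp;

    if ((s[0] & 0xF8) != 0xF0)          /* lead byte must be 11110xxx */
        return 0;
    for (i = 1; i < 4; i++)
        if ((s[i] & 0xC0) != 0x80)      /* continuation must be 10xxxxxx */
            return 0;

    cp = ((uint32_t)(s[0] & 0x07) << 18)
       | ((uint32_t)(s[1] & 0x3F) << 12)
       | ((uint32_t)(s[2] & 0x3F) << 6)
       |  (uint32_t)(s[3] & 0x3F);

    /* Reject overlong forms and values beyond the Unicode range. */
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    return cp;
}

/* Hypothetical helper: split a code point outside the BMP into a UTF-16
 * surrogate pair. Returns 1 on success, 0 if cp is not in
 * U+10000..U+10FFFF (BMP code points need no surrogates). */
int to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                           /* 20 bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));   /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low 10 bits */
    return 1;
}
```

Feeding the four bytes f0 90 90 85 to decode_utf8_4 gives 0x10405, and to_surrogates then yields the high surrogate 0xD801 and low surrogate 0xDC05, matching the xxd output. Any comparison that sees only one half of that pair is looking at a lone surrogate.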