From: Corinna Vinschen via Cygwin
Date: Thu, 25 Nov 2021 13:54:42 +0100
To: cygwin AT cygwin DOT com
Subject: Re: raise(-1) has stopped returning an error recently
Reply-To: cygwin AT cygwin DOT com

On Nov 24 11:01, Brian Inglis via Cygwin wrote:
> On 2021-11-24 02:25, Corinna Vinschen via Cygwin wrote:
>
> > On Tue, Nov 23, 2021 at 11:18:25AM -0700, Brian Inglis wrote:
> > > Do Cygwin and/or Windows support surrogate pairs in UTF-8?
> >
> > You mean UTF-16. UTF-8 doesn't know surrogate pairs, UTF-16 does.
> > Originally there was UCS-2, 16 bits, with only 65536 code points.
> > However, Unicode left the BMP already with version 2.0 in 1996, so
> > UTF-16 and surrogate pairs became necessary. Windows as well as Cygwin
> > support them.
>
> How does Cygwin support UTF-16 locales with surrogate pairs?

UTF-16 locales? There's no such thing. UTF-16 is just the 16 bit
representation for Unicode, and as such, is independent of the locale.

On the user side, Cygwin only supports UTF-8 as Unicode representation.
Internally you can then convert them to wchar_t which is UTF-16.
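
To make the surrogate pair arithmetic concrete, here is a minimal sketch in
plain C. The function names to_surrogates and from_surrogates are made up
for this illustration and are not part of Cygwin or newlib. It only shows
how a code point outside the BMP maps to two 16 bit units and back, using
the standard UTF-16 formula.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a code point in U+10000..U+10FFFF into a
   UTF-16 surrogate pair.  Returns 0 on success, -1 if cp is not in the
   supplementary range (BMP code points need no surrogates). */
static int to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  if (cp < 0x10000 || cp > 0x10FFFF)
    return -1;
  cp -= 0x10000;                            /* 20 bits remain */
  *hi = (uint16_t) (0xD800 + (cp >> 10));   /* upper 10 bits */
  *lo = (uint16_t) (0xDC00 + (cp & 0x3FF)); /* lower 10 bits */
  return 0;
}

/* Hypothetical helper: combine a valid high/low surrogate pair back
   into the code point it encodes. */
static uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
  assert(hi >= 0xD800 && hi <= 0xDBFF);
  assert(lo >= 0xDC00 && lo <= 0xDFFF);
  return 0x10000 + ((uint32_t) (hi - 0xD800) << 10) + (uint32_t) (lo - 0xDC00);
}
```

For example, U+1F600 splits into the pair 0xD83D 0xDE00, and combining
those two units gives U+1F600 again.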

> Are they the "native" locales inherited from Windows if others are not
> specified e.g. UTF-8, some OEM SBCS or MBCS?

Just try `locale -av' and you'll see all supported locales and their
respective default codeset. All of them can be used with the .utf8
specifier to use UTF-8 instead of the default codeset. Some of them use
UTF-8 as default codeset anyway, e.g., fa_IR or yo_NG.

> > > There are 3 tests in surrogate-pair and only the 3rd one failed. So I guess
> > > surrogate pairs in UTF-8 "mostly work".
> >
> > UTF-16. The surrogate stuff is evil at times. Have a look at the
> > __utf8_wctomb function in
> > https://sourceware.org/git/?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdlib/wctomb_r.c
> > Lone surrogate halves in an input stream are a problem, for instance.
>
> Thus the confusion with grep surrogate pair tests which appear to be running
> under a UTF-8 locale: see attached surrogate pair extract from cygport
> --debug grep.cygport check.

An STC in plain C might be helpful.


Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
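
As a starting point for such a test case, the following is a rough sketch
of why lone surrogate halves are a problem when going from UTF-16 to
UTF-8. It is an independent illustration, not the code of __utf8_wctomb
in newlib. The function name utf16_to_utf8 and its return convention are
assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical converter: turn n UTF-16 code units into UTF-8.
   out must have room for at least 3 * n bytes.
   Returns the number of bytes written, or -1 if the input contains a
   lone surrogate half (a high half without a following low half, or a
   low half without a preceding high half). */
static long utf16_to_utf8(const uint16_t *in, size_t n, unsigned char *out)
{
  size_t o = 0;

  assert(in != NULL && out != NULL);
  for (size_t i = 0; i < n; i++) {
    uint32_t cp = in[i];

    if (cp >= 0xD800 && cp <= 0xDBFF) {
      /* High half: the next unit must be a low half. */
      if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
        return -1;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t) (in[i + 1] - 0xDC00);
      i++;                      /* consume the low half as well */
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return -1;                /* low half with no high half before it */
    }

    if (cp < 0x80) {
      out[o++] = (unsigned char) cp;
    } else if (cp < 0x800) {
      out[o++] = (unsigned char) (0xC0 | (cp >> 6));
      out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out[o++] = (unsigned char) (0xE0 | (cp >> 12));
      out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
    } else {
      out[o++] = (unsigned char) (0xF0 | (cp >> 18));
      out[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
    }
  }
  return (long) o;
}
```

A complete pair such as 0xD83D 0xDE00 converts to the four bytes
F0 9F 98 80, while an input holding only one of the two halves is rejected
because there is no valid UTF-8 sequence to emit for it.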