Mail Archives: cygwin/2023/08/01/12:30:08
Message-ID: | <078cd0e8-0db9-cb3c-e1e4-227b2f55a4ae@Shaw.ca>
|
Date: | Tue, 1 Aug 2023 10:29:25 -0600
|
MIME-Version: | 1.0
|
User-Agent: | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
|
| Thunderbird/102.13.0
|
Subject: | Re: character class "alpha"
|
To: | cygwin AT cygwin DOT com
|
References: | <3884636 DOT 3uDm00564X AT nimes> <ZMfzbOOJth8Mk+rJ AT calimero DOT vinschen DOT de>
|
| <ZMf7aqxU8awRQM4v AT calimero DOT vinschen DOT de> <4474610 DOT kIfH5X4irW AT nimes>
|
| <ZMgjuHZjuKbnGpR6 AT calimero DOT vinschen DOT de>
|
Organization: | Inglis
|
In-Reply-To: | <ZMgjuHZjuKbnGpR6@calimero.vinschen.de>
|
From: | Brian Inglis via Cygwin <cygwin AT cygwin DOT com>
|
Reply-To: | cygwin AT cygwin DOT com
|
Cc: | Brian Inglis <Brian DOT Inglis AT Shaw DOT ca>, Bruno Haible <bruno AT clisp DOT org>
On 2023-07-31 15:12, Corinna Vinschen via Cygwin wrote:
> Hi Bruno,
>
> On Jul 31 20:43, Bruno Haible via Cygwin wrote:
>> Corinna Vinschen wrote:
>>> there are more of those expressions which are disabled on glibc and
>>> fail on Cygwin, for instance in test-c32iscntrl.c. Maybe it's actually
>>> the better idea to disable them on Cygwin, too, rather than to change
>>> a working system...
>>
>> Sure. There is no standard how to map the Unicode properties to POSIX
>> character classes. Other than the mentioned ISO C constraints for
>> 'digit' and 'xdigit' and a few POSIX constraints, you are free to
>> map them as you like. For glibc and gnulib, I mapped them in a way
>> that seemed to make most sense for applications. But different
>> people might come to different meanings of "make sense".
>
> Ok, so I just pushed a patchset to Cygwin git, which should make GB18030
> support actually work.
>
> Also, the C11 functions c16rtomb, c32rtomb, mbrtoc16, mbrtoc32 are now
> implemented in Cygwin and a uchar.h header exists now, too.
>
> Assuming all gnulib tests disabled for GLibc in
>
> test-c32isalpha.c
> test-c32iscntrl.c
> test-c32isprint.c
> test-c32isgraph.c
> test-c32ispunct.c
> test-c32islower.c
>
> will be disabled for Cygwin as well, all gb18030 and c32 tests in gnulib
> work as desired now.
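For anyone who wants to exercise the new C11 functions outside the gnulib test
suite, here is a minimal round-trip sketch. It decodes a string with mbrtoc32
and re-encodes each character with c32rtomb, checking that the same bytes come
back. The helper name roundtrip_ok is made up for illustration, and the check
assumes a stateless locale encoding in which each character has exactly one
byte representation; it is not taken from the Cygwin or gnulib sources.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdbool.h>
#include <string.h>
#include <uchar.h>    /* char32_t, mbstate_t, mbrtoc32, c32rtomb */

/* Hypothetical helper: returns true if every character of s, decoded with
   mbrtoc32 in the current locale, re-encodes to the same bytes with
   c32rtomb. Assumes a stateless encoding with unique representations. */
bool roundtrip_ok(const char *s)
{
    mbstate_t in, out;
    memset(&in, 0, sizeof in);    /* all-zero is the initial conversion state */
    memset(&out, 0, sizeof out);

    size_t left = strlen(s);
    while (left > 0) {
        char32_t c32;
        size_t r = mbrtoc32(&c32, s, left, &in);
        /* r > left also catches (size_t)-1, (size_t)-2 and (size_t)-3;
           r == 0 cannot happen here because strlen excludes the null. */
        if (r == 0 || r > left)
            return false;

        char buf[MB_LEN_MAX];
        size_t w = c32rtomb(buf, c32, &out);
        if (w != r || memcmp(buf, s, r) != 0)
            return false;

        s += r;
        left -= r;
    }
    return true;
}
```

Call setlocale(LC_ALL, "") first to test the encoding of the environment's
locale; without it the program runs in the "C" locale.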

https://www.iso.org/standard/86539.html [ISO/IEC/IEEE 9945 CD]
Draft POSIX 2023 SUS V5 Issue 8 D3 CB2.1 proposes the following POSIX
Subprofiling Option Group: POSIX_C_LANG_UCHAR: ISO C Unicode Utilities.

https://www.iso.org/standard/82075.html [ISO/IEC 9899 DIS]
Draft Standard C 2023 is being voted on as of 2023-07-14 and, if no technical
issues arise that require tweaks, will become the new standard. In it, the
Unicode utilities header <uchar.h> has some additions which you may wish to
add; from:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf#page=426
also:
https://en.cppreference.com/w/c/string/multibyte
https://en.cppreference.com/w/c/language/arithmetic_types

Major additions (note the November official standard publication date):

"7.30 Unicode utilities <uchar.h>

1 The header <uchar.h> declares one macro, a few types, and several functions
for manipulating Unicode characters.
2 The macro
    __STDC_VERSION_UCHAR_H__
is an integer constant expression with a value equivalent to 202311L.
3 The types declared are mbstate_t (described in 7.31.1) and size_t (described
in 7.21);
    char8_t
which is an unsigned integer type used for 8-bit characters and is the same type
as unsigned char;
...
7.30.1 Restartable multibyte/wide character conversion functions
...
2 When used in the functions in this subclause, the encoding of char8_t,
char16_t, and char32_t objects, and sequences of such objects, is UTF-8, UTF-16,
and UTF-32, respectively. Similarly, the encoding of char and wchar_t, and
sequences of such objects, is the execution and wide execution encodings
(6.2.9), respectively.

7.30.1.1 The mbrtoc8 function
Synopsis
1   #include <uchar.h>
    size_t mbrtoc8(char8_t * restrict pc8, const char * restrict s,
                   size_t n, mbstate_t * restrict ps);
Description
2 If s is a null pointer, the mbrtoc8 function is equivalent to the call:
    mbrtoc8(NULL, "", 1, ps)
In this case, the values of the parameters pc8 and n are ignored.
3 If s is not a null pointer, the mbrtoc8 function inspects at most n
bytes beginning with the byte pointed to by s to determine the number of bytes
needed to complete the next multibyte character (including any shift sequences).
If the function determines that the next multibyte character is complete and
valid, it determines the values of the corresponding characters and then, if pc8
is not a null pointer, stores the value of the first (or only) such character in
the object pointed to by pc8.
Subsequent calls will store successive characters without consuming any
additional input until all the characters have been stored. If the corresponding
character is the null character, the resulting state described is the initial
conversion state.
Returns
4 The mbrtoc8 function returns the first of the following that applies (given
the current conversion state):
  0
      if the next n or fewer bytes complete the multibyte character that
      corresponds to the null character (which is the value stored).
  between 1 and n inclusive
      if the next n or fewer bytes complete a valid multibyte character (which
      is the value stored); the value returned is the number of bytes that
      complete the multibyte character.
  (size_t)(-3)
      if the next character resulting from a previous call has been stored (no
      bytes from the input have been consumed by this call).
  (size_t)(-2)
      if the next n bytes contribute to an incomplete (but potentially valid)
      multibyte character, and all n bytes have been processed (no value is
      stored). [398]
  (size_t)(-1)
      if an encoding error occurs, in which case the next n or fewer bytes do
      not contribute to a complete and valid multibyte character (no value is
      stored); the value of the macro EILSEQ is stored in errno, and the
      conversion state is unspecified.
[398] When n has at least the value of the MB_CUR_MAX macro, this case can only
occur if s points at a sequence of redundant shift sequences (for
implementations with state-dependent encodings).

7.30.1.2 The c8rtomb function
Synopsis
1   #include <uchar.h>
    size_t c8rtomb(char * restrict s, char8_t c8, mbstate_t * restrict ps);
Description
2 If s is a null pointer, the c8rtomb function is equivalent to the call
    c8rtomb(buf, u8'\0', ps)
where buf is an internal buffer.
3 If s is not a null pointer, the c8rtomb function determines the number of
bytes needed to represent the multibyte character that corresponds to the
character given or completed by c8 (including any shift sequences), and stores
the multibyte character representation in the array whose first element is
pointed to by s, or stores nothing if c8 does not represent a complete
character. At most MB_CUR_MAX bytes are stored. If c8 is a null character, a
null byte is stored, preceded by any shift sequence needed to restore the
initial shift state; the resulting state described is the initial conversion state.
Returns
4 The c8rtomb function returns the number of bytes stored in the array object
(including any shift sequences). When c8 is not a valid character, an encoding
error occurs: the function stores the value of the macro EILSEQ in errno and
returns (size_t)(-1); the conversion state is unspecified.
..."
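
The return-value convention quoted above is the same one the C11 mbrtoc16 and
mbrtoc32 functions already use, so a caller's decoding loop has the same shape
for all three. Since mbrtoc8 is not yet widely available, the sketch below uses
mbrtoc32 as a stand-in to show how each class of return value might be handled.
The helper name count_chars is made up for illustration, and treating a null
character as one consumed byte assumes a stateless encoding.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>    /* char32_t, mbstate_t, mbrtoc32 */

/* Hypothetical helper: counts the characters encoded in the first n bytes
   of s, using the current locale's encoding. Returns (size_t)-1 on an
   encoding error or if the input ends inside a multibyte character. */
size_t count_chars(const char *s, size_t n)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);    /* all-zero is the initial conversion state */

    size_t count = 0;
    while (n > 0) {
        char32_t c32;
        size_t r = mbrtoc32(&c32, s, n, &st);

        if (r == (size_t)-1)      /* encoding error, errno is EILSEQ */
            return (size_t)-1;
        if (r == (size_t)-2)      /* all n bytes used, character incomplete */
            return (size_t)-1;
        if (r == (size_t)-3) {    /* value stored, no input consumed */
            count++;
            continue;
        }
        if (r == 0)               /* null character: assume it took one byte */
            r = 1;

        s += r;
        n -= r;
        count++;
    }
    return count;
}
```

With mbrtoc8 the (size_t)(-3) branch matters much more, because one multibyte
character can produce several char8_t code units; a real UTF-8 converter would
append the stored unit to its output there instead of counting a character.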
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte                    Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter   not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer      but when there is no more to cut
                                -- Antoine de Saint-Exupéry