Mail Archives: cygwin/2024/10/24/16:07:58

Message-ID: <6cf2bb3d-81fe-4a0d-a4ce-9e28dd976beb@SystematicSW.ab.ca>
Date: Thu, 24 Oct 2024 14:06:00 -0600
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: Is this correct behaviour for 'rev'?
To: cygwin AT cygwin DOT com
References: <CAKwdsS9FGm9nqtZ+vSQ+WEWzRf-zUFAS06eo=ASwNB6ST3gddw AT mail DOT gmail DOT com>
<6fdbf92d-51f2-47ae-a482-5edd89ed3a89 AT maxrnd DOT com>
<f58d4a6c-476d-4cc5-bad5-28c99ad75c2b AT maxrnd DOT com>
<f6930a61-eed4-4a06-a813-6e0ea1914a13 AT towo DOT net>
<429b4a7c-4a05-467c-a90d-6ed6e87cfc63 AT SystematicSW DOT ab DOT ca>
<439ae1d0-3b38-4553-b889-f0b344dfaf4a AT towo DOT net>
Organization: Systematic Software
In-Reply-To: <439ae1d0-3b38-4553-b889-f0b344dfaf4a@towo.net>
From: Brian Inglis via Cygwin <cygwin AT cygwin DOT com>
Reply-To: cygwin AT cygwin DOT com
Cc: Brian Inglis <Brian DOT Inglis AT SystematicSW DOT ab DOT ca>

On 2024-10-24 11:22, Thomas Wolff via Cygwin wrote:
> 
> 
> 
> Am 24.10.2024 um 15:56 schrieb Brian Inglis via Cygwin:
>> On 2024-10-24 02:37, Thomas Wolff via Cygwin wrote:
>>>
>>> Am 24.10.2024 um 07:01 schrieb Mark Geisert via Cygwin:
>>>> Replying to myself, I continue...
>>>>
>>>> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>>>>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>>>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>>>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>>>>> it, it just stops.
>>>>>>
>>>>>> I don't have access to a Linux box so I can't see if this happens
>>>>>> there and nothing in the documentation suggests that this is the
>>>>>> correct functionality.
>>>>>>
>>>>>> Test case:
>>>>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>>>>
>>>>>> This is for "rev from util-linux 2.33.1"
>>>>>>
>>>>>> I don't have the current version of 'rev' on my system due to not
>>>>>> having updated in a while. I accidentally screwed up my installation
>>>>>> and have been reluctant to wipe it and start over.
>>>>>>
>>>>>> So, is this the expected behaviour for the current version of 'rev'
>>>>>> under Cygwin and/or Linux?
>>>>>
>>>>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same,
>>>>> broken way.  It looks like line-ending char(s) are not being handled
>>>>> correctly.   Don't know yet if it's rev itself or fgetws() being used
>>>>> by rev that's busted.  I'll investigate further.  Thanks for the
>>>>> report!
>>>>
>>>> This is a locale issue.  In the default Cygwin locale, rev mishandles
>>>> the \x80 byte and instead of stopping with an error message it enters
>>>> an infinite loop.  I'll probably report this upstream instead of
>>>> working out a local fix.
>>>>
>>>> There is a work-around: change to the "C" locale just to run rev.
>>>>     LC_ALL=C rev zzz
>>>> where zzz is a file containing your four lines.  You can also run your
>>>> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
>>> Sorry, this is not a good workaround as it corrupts all (proper)
>>> non-ASCII characters.
>>> You could do e.g.
>>> grep . | rev
>>
>> Not quite, as that just matches non-empty lines, you would have to do
>> something more like `grep -o . ...`, but not sure that would do what
>> you want either.
>>
> Ah, right, so:
> egrep -e "(^$|.)" | rev
> or maybe there is some more suitable tool.
> 
>> The correct approach should be to match the execution locale to the
>> file locale, for example, `LC_ALL=...UTF-8 rev ...` which should
>> produce the expected results.
> That's not the point. You can never be sure that there is no stray
> wrong-encoded byte in your files, and rev should definitely not
> endless-loop in that case.

>>>>>> it, it just stops.

I take that to mean it exits without processing past the invalid byte, rather 
than getting stuck in a loop, and the function seems to be written that way. My 
preference, though, would be to copy the offending input *byte* to the output on 
error and carry on.
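The copy-the-byte-and-proceed behaviour preferred above can be sketched in C. This is a minimal illustration, not util-linux's rev: instead of fgetwc(3) it reads whole lines with getline(3) and splits them into characters with mbrtowc(3) under the current LC_CTYPE, treating any invalid or truncated sequence as one opaque byte that is copied through unchanged. The helper names `char_len`, `reverse_line` and `rev_stream` are hypothetical, and the sketch assumes a stateless encoding such as UTF-8 (the conversion state is reset for each character).

```c
#define _POSIX_C_SOURCE 200809L  /* getline(), fmemopen(), open_memstream() */
#include <assert.h>
#include <locale.h>   /* callers select LC_CTYPE with setlocale() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <wchar.h>

/* Byte length of the character starting at p under the current LC_CTYPE.
 * Invalid or truncated sequences (and NUL) count as one opaque byte,
 * so a stray byte such as \x80 can never stall the scan. */
static size_t char_len(const char *p, size_t avail)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t n = mbrtowc(NULL, p, avail, &st);
    if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
        return 1;
    return n;
}

/* Write the characters of line[0..len) to out in reverse order; out must
 * hold len bytes. Multibyte characters keep their internal byte order. */
void reverse_line(const char *line, size_t len, char *out)
{
    size_t i = 0;
    while (i < len) {
        size_t n = char_len(line + i, len - i);
        assert(n >= 1 && n <= len - i);   /* always makes forward progress */
        memcpy(out + (len - i - n), line + i, n);
        i += n;
    }
}

/* Reverse every line of in onto out, keeping each newline at the line end.
 * Returns 0 on success, -1 on a read or allocation error. */
int rev_stream(FILE *in, FILE *out)
{
    char *line = NULL, *rev = NULL;
    size_t cap = 0, revcap = 0;
    ssize_t got;
    int rc = 0;

    while ((got = getline(&line, &cap, in)) != -1) {
        size_t len = (size_t)got;
        int has_nl = len > 0 && line[len - 1] == '\n';
        if (has_nl)
            len--;
        if (len > revcap) {
            char *p = realloc(rev, len);
            if (p == NULL) { rc = -1; break; }
            rev = p;
            revcap = len;
        }
        if (len > 0) {
            reverse_line(line, len, rev);
            fwrite(rev, 1, len, out);
        }
        if (has_nl)
            fputc('\n', out);
    }
    if (ferror(in))
        rc = -1;
    free(line);
    free(rev);
    return rc;
}
```

A `main()` for this would only need `setlocale(LC_CTYPE, "")` followed by `return rev_stream(stdin, stdout) ? EXIT_FAILURE : EXIT_SUCCESS;`. Run on the original test case in a UTF-8 locale, the `\x80` line would be reversed with the stray byte kept in place rather than ending the output.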

The latest tweak in the repo, fgetwc_or_err(), makes all the utilities that use 
it just die with an error message as soon as they hit *any* encoding error.

As fgetwc(3) and fputwc(3) depend on LC_CTYPE, it appears to be up to the user to 
set that locale category to match the input, and then to pipe the output through 
iconv or equivalent to convert it to the terminal's locale charset.
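The conversion step done with the iconv utility in a pipe is also available to C programs through iconv(3). The helper below is a minimal, hypothetical sketch (`convert_charset` is not an existing API); it assumes the whole input fits in one buffer and uses charset names such as "UTF-8" and "ISO-8859-1" as accepted by glibc's iconv_open(). Note that iconv_open() takes the target charset first and the source second.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert in[0..inlen) from charset `from` to charset `to` into out, which
 * holds outcap bytes. Returns the number of bytes written, or (size_t)-1
 * with errno set (e.g. EINVAL for an unknown charset, EILSEQ for a byte
 * that is invalid in `from`, E2BIG if out is too small). */
size_t convert_charset(const char *to, const char *from,
                       const char *in, size_t inlen,
                       char *out, size_t outcap)
{
    iconv_t cd = iconv_open(to, from);   /* target first, source second */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;   /* iconv() takes char ** but only reads the input */
    char *outp = out;
    size_t inleft = inlen, outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        /* Flush any shift state (a no-op for stateless charsets like UTF-8). */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);

    int saved = errno;   /* iconv_close() may overwrite errno */
    iconv_close(cd);
    if (rc == (size_t)-1) {
        errno = saved;
        return (size_t)-1;
    }
    return outcap - outleft;
}
```

For example, Latin-1 text reversed under a Latin-1 LC_CTYPE could be passed through `convert_charset("UTF-8", "ISO-8859-1", ...)` before being written to a UTF-8 terminal.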

-- 
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                 -- Antoine de Saint-Exupéry

