Mail Archives: cygwin/2024/11/03/04:49:56

Date: Sun, 3 Nov 2024 01:48:46 -0800
Subject: Re: Is this correct behaviour for 'rev'?
To: cygwin AT cygwin DOT com
From: Mark Geisert via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Mark Geisert <mark AT maxrnd DOT com>

Continuing my monologue, with due consideration of comments posted, ...

On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
> Replying to myself, I continue...
> 
> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>> it, it just stops.
>>>
>>> I don't have access to a Linux box so I can't see if this happens
>>> there and nothing in the documentation suggests that this is the
>>> correct functionality.
>>>
>>> Test case:
>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80<
>>> here\nLine 4\n'|rev|rev
>>>
>>> This is for "rev from util-linux 2.33.1"
>>>
>>> I don't have the current version of 'rev' on my system due to not
>>> having updated in a while. I accidentally screwed up my installation
>>> and have been reluctant to wipe it and start over.
>>>
>>> So, is this the expected behaviour for the current version of 'rev'
>>> under Cygwin and/or Linux?
>>
>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken 
>> way.  It looks like line-ending char(s) are not being handled 
>> correctly.   Don't know yet if it's rev itself or fgetws() being used 
>> by rev that's busted.  I'll investigate further.  Thanks for the report!
> 
> This is a locale issue.  In the default Cygwin locale, rev mishandles 
> the \x80 byte and instead of stopping with an error message it enters an 
> infinite loop.  I'll probably report this upstream instead of working 
> out a local fix.

Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error 
message when the OP's testcase is tried.  I plan to test the full 2.40.2 
for a Cygwin release before too long.

> There is a work-around: change to the "C" locale just to run rev.
>      LC_ALL=C rev zzz
> where zzz is a file containing your four lines.  You can also run your 
> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.

Implicit in that suggestion is my assumption that the OP was not 
interested in any form of multi-byte characters, just straightforward 
operation on bytes, even if they have the high bit set.
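
For anyone who wants the work-around in reusable form, here is a minimal 
sketch, not anything shipped with util-linux.  The function name 
rev_c_locale is made up for illustration.  It only forces the "C" locale 
for the one rev invocation; whether high-bit bytes pass through cleanly 
still depends on the rev and C library in use.

```shell
# Hypothetical helper (the name is an assumption, not part of util-linux):
# run rev with the "C" locale in effect for this one command only, so the
# caller's own locale settings are left untouched.
rev_c_locale() {
    LC_ALL=C rev "$@"
}

# Usage, following the testcase quoted earlier in this thread:
#   printf 'hex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n' | rev_c_locale | rev_c_locale
# or, on a file:
#   rev_c_locale zzz
```

Setting LC_ALL on the command line rather than exporting it keeps the 
change local to rev, so other tools in the same pipeline keep whatever 
locale they had.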

That said, I appreciate the follow-up comments that dealt with the 
general problem.
Thanks all,

..mark
