Mail Archives: cygwin/2024/11/04/06:11:08

Date: Mon, 4 Nov 2024 03:10:20 -0800
Message-ID: <CAKwdsS_OAhOnKWMs0Y5+tRs5ShmocbvpAo4UwbanvA7MiH7=Jw@mail.gmail.com>
Subject: Re: Is this correct behaviour for 'rev'?
To: cygwin AT cygwin DOT com
From: Backwoods BC via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Backwoods BC <completely DOT and DOT totally DOT trash AT gmail DOT com>

On Sun, Nov 3, 2024 at 11:42 PM Thomas Wolff via Cygwin
<cygwin AT cygwin DOT com> wrote:
> Am 04.11.2024 um 05:56 schrieb Backwoods BC via Cygwin:
> > On Sun, Nov 3, 2024 at 1:49 AM Mark Geisert via Cygwin
> > <cygwin AT cygwin DOT com> wrote:
> >> Continuing my monologue, with due consideration of comments posted, ...
> >>
> >> On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
> >>> Replying to myself, I continue...
> >>>
> >>> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
> >>>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
> >>>>> It appears that 'rev' is choking on any character \x80 or higher, but
> >>>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
> >>>>> it, it just stops.
> >>>>>
> >>>>> I don't have access to a Linux box so I can't see if this happens
> >>>>> there and nothing in the documentation suggests that this is the
> >>>>> correct functionality.
> >>>>>
> >>>>> Test case:
> >>>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
> >>>>>
> >>>>> This is for "rev from util-linux 2.33.1"
> >>>>>
> >>>>> I don't have the current version of 'rev' on my system due to not
> >>>>> having updated in a while. I accidentally screwed up my installation
> >>>>> and have been reluctant to wipe it and start over.
> >>>>>
> >>>>> So, is this the expected behaviour for the current version of 'rev'
> >>>>> under Cygwin and/or Linux?
> >>>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken
> >>>> way.  It looks like line-ending char(s) are not being handled
> >>>> correctly.   Don't know yet if it's rev itself or fgetws() being used
> >>>> by rev that's busted.  I'll investigate further.  Thanks for the report!
> >>> This is a locale issue.  In the default Cygwin locale, rev mishandles
> >>> the \x80 byte and instead of stopping with an error message it enters an
> >>> infinite loop.  I'll probably report this upstream instead of working
> >>> out a local fix.
> >> Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error
> >> message when the OP's testcase is tried.  I'm testing the full 2.40.2
> >> for Cygwin release before too long.
> >>
> >>> There is a work-around: change to the "C" locale just to run rev.
> >>>       LC_ALL=C rev zzz
> >>> where zzz is a file containing your four lines.  You can also run your
> >>> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
> >> Implicit in that suggestion is that the OP seemed to be uninterested in
> >> any form of multi-byte characters.. just straightforward operation on
> >> bytes, even if they have the high bit set.
> >>
> >> That said, I appreciate the follow-up comments that dealt with the
> >> general problem.
> >> Thanks all,
> >>
> >> ..mark
> > Sorry for dropping out of the thread. I lost interest in pursuing the
> > issue once I learned that 'rev' would balk at any character it didn't
> > like instead of just passing it through, and found a workaround for my
> > case. What I really wanted is something that would do a byte-by-byte
> > reversal working backwards from a LF character.
> >
> > My use for 'rev' is to allow sorting based on field position from the
> > *end* of the line. 'sort' won't do this itself, as far as I can tell.
> > My method follows:
> > printf -v mySep '\xff'
> > cat fileOfFullPathNames | rev | sed -r -e "s/\./$mySep/" | rev | sort -t "$mySep" --key=2.1 | tr "$mySep" '.'
> >
> > This particular pipe is to sort fileOfFullPathNames by file extension.
> > As mentioned, this stops abruptly when it encounters my inserted field
> > separator of \xff. I found that it would do what I wanted if I used
> > \x1f as mySep instead.
> >
> > To be honest, in far too many years of using *nix as a user (not a
> > developer), doing this kind of thing is the only use I've ever had for
> > 'rev'. I probably used a different separator before (likely \x09)
> > which is why I haven't encountered an issue.
> >
> > What I appear to really need is "rev --binary" that just reverses
> > everything regardless of what it is until it finds a LF. I may get
> > motivated to write it for myself if I run into situations where I
> > can't work around the restrictions in 'rev'.
> As noted before in this thread, "rev --binary" is "LC_ALL=C rev".

When 'rev' gets fixed, I'll try that. Until then, I'll just keep using
my workaround, because "LC_ALL=C rev" still dies when it encounters any
byte >= \x80.
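
For reference, here is a minimal sketch of the byte-by-byte reversal asked
for in the quoted message (the "rev --binary" idea). It assumes a POSIX awk
is available and that awk treats each byte as one character in the "C"
locale; the name rev_bytes is made up for illustration and is not part of
util-linux. It has not been checked on Cygwin, and NUL bytes are not
handled by every awk.

```shell
# rev_bytes: hypothetical helper (not part of util-linux) that reverses each
# input line byte by byte. LC_ALL=C is set for awk only, so that length() and
# substr() count bytes rather than multibyte characters (an assumption about
# the awk in use).
rev_bytes() {
    LC_ALL=C awk '{
        out = ""
        # Walk the line from its last byte to its first, appending each byte.
        for (i = length($0); i > 0; i--)
            out = out substr($0, i, 1)
        print out
    }' "$@"
}

# Round trip of the original test case, with octal escapes for portability.
printf 'hex 01 >\001< here\nhex 80 >\200< here\nLine 4\n' | rev_bytes | rev_bytes
```

Something like this could stand in for both 'rev' calls in the quoted
pipeline. Whether sed, sort and tr then treat \xff as intended in a given
locale is a separate question that this sketch does not check.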
