From: Backwoods BC via Cygwin
To: cygwin AT cygwin DOT com
Date: Mon, 4 Nov 2024 03:10:20 -0800
Subject: Re: Is this correct behaviour for 'rev'?

On Sun, Nov 3, 2024 at 11:42 PM Thomas Wolff via Cygwin wrote:
> On 04.11.2024 at 05:56, Backwoods BC via Cygwin wrote:
> > On Sun, Nov 3, 2024 at 1:49 AM Mark Geisert via Cygwin
> > wrote:
> >> Continuing my monologue, with due consideration of comments posted, ...
> >>
> >> On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
> >>> Replying to myself, I continue...
> >>>
> >>> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
> >>>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
> >>>>> It appears that 'rev' is choking on any character \x80 or higher, but
> >>>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
> >>>>> it, it just stops.
> >>>>>
> >>>>> I don't have access to a Linux box so I can't see if this happens
> >>>>> there and nothing in the documentation suggests that this is the
> >>>>> correct functionality.
> >>>>>
> >>>>> Test case:
> >>>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
> >>>>>
> >>>>> This is for "rev from util-linux 2.33.1"
> >>>>>
> >>>>> I don't have the current version of 'rev' on my system due to not
> >>>>> having updated in a while. I accidentally screwed up my installation
> >>>>> and have been reluctant to wipe it and start over.
> >>>>>
> >>>>> So, is this the expected behaviour for the current version of 'rev'
> >>>>> under Cygwin and/or Linux?
> >>>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken
> >>>> way. It looks like line-ending char(s) are not being handled
> >>>> correctly. Don't know yet if it's rev itself or fgetws() being used
> >>>> by rev that's busted. I'll investigate further. Thanks for the report!
> >>> This is a locale issue. In the default Cygwin locale, rev mishandles
> >>> the \x80 byte and instead of stopping with an error message it enters an
> >>> infinite loop. I'll probably report this upstream instead of working
> >>> out a local fix.
> >> Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error
> >> message when the OP's testcase is tried. I'm testing the full 2.40.2
> >> for Cygwin release before too long.
> >>
> >>> There is a work-around: change to the "C" locale just to run rev.
> >>>       LC_ALL=C rev zzz
> >>> where zzz is a file containing your four lines. You can also run your
> >>> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
> >> Implicit in that suggestion is that the OP seemed to be uninterested in
> >> any form of multi-byte characters.. just straightforward operation on
> >> bytes, even if they have the high bit set.
> >>
> >> That said, I appreciate the follow-up comments that dealt with the
> >> general problem.
> >> Thanks all,
> >>
> >> ..mark
> > Sorry for dropping out of the thread. I lost interest in pursuing the
> > issue once I learned that 'rev' would balk at any character it didn't
> > like instead of just passing it through, and found a workaround for my
> > case. What I really wanted is something that would do a byte-by-byte
> > reversal working backwards from a LF character.
> >
> > My use for 'rev' is to allow sorting based on field position from the
> > *end* of the line. 'sort' won't do this itself, as far as I can tell.
> > My method follows:
> > printf -v mySep '\xff'
> > cat fileOfFullPathNames | rev | sed -r -e "s/\./$mySep/" | rev | sort -t "$mySep" --key=2.1 | tr "$mySep" '.'
> >
> > This particular pipe is to sort fileOfFullPathNames by file extension.
> > As mentioned, this stops abruptly when it encounters my inserted field
> > separator of \xff. I found that it would do what I wanted if I used
> > \x1f as mySep instead.
> >
> > To be honest, in far too many years of using *nix as a user (not a
> > developer), doing this kind of thing is the only use I've ever had for
> > 'rev'. I probably used a different separator before (likely \x09)
> > which is why I haven't encountered an issue.
> >
> > What I appear to really need is "rev --binary" that just reverses
> > everything regardless of what it is until it finds a LF. I may get
> > motivated to write it for myself if I run into situations where I
> > can't work around the restrictions in 'rev'.
> As noted before in this thread, "rev --binary" is "LC_ALL=C rev".

When 'rev' gets fixed, I'll try that. Until then, I'll just work around
it as "LC_ALL=C rev" still dies when it encounters any byte >=\x80.