Message-ID: <8edfd4a5-58b9-4439-add1-66830aa48f90@towo.net>
Date: Mon, 4 Nov 2024 13:31:49 +0100
Subject: Re: Is this correct behaviour for 'rev'?
To: cygwin AT cygwin DOT com
List-Id: General Cygwin discussions and problem reports
From: Thomas Wolff via Cygwin
Reply-To: Thomas Wolff

On 04.11.2024 at 12:10, Backwoods BC via Cygwin wrote:
> On Sun, Nov 3, 2024 at 11:42 PM Thomas Wolff via Cygwin wrote:
>> On 04.11.2024 at 05:56, Backwoods BC via Cygwin wrote:
>>> On Sun, Nov 3, 2024 at 1:49 AM Mark Geisert via Cygwin wrote:
>>>> Continuing my monologue, with due consideration of comments posted, ...
>>>>
>>>> On 10/23/2024 10:01 PM, Mark Geisert via Cygwin wrote:
>>>>> Replying to myself, I continue...
>>>>>
>>>>> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>>>>>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>>>>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>>>>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>>>>>> it, it just stops.
>>>>>>>
>>>>>>> I don't have access to a Linux box so I can't see if this happens
>>>>>>> there and nothing in the documentation suggests that this is the
>>>>>>> correct functionality.
>>>>>>>
>>>>>>> Test case:
>>>>>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>>>>>
>>>>>>> This is for "rev from util-linux 2.33.1"
>>>>>>>
>>>>>>> I don't have the current version of 'rev' on my system due to not
>>>>>>> having updated in a while. I accidentally screwed up my installation
>>>>>>> and have been reluctant to wipe it and start over.
>>>>>>>
>>>>>>> So, is this the expected behaviour for the current version of 'rev'
>>>>>>> under Cygwin and/or Linux?
>>>>>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken
>>>>>> way. It looks like line-ending char(s) are not being handled
>>>>>> correctly.
>>>>>> Don't know yet if it's rev itself or fgetws() being used
>>>>>> by rev that's busted. I'll investigate further. Thanks for the report!
>>>>> This is a locale issue. In the default Cygwin locale, rev mishandles
>>>>> the \x80 byte and instead of stopping with an error message it enters an
>>>>> infinite loop. I'll probably report this upstream instead of working
>>>>> out a local fix.
>>>> Upstream util-linux 2.40.2 has an updated 'rev' that stops with an error
>>>> message when the OP's testcase is tried. I'm testing the full 2.40.2
>>>> for Cygwin release before too long.
>>>>
>>>>> There is a work-around: change to the "C" locale just to run rev.
>>>>> LC_ALL=C rev zzz
>>>>> where zzz is a file containing your four lines. You can also run your
>>>>> original testcase with "rev" replaced by "LC_ALL=C rev" in both places.
>>>> Implicit in that suggestion is that the OP seemed to be uninterested in
>>>> any form of multi-byte characters.. just straightforward operation on
>>>> bytes, even if they have the high bit set.
>>>>
>>>> That said, I appreciate the follow-up comments that dealt with the
>>>> general problem.
>>>> Thanks all,
>>>>
>>>> ..mark
>>> Sorry for dropping out of the thread. I lost interest in pursuing the
>>> issue once I learned that 'rev' would balk at any character it didn't
>>> like instead of just passing it through, and found a workaround for my
>>> case. What I really wanted is something that would do a byte-by-byte
>>> reversal working backwards from a LF character.
>>>
>>> My use for 'rev' is to allow sorting based on field position from the
>>> *end* of the line. 'sort' won't do this itself, as far as I can tell.
>>> My method follows:
>>> printf -v mySep '\xff'
>>> cat fileOfFullPathNames | rev | sed -r -e "s/\./$mySep/" | rev | sort -t "$mySep" --key=2.1 | tr "$mySep" '.'
>>>
>>> This particular pipe is to sort fileOfFullPathNames by file extension.
>>> As mentioned, this stops abruptly when it encounters my inserted field
>>> separator of \xff. I found that it would do what I wanted if I used
>>> \x1f as mySep instead.
>>>
>>> To be honest, in far too many years of using *nix as a user (not a
>>> developer), doing this kind of thing is the only use I've ever had for
>>> 'rev'. I probably used a different separator before (likely \x09)
>>> which is why I haven't encountered an issue.
>>>
>>> What I appear to really need is "rev --binary" that just reverses
>>> everything regardless of what it is until it finds a LF. I may get
>>> motivated to write it for myself if I run into situations where I
>>> can't work around the restrictions in 'rev'.
>> As noted before in this thread, "rev --binary" is "LC_ALL=C rev".
> When 'rev' gets fixed, I'll try that. Until then, I'll just work
> around it as "LC_ALL=C rev" still dies when it encounters any byte
> >=\x80.
Well, it doesn't for me:
> printf a'\x80'b | LC_ALL=C rev | od -t x1
0000000 62 80 61

-- 
Problem reports: https://cygwin.com/problems.html
FAQ: https://cygwin.com/faq/
Documentation: https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple