Message-ID: <4ab35cef-e534-49d3-885d-98767afb807d@SystematicSW.ab.ca>
Date: Thu, 24 Oct 2024 07:54:07 -0600
Subject: Re: Is this correct behaviour for 'rev'?
To: cygwin AT cygwin DOT com
References: <6fdbf92d-51f2-47ae-a482-5edd89ed3a89 AT maxrnd DOT com>
Organization: Systematic Software
From: Brian Inglis via Cygwin
Reply-To: cygwin AT cygwin DOT com
Cc: Brian Inglis

On 2024-10-23 23:01, Mark Geisert via Cygwin wrote:
> On 10/22/2024 10:33 PM, Mark Geisert via Cygwin wrote:
>> On 10/22/2024 8:00 PM, Backwoods BC via Cygwin wrote:
>>> It appears that 'rev' is choking on any character \x80 or higher, but
>>> is OK with those \x1f or smaller. It doesn't give an error or ignore
>>> it, it just stops.
>>>
>>> I don't have access to a Linux box so I can't see if this happens
>>> there and nothing in the documentation suggests that this is the
>>> correct functionality.
>>>
>>> Test case:
>>> printf 'no non-ASCII characters\nhex 01 >\x01< here\nhex 80 >\x80< here\nLine 4\n'|rev|rev
>>>
>>> This is for "rev from util-linux 2.33.1"
>>>
>>> I don't have the current version of 'rev' on my system due to not
>>> having updated in a while. I accidentally screwed up my installation
>>> and have been reluctant to wipe it and start over.
>>>
>>> So, is this the expected behaviour for the current version of 'rev'
>>> under Cygwin and/or Linux?
>>
>> The current Cygwin util-linux 2.39.3-2 rev behaves in the same, broken way.
>> It looks like line-ending char(s) are not being handled correctly. Don't
>> know yet if it's rev itself or fgetws() being used by rev that's busted. I'll
>> investigate further. Thanks for the report!
>
> This is a locale issue. In the default Cygwin locale, rev mishandles the \x80
> byte and instead of stopping with an error message it enters an infinite loop.
> I'll probably report this upstream instead of working out a local fix.
>
> There is a work-around: change to the "C" locale just to run rev.
>     LC_ALL=C rev zzz
> where zzz is a file containing your four lines. You can also run your original
> testcase with "rev" replaced by "LC_ALL=C rev" in both places.

I run with a UTF-8 locale and have not noticed any issues, as I use UTF-8 files.

The man page for rev(1) says it works on wide characters, and `cygcheck rev`
shows it is built with gettext-devel libintl/libiconv.

I could see an issue if the shell and file locales mismatch, or possibly if the
file contains SMP (non-BMP) characters as UTF-16 surrogates.

The correct approach should be to match the execution locale to the file locale,
for example, `LC_ALL=...UTF-8 rev ...`, which should produce the expected results.

--
Take care.
Thanks,
Brian Inglis
Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry
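
To make the locale mismatch concrete: a lone \x80 byte is a continuation byte in UTF-8, so the test case's third line is not valid UTF-8 text and cannot be decoded as wide characters in a UTF-8 locale. Below is a minimal sketch of checking the input's encoding before choosing the locale to run rev in. The helper name `is_valid_utf8` is hypothetical, not something from the thread or from util-linux, and the sketch assumes an iconv(1) that exits non-zero on an illegal input sequence. The byte is written as octal \200 rather than \x80 because not every printf accepts hex escapes.

```shell
# Hypothetical helper (an assumption, not part of rev or Cygwin):
# succeeds only if standard input decodes cleanly as UTF-8.
# Assumes iconv(1) is installed and exits non-zero on invalid input.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# \200 is the octal spelling of the \x80 byte from the test case.
if printf 'hex 80 >\200< here\n' | is_valid_utf8; then
    verdict="valid UTF-8"
else
    verdict="not valid UTF-8"
fi
echo "$verdict"
```

If the check fails for a file, the data is bytes in some other encoding, and running rev under a locale that matches that encoding (or under the "C" locale, as in the work-around quoted above) is the thing to try rather than a UTF-8 locale.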