Mail Archives: cygwin/2010/05/20/12:13:39

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
Message-ID: <4BF55F87.4060407@towo.net>
Date: Thu, 20 May 2010 18:12:55 +0200
From: Thomas Wolff <towo AT towo DOT net>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.9) Gecko/20100317 Thunderbird/3.0.4
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: sed doesn't like LANG= anymore
References: <20100520123926 DOT GA1432 AT onderneming10 DOT xs4all DOT nl> <AANLkTilpbuyiJIswTZGQN5jsHsK793ITUP9pcB95Hf1l AT mail DOT gmail DOT com>
In-Reply-To: <AANLkTilpbuyiJIswTZGQN5jsHsK793ITUP9pcB95Hf1l@mail.gmail.com>
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On 20.05.2010 18:05, Andy Koppe wrote:
> On Thursday, May 20, 2010, Jurriaan wrote:
>    
>> A very long sed script that's been working for ages (back from the 1.5
>> age) here has stopped working.
>>
>> It turned out sed doesn't like some strings anymore when environment
>> variable LANG is empty. With LANG=ASCII, there are no problems.
>>
>> The actual text in the SED command is shown below as spaces, but it's a
>> Swedish a with a small o on top of it, like this:
>>
>> sed -e"s/@a/ a/g;"
>>
>> where a is character 0xe5.
>>
>> Running with LANG=ASCII works, with LANG empty I get 'unterminated `s'
>> command' from sed (which confused me for a while).
>>      
> With empty LANG you're using the default UTF-8 encoding, where that
> 0xe5 byte constitutes an incomplete character. You need to either run
> with a LANG setting that fits your script, e.g. C.ISO-8859-1, or
> convert your script to UTF-8. I'm puzzled as to why LANG=ASCII would
> have worked, since that's not a valid setting.
>    
With LANG set to anything unknown, the charmap falls back to ASCII, so it
works (there are at least no multibyte characters then).
Considering the described effect, I doubt that a UTF-8 decoder should
swallow an ASCII byte that follows an incomplete UTF-8 sequence;
it should rather stop at the last byte that belongs to the sequence and
treat any subsequent UTF-8 lead byte or ASCII byte as the start of a new
character.
I guess the script would still work on Linux (can't try right now,
sorry) even in a "wrong" locale, so I think something should be fixed in
the newlib conversion functions here.
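
To illustrate the decoding behaviour suggested above, here is a rough sketch in Python. It is not newlib's code: split_utf8 is a hypothetical helper made up for this example, and it is deliberately simplified (it does not reject overlong or surrogate encodings). The sample bytes are the sed expression from the report with 0xe5 written out. The point is only that the '/' after the lone 0xe5 stays a character of its own, so the s command remains terminated.

```python
def split_utf8(data: bytes) -> list:
    """Split bytes into units: complete UTF-8 sequences or malformed fragments.

    Hypothetical, simplified helper: an incomplete sequence ends at its last
    byte, and the next lead byte or ASCII byte starts a new unit.
    """
    units = []
    i = 0
    while i < len(data):
        lead = data[i]
        if 0xC2 <= lead <= 0xDF:
            need = 1          # lead byte of a 2-byte sequence
        elif 0xE0 <= lead <= 0xEF:
            need = 2          # lead byte of a 3-byte sequence (0xe5 is one)
        elif 0xF0 <= lead <= 0xF4:
            need = 3          # lead byte of a 4-byte sequence
        else:
            need = 0          # ASCII, stray continuation byte, or invalid lead
        j = i + 1
        # Take continuation bytes only; stop at the first byte that is not one.
        while j < len(data) and j - i <= need and 0x80 <= data[j] <= 0xBF:
            j += 1
        units.append(data[i:j])
        i = j
    return units


# The reported expression, with the ISO-8859-1 byte 0xe5 left undecoded.
script = b"s/@\xe5/ \xe5/g;"
units = split_utf8(script)

# Python's built-in decoder behaves the same way when asked to replace errors:
# only the lone 0xe5 becomes U+FFFD, the following '/' is kept.
decoded = script.decode("utf-8", errors="replace")
```

With this splitting, all three '/' delimiters survive as separate units, which is what sed needs in order to see a complete s command.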
------
Thomas

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

