Date: Thu, 20 May 2010 19:05:17 +0300
Subject: Re: sed doesn't like LANG= anymore
From: Andy Koppe
To: cygwin AT cygwin DOT com
In-Reply-To: <20100520123926.GA1432@onderneming10.xs4all.nl>

On Thursday, May 20, 2010, Jurriaan wrote:
> A very long sed script that's been working for ages (back from the 1.5
> age) here has stopped working.
>
> It turned out sed doesn't like some strings anymore when environment
> variable LANG is empty. With LANG=ASCII, there are no problems.
>
> The actual text in the SED command is shown below as spaces, but it's a
> Swedish a with a small o on top of it, like this:
>
> sed -e"s/@a/ a/g;"
>
> where a is character 0xe5.
>
> Running with LANG=ASCII works, with LANG empty I get 'unterminated `s'
> command' from sed (which confused me for a while).

With empty LANG you're using the default UTF-8 encoding, where that 0xe5
byte constitutes an incomplete character. You need to either run with a
LANG setting that fits your script, e.g. C.ISO-8859-1, or convert your
script to UTF-8.

I'm puzzled as to why LANG=ASCII would have worked, since that's not a
valid setting.

Andy
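
To illustrate the point about the 0xe5 byte, here is a minimal Python sketch (not part of the original exchange). It assumes the script was saved as ISO-8859-1 and that the 0xe5 byte sits in the pattern half of the substitution; the variable and function names are made up for the example. It checks whether the bytes form valid UTF-8, then re-encodes them, which is roughly what "convert your script to UTF-8" amounts to.

```python
# The substitution from the report, as ISO-8859-1 bytes (assumed encoding).
# 0xe5 is "a with ring above" in ISO-8859-1.
script_latin1 = b"s/@\xe5/ a/g;"


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8, 0xe5 is the lead byte of a three-byte sequence. Here it is
# followed by "/", which is not a continuation byte, so the character
# is incomplete and decoding fails.
print(decodes_as_utf8(script_latin1))  # False

# Re-encode the script: interpret the bytes as ISO-8859-1, write UTF-8.
script_utf8 = script_latin1.decode("iso-8859-1").encode("utf-8")
print(script_utf8)                     # b's/@\xc3\xa5/ a/g;'
print(decodes_as_utf8(script_utf8))    # True
```

The sketch only shows the byte-level problem; how sed itself reacts to the malformed sequence depends on the sed build and the locale in effect.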