Mail Archives: djgpp-workers/2001/09/14/05:39:39

X-Authentication-Warning: acp3bf.physik.rwth-aachen.de: broeker owned process doing -bs
Date: Fri, 14 Sep 2001 11:37:05 +0200 (MET DST)
From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
X-Sender: broeker AT acp3bf
To: Eli Zaretskii <eliz AT is DOT elta DOT co DOT il>
cc: djgpp-workers AT delorie DOT com
Subject: Re: NLS and djgpp.env
In-Reply-To: <8011-Fri14Sep2001101346+0300-eliz@is.elta.co.il>
Message-ID: <Pine.LNX.4.10.10109141122070.11832-100000@acp3bf>
MIME-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

On Fri, 14 Sep 2001, Eli Zaretskii wrote:

> > Date: Thu, 13 Sep 2001 18:39:35 +0200 (MET DST)
> > From: Hans-Bernhard Broeker <broeker AT physik DOT rwth-aachen DOT de>
> > 
> > If at all, it would be a good idea to set LC_COLLATE to "C",
> > lest users be badly surprised by 'grep' and other regex tools if they "set
> > LANG=de" and then use patterns like [a-z].
> 
> If we decide to set LC_COLLATE, I would suggest doing so in a special
> section for Grep programs, not a general setting.

Fine with me, too. 

The problem might be nailing down all the programs that need it. In a
nutshell, every program that uses regular expressions is supposed to behave
in this new and, IMHO, seriously brain-damaged way. At a minimum, that means
'sed', 'awk', 'grep', possibly 'lex', and anything built on the POSIX
standard regex library for C. Every one of them would need its own entry in
DJGPP.ENV.

BTW, for those of you who don't know what we're talking about: the latest
versions of 'grep' and similar tools, following an upcoming POSIX revision,
have changed the meaning of ranges in bracket expressions. A range such as
[a-e] now matches every character that sorts between its endpoints in the
locale's collation order, rather than every character whose code lies
between them. So, if your collation order interleaves upper- and lowercase
letters (many do, including LANG=us),

	echo "BOOM!" | grep "[a-e]"

will suddenly print "BOOM!", which it never did before, because [a-e] is
now equivalent to [aBbCcDdE] (it may match 'A', too; I'm not quite sure)
instead of the traditionally expected [abcde]. This is guaranteed to break
a large fraction of the existing shell scripts that use grep, sed or awk.
Some of them are older than most of the DJGPP team and have been working
fine ever since, but now, all of a sudden, they will produce nonsense.
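
To make the difference concrete, here is a rough sketch of running a single
command with LC_COLLATE forced to "C", i.e. the kind of per-program override
being discussed. It is not something grep or DJGPP provides: the function
name with_c_collation is made up for illustration, and it also unsets
LC_ALL for that one command, on the assumption that nothing else needs it
there, because LC_ALL takes precedence over LC_COLLATE.

```shell
#!/bin/sh
# with_c_collation: hypothetical helper, not part of grep or DJGPP.
# Runs "$@" in a subshell with byte-order ("C") collation, so that a
# bracket range like [a-e] again means exactly the bytes a, b, c, d, e.
with_c_collation() {
    (
        # LC_ALL overrides LC_COLLATE, so drop it for this command only.
        unset LC_ALL
        LC_COLLATE=C
        export LC_COLLATE
        "$@"
    )
}

# "BOOM!" contains none of the bytes a..e, so under C collation grep
# prints nothing and exits with a non-zero status.
if echo "BOOM!" | with_c_collation grep "[a-e]"; then
    echo "[a-e] matched BOOM! (collation-order ranges in effect)"
else
    echo "[a-e] did not match BOOM! (traditional byte ranges)"
fi
```

Because the override is confined to a subshell, the caller's own locale
settings are left untouched.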

Even the German Linux distributor S.u.S.E. found this so inconvenient that
they set LC_COLLATE=C in all the login scripts they create, even if you
configure the account for a German user (--> LANG=de_DE@iso8859_15).
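
For comparison, a login-script version of that workaround might look like
the fragment below. It is only a sketch in the spirit of the S.u.S.E.
approach, not their actual script; the file it would live in (say,
~/.profile) and the LC_ALL warning are assumptions. LANG is deliberately
left alone, so messages and the other locale categories still follow the
user's language.

```shell
#!/bin/sh
# Hypothetical ~/.profile fragment; not taken from any distribution.
# Leave LANG (messages, dates, character set) as configured for the
# account, but pin collation to byte order so that regex ranges such
# as [a-z] keep their traditional meaning in grep, sed, awk and friends.
LC_COLLATE=C
export LC_COLLATE

# LC_ALL overrides every LC_* category, LC_COLLATE included, so a
# non-empty LC_ALL would silently defeat the setting above.
if [ -n "${LC_ALL:-}" ]; then
    echo "warning: LC_ALL=$LC_ALL overrides LC_COLLATE=C" >&2
fi
```

Setting LC_ALL=C instead would also restore the old ranges, but it would
throw away the user's language settings along with the collation order.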

-- 
Hans-Bernhard Broeker (broeker AT physik DOT rwth-aachen DOT de)
Even if all the snow were burnt, ashes would remain.

  Copyright © 2019   by DJ Delorie     Updated Jul 2019