Mail Archives: cygwin/2010/09/18/16:09:24

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sat, 18 Sep 2010 22:08:51 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: awk gsub problem
Message-ID: <20100918200851.GA5760@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <AANLkTikzGH8GUZ5ZUytSJShfYE=KMyphyue83Q8XMm4- AT mail DOT gmail DOT com> <20100916092458 DOT GB15121 AT calimero DOT vinschen DOT de> <AANLkTimwcbmxMtfZWbkztef+fxQfKtoM9CsFOd38E2a3 AT mail DOT gmail DOT com> <20100918092139 DOT GE14602 AT calimero DOT vinschen DOT de>
MIME-Version: 1.0
In-Reply-To: <20100918092139.GE14602@calimero.vinschen.de>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Sep 18 11:21, Corinna Vinschen wrote:
> On Sep 17 22:30, Lee wrote:
> > On 9/16/10, Corinna Vinschen wrote:
> > > On Sep 15 18:30, Lee wrote:
> > >> I don't know if this is just a problem with the cygwin version of awk,
> > >> me misunderstanding something or what, but it looks like gsub isn't
> > >> working correctly in awk:
> > >> $ sh /tmp/test.awk
> > >> s= ::0::  should = ::S0::
> > >>
> > >> $ cat /tmp/test.awk
> > >> awk '
> > >> BEGIN {
> > >>   s="Serial0"
> > >>   gsub("[a-z]","",s)
> > >>   printf("s= ::%s::  should = ::S0::\n", s)
> > >>   exit
> > >> } '
> > >>
> > >> I also tried it with IGNORECASE=0 and with "awk --traditional" - same
> > >> results.
> > > Works fine for me:
> > 
> > Comment out the 'set LANG=" and gsub works fine:
> > $ echo $LANG
> > C.UTF-8
> > 
> > $ sh /tmp/test.awk
> > s= ::S0::  should = ::S0::
> > 
> > $ export LANG=en_US.UTF-8
> > 
> > $ sh /tmp/test.awk
> > s= ::0::  should = ::S0::
> > 
> > So awk gsub works for me again - thank you!
> > 
> > Just out of curiosity, why would setting LANG to en_US break
> > case-sensitivity in gsub?
> 
> I don't know either.  I just asked the upstream maintainer.  At least it
> isn't a Cygwin problem, since it also behaves the same on Linux.

I got a reply from the upstream maintainer.  Case-sensitivity in gsub is
not broken; rather, it's really a language-dependent difference.

If LANG is "en_US" or "en_US.utf8", then the regular expression "[a-z]"
does *not* correspond to the ASCII codes from 'a' to 'z' anymore.  The
range follows the collation order of the locale instead, so it
corresponds to something like "[aAbBcCdD...zZ]", regardless of whether
the character encoding is ISO-8859-1 or UTF-8.

What you really want is this:

  BEGIN {
    s="Serial0"
    gsub("[[:lower:]]","",s)
    printf("s= ::%s::  should = ::S0::\n", s)
    exit
  }

The "[[:lower:]]" expression always catches all valid lowercase letters,
independent of the language, territory, and charset used.
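
For comparison, here is a small sketch that puts the character class
next to the other common workaround, which is forcing the C locale for
the one awk call that uses "[a-z]".  The function names are made up for
this example, and it assumes an awk that understands POSIX character
classes (gawk does; some very old awks may not).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; the names are not part of
# any tool discussed in this thread.

# Strip lowercase letters with the POSIX character class.  The class is
# defined by the locale's notion of "lowercase", not by a collation range.
strip_lower_class() {
  awk -v s="$1" 'BEGIN { gsub(/[[:lower:]]/, "", s); print s }'
}

# Strip lowercase letters with the range [a-z], but set LC_ALL=C for this
# single awk invocation so the range means the ASCII codes 'a' through 'z'.
strip_lower_c_locale() {
  LC_ALL=C awk -v s="$1" 'BEGIN { gsub(/[a-z]/, "", s); print s }'
}

strip_lower_class "Serial0"
strip_lower_c_locale "Serial0"
```

Both calls should leave the capital S alone.  The character class is
usually the better choice, since forcing the C locale also changes how
awk treats non-ASCII characters in the same call.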


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
