Mail Archives: cygwin/2009/11/24/17:52:00

To: cygwin AT cygwin DOT com
From: Eric Blake <ebb9 AT byu DOT net>
Subject: Re: Cygwin bash regexp matching doesn't treat "\b" properly
Date: Tue, 24 Nov 2009 22:51:24 +0000 (UTC)
Lines: 34
Message-ID: <loom.20091124T233903-146@post.gmane.org>
References: <26500158 DOT post AT talk DOT nabble DOT com> <26500814 DOT post AT talk DOT nabble DOT com> <4B0C4C2A DOT 3080502 AT gmail DOT com>

Dave Korn <dave.korn.cygwin <at> googlemail.com> writes:

> 
> $ [[ "foo" =~ [[:\<:]]foo[[:\>:]] ]]; echo $?
> 0
> 
>   (Note that I had to backslash-escape the < and > there.  In other contexts
> that might not be needed.)

But here's something weird about how bash handles quoting inside [[ ]]: if you
wrap the pattern in a parenthesized subexpression, you no longer need to quote
< or >:

$ [[ foo =~ ([[:<:]]foo[[:>:]]) ]]; echo $?
0
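
A common way to sidestep the quoting question altogether is to keep the
pattern in a variable and expand it unquoted on the right of =~.  The sketch
below is not from the thread; try_regex is a hypothetical helper name.  It
relies only on the documented exit status of [[ ]]: 0 for a match, 1 for no
match, 2 when the pattern is not a valid regex.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how the local regex(3) treats a pattern.
# Exit status of [[ text =~ re ]]: 0 = match, 1 = no match, 2 = invalid regex.
try_regex() {
    local text=$1 re=$2 rc=0
    # An unquoted $re is used as a regex; a quoted "$re" would be matched
    # as a literal string instead.
    [[ $text =~ $re ]] || rc=$?
    case $rc in
        0) echo match ;;
        1) echo nomatch ;;
        *) echo invalid ;;
    esac
}

# Inside single quotes, < and > need no backslashes.
try_regex foo '[[:<:]]foo[[:>:]]'
```

The last line should print "match" where regex(3) understands [[:<:]] and
[[:>:]], as Cygwin's does in the transcript above.  A library that rejects
those classes would make it print "invalid" rather than "nomatch".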

With further experimentation, it turns out that Cygwin's regex(3) does not
understand [[:<:][:>:]] as a single bracket expression that accepts a word
boundary in either direction (for shame).  So, modulo the difference in the
number of subexpressions, the closest representation of \b becomes:

([[:<:]]|[[:>:]])

and an expression to match words that either end in a or begin with b would be:

$ [[ ' b ' =~ ([a ]([[:<:]]|[[:>:]])[b ]) ]]; echo $?
0
$ [[ ' ab '  =~ ([a ]([[:<:]]|[[:>:]])[b ]) ]]; echo $?
1

which looks so much shorter as ([a ]\b[b ])
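
For scripts that have to run on more than one regex(3), the alternation can
be built at run time.  This is only a sketch: both function names are
hypothetical, and the fallback to \< and \> (a GNU extension) is an
assumption that is not part of this thread.

```shell
#!/usr/bin/env bash
# Print a \b substitute built from whichever one-sided anchors compile here.
# Returns 1 if the local regex(3) accepts neither spelling.
word_boundary_re() {
    local pair start end probe
    # The first pair is the spelling used in this thread; the second,
    # GNU's \< and \>, is an assumption added for portability.
    for pair in '[[:<:]] [[:>:]]' '\< \>'; do
        start=${pair% *}
        end=${pair#* }
        probe="${start}x${end}"
        if [[ x =~ $probe ]]; then
            printf '%s\n' "(${start}|${end})"
            return 0
        fi
    done
    return 1
}

# Hypothetical wrapper around the pattern from the message:
# [a ] then a word boundary then [b ].
ends_in_a_or_begins_with_b() {
    local wb re
    wb=$(word_boundary_re) || return 2
    re="([a ]${wb}[b ])"
    [[ $1 =~ $re ]]
}

for s in ' b ' ' ab '; do
    if ends_in_a_or_begins_with_b "$s"; then
        echo "'$s': match"
    else
        echo "'$s': no match"
    fi
done
```

Where either spelling compiles, the loop should agree with the 0 and 1
shown in the transcript.  Note that the extra parentheses still add a
subexpression, so BASH_REMATCH indices shift by one compared with a plain \b.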

-- 
Eric Blake



