Mail Archives: cygwin/2009/11/24/16:41:06

Message-ID: <26503748.post@talk.nabble.com>
Date: Tue, 24 Nov 2009 13:40:54 -0800 (PST)
From: aputerguy <nabble AT kosowsky DOT org>
To: cygwin AT cygwin DOT com
Subject: Re: Cygwin bash regexp matching doesn't treat "\b" properly
In-Reply-To: <4B0C4C2A.3080502@gmail.com>
MIME-Version: 1.0
References: <26500158 DOT post AT talk DOT nabble DOT com> <26500814 DOT post AT talk DOT nabble DOT com> <4B0C4C2A DOT 3080502 AT gmail DOT com>

Dave Korn writes:

> Bash man page for '=~' refers to man regex(3) which refers to man regex(7)
> which describes word boundary markers as below:
>
> $ [[ "foo" =~ [[:\<:]]foo[[:\>:]] ]]; echo $?
> 0
>
> $ [[ "foobar" =~ [[:\<:]]foo[[:\>:]] ]]; echo $?
> 1

Thanks David!
I had actually grepped both regex(3) and regex(7) before, but I was looking
for the word "word" or "boundary", neither of which is used in this
context.

HOWEVER, this solution, while sweet for Cygwin bash, has the CONVERSE
PROBLEM.
Apparently, the special strings [[:<:]] and [[:>:]] are not recognized by
the Linux regex(7) implementation: there the match gives return code 2
(bash's status for an invalid regular expression) instead of 0 or 1.

So, now I have the frustrating situation where \\b works in Linux but not in
Cygwin while [[:<:]] works in Cygwin but not in Linux.

BTW, both regex(7) pages even imply they are POSIX.
Linux: "regex POSIX.2 regular expressions"
Cygwin: "regex - POSIX 1003.2 regular expressions"

Such incompatibility is a PITA because in a mixed Windows/Linux
environment one has to remember to clutter scripts with ugly
'if [ "$OSTYPE" = "cygwin" ]' exceptions, etc.
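
One way to avoid the $OSTYPE test might be to probe the regex library once at
startup and remember which markers it accepts. The following is only a sketch:
the names probe_word_boundaries, has_word, WB_START and WB_END are made up for
illustration, and the last branch falls back to a plain POSIX ERE approximation
of a word boundary in case neither marker style is accepted.

```shell
#!/usr/bin/env bash
# Sketch only: choose word-boundary markers by testing what the local
# regex library accepts, instead of branching on $OSTYPE.
# All function and variable names below are hypothetical.

probe_word_boundaries() {
    local re

    # Markers documented in Cygwin's regex(7).
    re='[[:<:]]a[[:>:]]'
    if [[ "a" =~ $re ]] 2>/dev/null; then
        WB_START='[[:<:]]'; WB_END='[[:>:]]'
        return 0
    fi

    # \b as reported to work on Linux.
    re='\ba\b'
    if [[ "a" =~ $re ]] 2>/dev/null; then
        WB_START='\b'; WB_END='\b'
        return 0
    fi

    # Fallback using only standard ERE constructs: a "word" is delimited
    # by start/end of string or by a character that is not alnum or "_".
    WB_START='(^|[^[:alnum:]_])'; WB_END='($|[^[:alnum:]_])'
}

# has_word WORD STRING: succeed if WORD occurs in STRING as a whole word.
# WORD is used as-is inside the regex, so it must not contain
# regex metacharacters.
has_word() {
    local re="${WB_START}$1${WB_END}"
    [[ "$2" =~ $re ]]
}

probe_word_boundaries
```

After that, 'has_word foo "$line"' can be used the same way on both systems.
Keeping the pattern in a variable (re) also sidesteps the quoting differences
between writing \b and \\b directly inside [[ ... ]].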





