To: cygwin AT cygwin DOT com
From: Eric Blake
Subject: Re: Cygwin bash regexp matching doesn't treat "\b" properly
Date: Tue, 24 Nov 2009 22:51:24 +0000 (UTC)

Dave Korn writes:

>
>   $ [[ "foo" =~ [[:\<:]]foo[[:\>:]] ]]; echo $?
> 0
>
> (Note that I had to backslash-escape the < and > there.  In other contexts
> that might not be needed.)

But here's something weird with how bash manages quoting inside [[ ]].  If
you add a subexpression, you no longer need to quote < or >:

$ [[ foo =~ ([[:<:]]foo[[:>:]]) ]]; echo $?
0

With further experimentation, it turns out that cygwin's regex(3) does not
understand [[:<:][:>:]] as a character class that accepts either direction
of word boundary (for shame).  So, modulo the difference in the number of
subexpressions, the closest representation of \b becomes:

([[:<:]]|[[:>:]])

and an expression to match words that either end in a or begin in b would
be:

$ [[ ' b ' =~ ([a ]([[:<:]]|[[:>:]])[b ]) ]]; echo $?
0
$ [[ ' ab ' =~ ([a ]([[:<:]]|[[:>:]])[b ]) ]]; echo $?
1

which looks so much shorter as ([a ]\b[b ])

-- 
Eric Blake
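
For scripts that must run against more than one regex(3) implementation, one possible workaround is to avoid both \b and [[:<:]] / [[:>:]] and spell the boundary out in plain POSIX ERE. The following is only a sketch under stated assumptions: has_word is a hypothetical helper name that does not appear in this thread, the word is assumed to contain no regex metacharacters, and "word character" is assumed to mean alphanumeric or underscore. Unlike a true word-boundary assertion, this pattern consumes the neighbouring character, so it answers "is this word present?" but is not a drop-in replacement for \b in the middle of a larger expression.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the thread): succeed if word $2 occurs in
# text $1 as a whole word.  Assumes $2 has no regex metacharacters.
has_word() {
  local text=$1 word=$2
  # Keep the pattern in a variable so that bash's own quoting rules inside
  # [[ ]] do not get applied to the regex characters.
  local re="(^|[^[:alnum:]_])${word}([^[:alnum:]_]|\$)"
  [[ $text =~ $re ]]
}

has_word 'a foo b' foo && echo "match"
has_word 'foobar' foo || echo "no match"
```

Storing the pattern in a variable also sidesteps the question raised above about when < and > need escaping inside [[ ]], since the unquoted expansion is handed to the regex matcher as-is.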