Date: Thu, 19 Jul 2012 11:20:24 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: length in gawk returns wrong value
Message-ID: <20120719092024.GA31055@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <loom DOT 20120719T103849-659 AT post DOT gmane DOT org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <loom.20120719T103849-659@post.gmane.org>
User-Agent: Mutt/1.5.21 (2010-09-15)

On Jul 19 08:50, Ralf wrote:
> The following lines create a file named ttt.txt, which contains exactly
> what I want (octal 374 for the u umlaut). But the output of these lines
> shows that gawk's length() function cannot handle this character:
> 
> uname -a
> echo "Rücken" > ttt.txt
> od -c ttt.txt
> gawk '{print "Length: " length($0)}' ttt.txt
> 
> Output:
> CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin

Uh oh.  1.7.9 is old.  Please update.

> 0000000   R 374   c   k   e   n  \r  \n
> 0000010
> Length: 1
> 
> What can I do to get the correct length in gawk without changing the contents of
> ttt.txt?

Dunno.  This is not what I see.  What did you have $LANG and $LC_CTYPE
set to?  Here's what I see:

  $ uname -a
  CYGWIN_NT-6.1 vmbert7 1.7.16(0.261/5/3) 2012-07-09 14:51 i686 Cygwin

  $ echo $LANG
  C.UTF-8

  $ echo "Rücken" > ttt.txt
  $ od -c ttt.txt
  0000000   R 303 274   c   k   e   n  \n
  0000010

  $ gawk '{print "Length: " length($0)}' ttt.txt
  Length: 6

  $ gawk --version | head -1
  GNU Awk 4.0.1

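The two od dumps differ because the files are in different encodings. Ralf's byte 0374 is u umlaut in ISO-8859-1 (Latin-1), while under C.UTF-8 echo writes the same letter as the two bytes 0303 0274. In a multibyte locale such as UTF-8, gawk's length() counts characters rather than bytes, and a lone 0374 byte is not valid UTF-8. The Python sketch below is not part of the original exchange. It assumes Ralf's file is Latin-1, as his od output suggests. It reproduces both byte sequences and compares byte counts with character counts. The helper names od_like and is_valid_utf8 are illustrative only.

```python
# Sketch: byte counts vs. character counts for "Rücken" in the two
# encodings seen in the od dumps (assumption: Ralf's file is ISO-8859-1).

def od_like(data: bytes) -> str:
    """Show printable ASCII bytes as-is and all other bytes as 3-digit octal,
    roughly like `od -c` (control characters get no escape names here)."""
    return " ".join(chr(b) if 0x20 <= b < 0x7F else format(b, "03o") for b in data)

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

word = "R\u00fccken"             # "Rücken": 6 characters
latin1 = word.encode("latin-1")  # u umlaut is one byte, 0xFC (octal 374)
utf8 = word.encode("utf-8")      # u umlaut is two bytes, 0xC3 0xBC (octal 303 274)

# Ralf's line as raw bytes, CRLF included, copied from his od output.
ralf_line = b"R\374cken\r\n"
# Latin-1 maps every byte to exactly one character, so this decode never fails.
ralf_text = ralf_line.decode("latin-1").rstrip("\r\n")

print("latin-1:", len(latin1), "bytes:", od_like(latin1))  # 6 bytes: R 374 c k e n
print("utf-8:", len(utf8), "bytes:", od_like(utf8))        # 7 bytes: R 303 274 c k e n
print("characters:", len(word))                            # 6
print("Ralf's line valid UTF-8?", is_valid_utf8(ralf_line))  # False
print("Ralf's line decoded as Latin-1:", len(ralf_text))    # 6
```

For Ralf's question, one untested option that leaves ttt.txt unchanged is to convert the input on the fly before gawk sees it while running in a UTF-8 locale, e.g. iconv -f ISO-8859-1 -t UTF-8 ttt.txt | gawk '{print "Length: " length($0)}'. The trailing \r in his file may still add one to the count unless it is stripped.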

Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
