Date: Thu, 19 Jul 2012 11:20:24 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: length in gawk returns wrong value
Message-ID: <20120719092024.GA31055@calimero.vinschen.de>
In-Reply-To: <loom.20120719T103849-659@post.gmane.org>

On Jul 19 08:50, Ralf wrote:
> The following lines create a file named ttt.txt. The file ttt.txt contains
> exactly what I want (oct 374 for the umlaut u). But if you look at the output of
> these lines you can see that the function length() of gawk can not handle this
> character:
>
> uname -a
> echo "Rücken" > ttt.txt
> od -c ttt.txt
> gawk '{print "Length: " length($0)}' ttt.txt
>
> Output:
> CYGWIN_NT-6.0-WOW64 WIESWEG 1.7.9(0.237/5/3) 2011-03-29 10:10 i686 Cygwin

Uh oh.  1.7.9 is old.  Please update.

> 0000000   R 374   c   k   e   n  \r  \n
> 0000010
> Length: 1
>
> What can I do to get the correct length in gawk without changing the contents of
> ttt.txt?

Dunno.  This is not what I see.  What did you have $LANG and $LC_CTYPE
set to?
Here's what I see:

  $ uname -a
  CYGWIN_NT-6.1 vmbert7 1.7.16(0.261/5/3) 2012-07-09 14:51 i686 Cygwin
  $ echo $LANG
  C.UTF-8
  $ echo "Rücken" > ttt.txt
  $ od -c ttt.txt
  0000000   R 303 274   c   k   e   n  \n
  0000010
  $ gawk '{print "Length: " length($0)}' ttt.txt
  Length: 6
  $ gawk --version | head -1
  GNU Awk 4.0.1


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
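
The two od dumps in this thread differ in how the ü is stored: one shows the single byte 374 (octal), which is ü in ISO-8859-1, the other the two bytes 303 274, which is ü in UTF-8. The following is a minimal sketch of that difference, not something posted in the thread. The file names and the temporary directory are made up for illustration, and the gawk step is skipped if gawk is not installed. It only demonstrates the byte-oriented C locale, where length() counts bytes.

```shell
#!/bin/sh
# Sketch only: file names and the temp directory are illustrative assumptions.
tmp=$(mktemp -d)

# "Rücken" with ü stored as the single byte octal 374 (ISO-8859-1).
printf 'R\374cken\n' > "$tmp/latin1.txt"
# "Rücken" with ü stored as the two bytes octal 303 274 (UTF-8).
printf 'R\303\274cken\n' > "$tmp/utf8.txt"

# Byte counts include the trailing newline; tr strips the padding
# that some wc implementations put in front of the number.
latin1_bytes=$(wc -c < "$tmp/latin1.txt" | tr -d ' ')
utf8_bytes=$(wc -c < "$tmp/utf8.txt" | tr -d ' ')
echo "latin1.txt: $latin1_bytes bytes"
echo "utf8.txt:   $utf8_bytes bytes"

# With LC_ALL=C, gawk treats every byte as one character, so
# length() reports the byte count of each line (newline excluded).
if command -v gawk >/dev/null 2>&1; then
    LC_ALL=C gawk '{ print FILENAME ": length " length($0) }' \
        "$tmp/latin1.txt" "$tmp/utf8.txt"
fi

rm -rf "$tmp"
```

In the C locale the ISO-8859-1 line has length 6 and the UTF-8 line has length 7, because the two-byte ü is counted twice. Under a UTF-8 locale such as the C.UTF-8 shown above, gawk reports 6 for the UTF-8 file, as in the session in this message. Note that a trailing \r, as in the quoted od output, stays part of $0 and would add one to the reported length.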