Mail Archives: cygwin/2009/09/24/04:20:27

To: cygwin AT cygwin DOT com
From: Ronald Fischer <ronaldf AT eml DOT cc>
Subject: Encoding of German 'umlauts' - please explain
Date: Thu, 24 Sep 2009 08:17:42 +0000 (UTC)
Message-ID: <loom.20090924T100848-137@post.gmane.org>

Maybe someone could enlighten me about the following:

In Cygwin bash I see:

$ echo ü | od -cx
0000000 374  \n
        0afc
0000002

That means the German letter ü is encoded as 0xFC. If I do the same in the CMD
shell (the 'od' used here comes from the GNU utilities for Windows), I see:

  echo ü | od -cx
0000000 201      \r  \n
        2081 0a0d
0000004

That is, ü is encoded as 0x81. Why is this different?
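
To make the two dumps easier to compare: od -x prints 16-bit words, and on a
little-endian x86 machine the low byte of each word comes first in the stream.
The following is a minimal sketch (the helper name od_words_to_bytes is
hypothetical, chosen only for illustration) that turns the hex words from the
two transcripts back into byte order. It also shows that the CMD output carries
an extra space (0x20) and a CR LF line ending, neither of which is part of the
encoding of ü itself.

```python
import struct

def od_words_to_bytes(words, length):
    """Rebuild the byte stream from `od -x` words (16-bit, little-endian).

    `length` is the byte count od reports on its final line, used to drop
    any padding byte in the last word.
    """
    raw = b"".join(struct.pack("<H", int(word, 16)) for word in words)
    return raw[:length]

# Hex words copied from the two transcripts above.
bash_bytes = od_words_to_bytes(["0afc"], 2)
cmd_bytes = od_words_to_bytes(["2081", "0a0d"], 4)

print(bash_bytes.hex(" "))  # fc 0a
print(cmd_bytes.hex(" "))   # 81 20 0d 0a
```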

I am aware that, for historical reasons, different encodings exist (the old
DOS encoding, the Windows ANSI encoding, etc.). However, I wouldn't have
expected such a difference between bash.exe and cmd.exe.
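
As a quick cross-check of the two values, the sketch below encodes ü in a few
common single-byte code pages using Python's standard codecs. Which code pages
are actually active in the two shells is an assumption here: the post does not
say. The values observed above are at least consistent with a Windows ANSI
code page (such as CP1252) on the bash side and a DOS/OEM code page (such as
CP850 or CP437) on the CMD side.

```python
UMLAUT = "ü"

# Candidate code pages; which ones the two shells really use is an assumption.
CODECS = ("cp1252", "latin-1", "cp850", "cp437")

encoded = {codec: UMLAUT.encode(codec) for codec in CODECS}

for codec, data in encoded.items():
    print(f"{codec:8s} 0x{data.hex().upper()}")

# Going the other way: the same character comes back only if the byte is
# interpreted with the code page it was written in.
assert bytes([0xFC]).decode("cp1252") == UMLAUT
assert bytes([0x81]).decode("cp850") == UMLAUT
```

Running it lists 0xFC for cp1252 and latin-1, and 0x81 for cp850 and cp437.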



