Date: Thu, 24 Sep 2009 12:26:23 +0200
From: Matthias Andree
To: cygwin AT cygwin DOT com
Subject: Re: Encoding of German 'umlauts' - please explain

Ronald Fischer wrote:
> Maybe someone could enlighten me about the following:
>
> On Cygwin bash I see
>
> $ echo ü | od -cx
> 0000000 374  \n
>            0afc
> 0000002
>
> That means the German letter ü is encoded as 0xFC. If I do the same in the CMD shell
> (the 'od' used here comes from the GNU Utilities for Windows), I see:
>
> echo ü | od -cx
> 0000000 201      \r  \n
>            2081    0a0d
> 0000004
>
> That is, ü is encoded as 0x81. Why is this different?

Because the two shells use different code pages. The byte 0xFC (octal 374 in the od output) is ü in ISO-8859-1 ("Latin 1"), in ISO-8859-15 ("Latin 9") and in CP1252/Windows-1252. CP1252 is an extended Latin 1 that puts printable characters in 0x80...0x9F, where ISO-8859-1 has control codes. CMD instead uses an OEM code page, usually CP437 or CP850, and in both of those ü is 0x81 (octal 201).
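If you want to check this yourself, here is a small sketch in Python 3. I chose Python for illustration only; nothing in the thread uses it. It uses only the standard library's codecs, under Python's codec names ("latin-1", "cp437", ...), and the helper names are hypothetical.

The sketch encodes ü in each of the code pages mentioned above. It prints each byte in hex and in the octal form that od -c shows. It then decodes each shell's byte in the other shell's code page, to show what a program assuming the wrong code page would display.

```python
# Sketch (assumption: Python 3.7+, standard library only). Shows why the same
# letter "ü" comes out as different bytes in Cygwin and in CMD: each shell
# uses a different single-byte code page. Helper names are hypothetical.
import unicodedata

# Python codec names for the code pages discussed in the reply.
CODE_PAGES = ["latin-1", "iso8859-15", "cp1252", "cp437", "cp850"]


def byte_for(char: str, codec: str) -> int:
    """Return the single byte that encodes `char` in `codec`."""
    encoded = char.encode(codec)
    if len(encoded) != 1:
        raise ValueError(f"{codec} did not encode {char!r} as one byte")
    return encoded[0]


def table(char: str = "ü") -> dict:
    """Map each code page name to the byte value of `char` in it."""
    return {cp: byte_for(char, cp) for cp in CODE_PAGES}


def describe(byte: int, codec: str) -> str:
    """Decode one byte in `codec` and name the character it stands for."""
    char = bytes([byte]).decode(codec)
    # Control characters have no Unicode name, hence the default.
    return f"U+{ord(char):04X} {unicodedata.name(char, '<control>')}"


if __name__ == "__main__":
    for cp, b in table().items():
        # od -c shows bytes it does not treat as printable in octal:
        # 374 == 0xFC, 201 == 0x81.
        print(f"{cp:>10}: 0x{b:02X} (od -c shows {b:03o})")

    # Cygwin's byte read as CP437/CP850, and CMD's byte read as Latin 1.
    print("0xFC in cp437:  ", describe(0xFC, "cp437"))
    print("0xFC in cp850:  ", describe(0xFC, "cp850"))
    print("0x81 in latin-1:", describe(0x81, "latin-1"))

    # 0x80..0x9F: C1 control codes in ISO-8859-1, mostly printable in CP1252.
    # (A few, such as 0x81, are undefined in CP1252 and fail to decode.)
    print("0x80 in latin-1:", describe(0x80, "latin-1"))
    print("0x80 in cp1252: ", describe(0x80, "cp1252"))
```

Running it should print 0xFC (374) for the three Latin-1-style code pages and 0x81 (201) for CP437 and CP850. Those are the same two bytes the question's od runs show.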