In-Reply-To: | <AANLkTil9K6g8VzziQFm_HD_UcrKpKxxp8L6XEOtOJ0T3@mail.gmail.com> |
References: | <AANLkTinfzh_OsXWlI-xzEgl5QEn6zBR-_ikaXInnu-Ps AT mail DOT gmail DOT com> <4BF55DF8 DOT 2090007 AT towo DOT net> <AANLkTikH39ppClmi9z_TnZ3GJeIbs0ZuhxWm2yNiGbvs AT mail DOT gmail DOT com> <AANLkTini_UcjRIl2pofwHHkoW7tAWWtY2EoqOw4AEjxC AT mail DOT gmail DOT com> <AANLkTil9K6g8VzziQFm_HD_UcrKpKxxp8L6XEOtOJ0T3 AT mail DOT gmail DOT com> |
Date: | Sat, 29 May 2010 06:16:04 +0100 |
Message-ID: | <AANLkTin9EXynUminGr5mwjqqqMX4Kocds9FQc3k4ccSU@mail.gmail.com> |
Subject: | Re: LANG=ja_JP.Shift_JIS |
From: | Andy Koppe <andy DOT koppe AT gmail DOT com> |
To: | cygwin AT cygwin DOT com |
Cc: | rushojp <rushojp AT gmail DOT com> |
On 22 May 2010 14:27, rushojp wrote:
>> So why do you need to set it to ja_JP.Shift_JIS if ja_JP.CP932 and
>> ja_JP.SJIS do the same thing?
>
> There is no serious reason.
> I think IANA name is more famous.

Fair enough, but I think it would be misleading to use the official
IANA name for what's a (slightly) different charset.

> @centos5.5
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Shift_JIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f SJIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f CP932 -t UTF-16LE|hexdump
> 0000000 005c 0020 007e 0020 ff5e
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Windows-31J -t UTF-16LE|hexdump
> 0000000 005c 0020 007e 0020 ff5e
> 000000a
>
> @cygwin-1.7
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Shift_JIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f SJIS -t UTF-16LE|hexdump
> 0000000 00a5 0020 203e 0020 301c
> 000000a
> $ echo -ne '\x5c ~ \x81\x60'|iconv -f CP932 -t UTF-16LE|hexdump
> 0000000 005c 0020 007e 0020 301c
> 000000a

Looks as expected to me. Iconv's charset names are independent of the
locale charset names, but it is unfortunate that "SJIS" means
"Shift_JIS" to iconv whereas it means "CP932" to the locale system.
That's why I called the SJIS->CP932 mapping "dodgy", but we need to
keep it for compatibility (and convenience).

Importantly, nl_langinfo(CODESET) returns "CP932" both for ja_JP.CP932
and ja_JP.SJIS, so that programs that use the CODESET string in iconv
end up with the correct encoding.

> $ echo -ne '\x5c ~ \x81\x60'|iconv -f Windows-31J -t UTF-16LE|hexdump
> iconv: conversion from Windows-31J unsupported
> iconv: try 'iconv -l' to get the list of supported encodings

I had to look that one up: "Windows-31J" is the official IANA name for
CP932. I guess it should be added to Cygwin's iconv. (But how did they
come up with that name?)
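
For anyone who wants to poke at the charset difference without iconv, here is a rough sketch in Python. It assumes that Python's built-in "shift_jis" and "cp932" codecs are a fair stand-in for the two charsets discussed above; their tables are not guaranteed to match iconv's, in particular for the single bytes 0x5c and 0x7e. The helper name code_points is made up for illustration.

```python
# Sketch: decode the same bytes with two Shift_JIS-family codecs and
# show the resulting Unicode code points.
# Assumption: Python's "shift_jis" and "cp932" codecs stand in for the
# charsets of the same name; they may differ from iconv in places.

SAMPLE = b"\x5c ~ \x81\x60"  # the bytes used in the echo -ne tests


def code_points(data, codec):
    """Decode data with codec and return its code points as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in data.decode(codec)]


if __name__ == "__main__":
    for codec in ("shift_jis", "cp932"):
        print("%-10s %s" % (codec, " ".join(code_points(SAMPLE, codec))))
```

The double-byte sequence 0x81 0x60 is where the two codecs disagree: it decodes to a different code point under each one. That is why a program should pass iconv the name it gets back from nl_langinfo(CODESET) rather than a hard-coded guess.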
Andy