Mail Archives: cygwin/2010/01/24/05:21:46

X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,SARE_MSGID_LONG40
X-Spam-Check-By: sourceware.org
MIME-Version: 1.0
In-Reply-To: <20100124093750.GA2402@calimero.vinschen.de>
References: <e22ab97b1001222149r3c217decmb0da069d7049c896 AT mail DOT gmail DOT com> <20100123135020 DOT GW2402 AT calimero DOT vinschen DOT de> <20100123150703 DOT GY2402 AT calimero DOT vinschen DOT de> <416096c61001230751m308ac854x4f026b1f83b966d0 AT mail DOT gmail DOT com> <20100123164546 DOT GZ2402 AT calimero DOT vinschen DOT de> <416096c61001231431u7e67cd37r2e741d0cb48c732f AT mail DOT gmail DOT com> <20100124093750 DOT GA2402 AT calimero DOT vinschen DOT de>
Date: Sun, 24 Jan 2010 10:17:39 +0000
Message-ID: <416096c61001240217l130c3e05ob5df918fd822be2d@mail.gmail.com>
Subject: Re: Please support CP932. (I have problem using subversion with SJIS)
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

2010/1/24 Corinna Vinschen:
>> Something's going seriously wrong with this, and I'd suspect it's to
>> do with turning backslashes into yen symbols.
>
> Right.  It occurred to me tonight that this will not work from a
> filesystem point-of-view.  The people who decided to overload backslash
> and tilde in the ASCII range with different symbols in SJIS still need
> some serious knock on their heads.  No wonder the Microsoft guys kept
> the binary values of characters intact, especially due to the backslash
> problem.

I looked into this a bit more, out of morbid curiosity.

Actually it's Microsoft themselves (or IBM?) who have to take a large
part of the blame here, for deciding to use the backslash as the DOS
directory separator. ISO-646, which is an internationalized version of
ASCII, defines the backslash codepoint as 'localizable', and many
national variants of it do define it as something else. (See
http://en.wikipedia.org/wiki/ISO/IEC_646)

To work around this issue in the case of SJIS, MS decided to stick
with the backslash for CP932, and instead implemented a nasty hack to
achieve some sort of SJIS compatibility: Japanese Windows fonts,
including Unicode fonts, have a Yen symbol at the backslash position.


> In theory, we could be able to keep SJIS support in.  The
> Cygwin-internal function converting multibyte strings to Unicode
> filenames would have to use CP932.  Only on the application level the
> conversion would use SJIS.

I've pondered that, and I don't think that's worthwhile. It's still
going to cause trouble, e.g. with the backslash's use as an escape
character and the tilde's use in shell expansions. Also, there are
some more differences between standard SJIS and CP932 (although none
as serious as the backslash and tilde issues), so more work would be
needed to get that right. Finally, CP932 is the only "SJIS" that
people are realistically going to care about, since that's what's in
widespread use due to Windows. If someone really needs standard SJIS
for converting documents or something, they can use iconv.
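
To make the differences concrete, here is a rough sketch. It uses
Python's "shift_jis" and "cp932" codecs as stand-ins for iconv's SJIS
and CP932 converters, which is an assumption: iconv's tables may differ
in detail. The helper decode_or_none is made up for this illustration.

```python
def decode_or_none(data: bytes, codec: str):
    """Decode data with codec; return None if the bytes are invalid in it."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

# 0x81 0x60 is WAVE DASH in the JIS mapping but FULLWIDTH TILDE in
# Microsoft's mapping.
wave = b"\x81\x60"
wave_sjis = decode_or_none(wave, "shift_jis")
wave_cp932 = decode_or_none(wave, "cp932")

# 0x87 0x40 is a vendor extension (circled digit one) that only CP932
# defines.
circled = b"\x87\x40"
circled_sjis = decode_or_none(circled, "shift_jis")
circled_cp932 = decode_or_none(circled, "cp932")

# 0x5C stays a backslash in CP932, so a DOS-style path survives a
# round trip through the charset unchanged.
path = "C:\\cygwin\\home"
path_roundtrip = path.encode("cp932").decode("cp932")

for label, value in [
    ("wave, shift_jis", wave_sjis),
    ("wave, cp932", wave_cp932),
    ("circled, shift_jis", circled_sjis),
    ("circled, cp932", circled_cp932),
    ("path round trip", path_roundtrip),
]:
    # !a prints non-ASCII characters as escapes, so the output does not
    # depend on the terminal's own charset.
    print(f"{label}: {value!a}")
```

The same bytes give different Unicode characters, or no character at
all, depending on which of the two tables is used.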

Therefore I've changed my mind on whether to keep SJIS and CP932
separate: I think we should stick with the <locale>.SJIS charset as it
is in 1.7.1, except that nl_langinfo(CODESET) for it should return
"CP932" instead of "SJIS", to make sure iconv uses the right charset,
thereby addressing the OP's issue.
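
A minimal sketch of the idea, in Python rather than C and not taken
from the Cygwin sources: the locale keeps its SJIS name, but the
codeset name handed to the converter is CP932. CODESET_FOR_CHARSET,
reported_codeset and to_unicode are hypothetical names, and Python's
codec lookup stands in for iconv_open here.

```python
import codecs

# Hypothetical table: the codeset name to report for a locale charset.
# Only the SJIS entry reflects the proposal; other charsets pass through.
CODESET_FOR_CHARSET = {"SJIS": "CP932"}

def reported_codeset(locale_charset: str) -> str:
    """Name nl_langinfo(CODESET) would report under the proposal (sketch)."""
    return CODESET_FOR_CHARSET.get(locale_charset.upper(), locale_charset)

def to_unicode(data: bytes, locale_charset: str) -> str:
    """Decode data using whatever codeset the locale reports."""
    codec = codecs.lookup(reported_codeset(locale_charset))
    return data.decode(codec.name)

# Bytes from a <locale>.SJIS environment are decoded with the CP932
# table, so 0x81 0x60 comes out as FULLWIDTH TILDE (U+FF5E).
text = to_unicode(b"\x81\x60", "SJIS")
print(reported_codeset("SJIS"), ascii(text))
```

An application that asks the locale for its codeset and passes the
answer straight to its converter then picks up the Windows mapping
without any change on its side.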

Andy

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
