X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Sun, 24 Jan 2010 10:37:50 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Please support CP932. (I have problem using subversion with SJIS)
Message-ID: <20100124093750.GA2402@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <e22ab97b1001222149r3c217decmb0da069d7049c896 AT mail DOT gmail DOT com> <20100123135020 DOT GW2402 AT calimero DOT vinschen DOT de> <20100123150703 DOT GY2402 AT calimero DOT vinschen DOT de> <416096c61001230751m308ac854x4f026b1f83b966d0 AT mail DOT gmail DOT com> <20100123164546 DOT GZ2402 AT calimero DOT vinschen DOT de> <416096c61001231431u7e67cd37r2e741d0cb48c732f AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <416096c61001231431u7e67cd37r2e741d0cb48c732f@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Jan 23 22:31, Andy Koppe wrote:
> Corinna Vinschen:
> > I applied a patch which handles the characters 0x5c and 0x7e differently
> > if the charset is set to "SJIS"
> 
> Something's going seriously wrong with this, and I'd suspect it's to
> do with turning backslashes into yen symbols.

Right.  It occurred to me tonight that this will not work from a
filesystem point of view.  The people who decided to overload backslash
and tilde in the ASCII range with different symbols in SJIS still need
a serious knock on the head.  No wonder the Microsoft guys kept
the binary values of the characters intact, especially given the
backslash problem.

> Not sure what could be done about it. Remove SJIS support in favour of CP932?

In theory, we should be able to keep SJIS support in.  The
Cygwin-internal function converting multibyte strings to Unicode
filenames would have to use CP932.  Only at the application level would
the conversion use SJIS.

There's no system API which takes wchar_t strings, so all strings are
exchanged between application and system as multibyte strings.  Since
the multibyte strings are the same, that should give a round-trip which
still works for Win32 filenames:

Input string:  "\x5c\x7e"

Application:     mbstowcs ("\x5c\x7e")      ==> L"\x00a5\x203e"
                 wcstombs (L"\x00a5\x203e") ==> "\x5c\x7e"

Cygwin       sys_mbstowcs ("\x5c\x7e")      ==> L"\x005c\x007e"
             sys_wcstombs (L"\x005c\x007e") ==> "\x5c\x7e"
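
To make the round-trip concrete, here is a minimal sketch in C.  The
names demo_mbstowcs, demo_wcstombs and the sjis_view flag are
hypothetical stand-ins for illustration, not Cygwin's actual
sys_mbstowcs/sys_wcstombs, and only 7-bit single-byte input is handled.
It assumes nothing beyond the mapping shown above: in the SJIS view
0x5c and 0x7e become U+00A5 and U+203E, in the CP932 view they stay
U+005C and U+007E.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for the two conversion layers; these are NOT
   Cygwin's real functions.  Only 7-bit, single-byte input is handled. */

/* Byte -> wide char.  With sjis_view set, 0x5c and 0x7e are read as
   YEN SIGN and OVERLINE; otherwise (CP932 view) they stay ASCII. */
static wchar_t byte_to_wide(unsigned char c, int sjis_view)
{
    if (sjis_view && c == 0x5c) return (wchar_t)0x00a5;
    if (sjis_view && c == 0x7e) return (wchar_t)0x203e;
    return (wchar_t)c;
}

/* Wide char -> byte, the inverse of byte_to_wide.  Returns -1 if the
   character has no single-byte form in the chosen view. */
static int wide_to_byte(wchar_t wc, int sjis_view)
{
    if (sjis_view) {
        if (wc == 0x00a5) return 0x5c;
        if (wc == 0x203e) return 0x7e;
        if (wc == 0x5c || wc == 0x7e) return -1;
    }
    return ((unsigned long)wc < 0x80) ? (int)wc : -1;
}

/* Convert a NUL-terminated byte string into dst, which holds n wide
   chars including the terminator.  Returns the number of characters
   written, or (size_t)-1 on bytes this sketch does not handle. */
static size_t demo_mbstowcs(wchar_t *dst, const char *src, size_t n,
                            int sjis_view)
{
    size_t i;
    assert(n > 0);
    for (i = 0; src[i] != '\0' && i + 1 < n; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c >= 0x80) return (size_t)-1;   /* double-byte: out of scope */
        dst[i] = byte_to_wide(c, sjis_view);
    }
    dst[i] = L'\0';
    return i;
}

/* Convert a NUL-terminated wide string back into dst, which holds n
   bytes including the terminator. */
static size_t demo_wcstombs(char *dst, const wchar_t *src, size_t n,
                            int sjis_view)
{
    size_t i;
    assert(n > 0);
    for (i = 0; src[i] != L'\0' && i + 1 < n; i++) {
        int b = wide_to_byte(src[i], sjis_view);
        if (b < 0) return (size_t)-1;
        dst[i] = (char)b;
    }
    dst[i] = '\0';
    return i;
}
```

Calling the pair with sjis_view set to 1 plays the application side,
with sjis_view set to 0 the Cygwin-internal side.  The wide strings
differ, but in both cases the bytes 0x5c 0x7e come back unchanged,
which is all the application/system boundary ever sees.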


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

