Date: Sun, 24 Jan 2010 10:37:50 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Please support CP932. (I have problem using subversion with SJIS)
Message-ID: <20100124093750.GA2402@calimero.vinschen.de>

On Jan 23 22:31, Andy Koppe wrote:
> Corinna Vinschen:
> > I applied a patch which handles the characters 0x5c and 0x7e differently
> > if the charset is set to "SJIS"
>
> Something's going seriously wrong with this, and I'd suspect it's to
> do with turning backslashes into yen symbols.

Right.  It occurred to me tonight that this will not work from a
filesystem point of view.  The people who decided to overload backslash
and tilde in the ASCII range with different symbols in SJIS still need
a serious knock on the head.  No wonder the Microsoft guys kept the
binary values of characters intact, especially due to the backslash
problem.

> Not sure what could be done about it.  Remove SJIS support in favour of CP932?

In theory, we might be able to keep SJIS support in.
The Cygwin-internal function converting multibyte strings to Unicode
filenames would have to use CP932.  Only on the application level would
the conversion use SJIS.  There's no system API which takes wchar_t
strings, so all strings are exchanged between application and system
as multibyte strings.  Since the multibyte strings are the same, that
should give a round trip which still works for Win32 filenames:

  Input string:  "\x5c\x7e"

  Application:

    mbstowcs ("\x5c\x7e")          ==> L"\x00a5\x203e"
    wcstombs (L"\x00a5\x203e")     ==> "\x5c\x7e"

  Cygwin:

    sys_mbstowcs ("\x5c\x7e")      ==> L"\x005c\x007e"
    sys_wcstombs (L"\x005c\x007e") ==> "\x5c\x7e"


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
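
For illustration only, the round-trip idea can be sketched in a few
lines of Python for the two bytes 0x5c and 0x7e (ASCII backslash and
tilde).  The tables and the helper names mb_to_wc/wc_to_mb are
hypothetical stand-ins, not Cygwin's actual mbstowcs or sys_mbstowcs
code, and they only cover the single-byte range discussed here.

```python
# Sketch only: hand-written tables for the two bytes in question.
# These are NOT Cygwin's real conversion routines.

# Assumed application-level SJIS view:
# 0x5c is YEN SIGN (U+00A5), 0x7e is OVERLINE (U+203E).
SJIS_TO_WC = {0x5C: 0x00A5, 0x7E: 0x203E}

# Assumed CP932 view for filenames: both bytes keep their ASCII
# meaning, so no overrides are needed.
CP932_TO_WC = {}


def mb_to_wc(data: bytes, table: dict) -> str:
    """Map each single byte to a code point, defaulting to identity."""
    return "".join(chr(table.get(b, b)) for b in data)


def wc_to_mb(text: str, table: dict) -> bytes:
    """Inverse of mb_to_wc for the same table (single-byte range only)."""
    reverse = {wc: b for b, wc in table.items()}
    return bytes(reverse.get(ord(ch), ord(ch)) for ch in text)


raw = b"\x5c\x7e"

app_wide = mb_to_wc(raw, SJIS_TO_WC)   # what the application would see
sys_wide = mb_to_wc(raw, CP932_TO_WC)  # what the filename layer would see

print([hex(ord(c)) for c in app_wide])  # ['0xa5', '0x203e']
print([hex(ord(c)) for c in sys_wide])  # ['0x5c', '0x7e']

# Both views convert back to the same multibyte string.
print(wc_to_mb(app_wide, SJIS_TO_WC) == raw)   # True
print(wc_to_mb(sys_wide, CP932_TO_WC) == raw)  # True
```

The wide-character strings differ between the two views, but since only
the multibyte string crosses the application/system boundary, both
sides agree on the bytes that name the file.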