Mail Archives: cygwin/2004/05/18/18:05:47

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
X-Authentication-Warning: slinky.cs.nyu.edu: pechtcha owned process doing -bs
Date: Tue, 18 May 2004 18:05:19 -0400 (EDT)
From: Igor Pechtchanski <pechtcha AT cs DOT nyu DOT edu>
Reply-To: cygwin AT cygwin DOT com
To: Joe Wigglesworth <wiggles AT ca DOT ibm DOT com>
cc: cygwin AT cygwin DOT com
Subject: Re: Anyone using bash shell in Japanese, Chinese, or Korean?
In-Reply-To: <OF775B6C5D.6321842C-ON85256E98.00770776-85256E98.00776C06@ca.ibm.com>
Message-ID: <Pine.GSO.4.58.0405181753480.2073@slinky.cs.nyu.edu>
References: <OF775B6C5D DOT 6321842C-ON85256E98 DOT 00770776-85256E98 DOT 00776C06 AT ca DOT ibm DOT com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.39

On Tue, 18 May 2004, Joe Wigglesworth wrote:

> I'm having difficulty getting the bash shell to handle Japanese double-byte
> characters correctly. The handling of double-byte Japanese characters
> is improved by adding the definitions listed below, but some commands such
> as ls, find, and cygpath still have problems. Is there anything else I can
> do to improve the handling of Japanese double-byte characters in the bash
> shell?  I believe the same problems would occur with Chinese and Korean
> (or any other double-byte language for that matter), but would be happy to
> be corrected by someone who knows otherwise.
> [snip]
>
> This problem does not occur with all Japanese characters. The problematic
> characters are Kanji whose second byte in Shift-JIS is 0x5c.

FWIW, I think it might be more than a coincidence that 0x5c is the ASCII
code for '\'.  I suspect the same problem will occur for characters with
0x2f ('/') in the second byte (if there are any).
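
A quick way to test both suspicions is to ask Python's shift_jis codec which byte values ever appear as the second (trail) byte of a double-byte character.  This is only an illustrative sketch.  It assumes that Python's codec, which covers JIS X 0208, is representative of the Shift-JIS data users actually have, and the helper name trail_bytes is made up for this example.

```python
def trail_bytes(codec="shift_jis"):
    """Collect every byte value seen as the 2nd byte of a double-byte char.

    Hypothetical helper; relies only on Python's built-in codec tables.
    """
    trails = set()
    for cp in range(0x80, 0x10000):
        try:
            encoded = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue  # not representable (this includes lone surrogates)
        if len(encoded) == 2:
            trails.add(encoded[1])
    return trails


trails = trail_bytes()
print("0x5c (backslash) can be a trail byte:", 0x5C in trails)
print("0x2f (slash) can be a trail byte:", 0x2F in trails)
print("lowest trail byte: 0x%02x" % min(trails))
print("Shift-JIS bytes of U+8868:", "表".encode("shift_jis").hex())
```

With the standard codec, 0x5c does show up as a trail byte ('表' encodes as 0x95 0x5c), but 0x2f never does, because Shift-JIS trail bytes start at 0x40.  So in Shift-JIS the troublemaker is '\' rather than '/'.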

The crux is that a lot of Cygwin applications don't handle multibyte
characters at all -- they simply process each string as a sequence of
bytes.  The problem appears when the multibyte representation accidentally
contains a byte that the program treats specially (e.g., '/' or '\').  How
much of this is due to the program scanning the string itself, and how
much is due to calling Windows APIs that aren't multibyte-friendly,
remains to be seen.  It would be interesting to strace the "ls ."
invocation to see whether it breaks somewhere inside "ls" or inside a
Windows API call.
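
To make the byte-oriented failure concrete, here is a small Python sketch that splits a Shift-JIS-encoded Windows path on '\'.  It is not Cygwin's code, and naive_split and sjis_split are hypothetical helpers.  The naive version treats every 0x5c byte as a separator.  The aware version skips the byte that follows a Shift-JIS lead byte (0x81-0x9f or 0xe0-0xfc).  A real C program would more likely rely on mblen() or mbrtowc() under a Shift-JIS locale than hard-code these ranges.

```python
from typing import List

SEP = 0x5C  # the ASCII backslash, '\'


def is_sjis_lead(b: int) -> bool:
    """True if b starts a double-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def naive_split(path: bytes) -> List[bytes]:
    """Byte-oriented: every 0x5C byte is taken as a separator."""
    return path.split(bytes([SEP]))


def sjis_split(path: bytes) -> List[bytes]:
    """Multibyte-aware: a 0x5C that is a trail byte is left alone."""
    parts, start, i = [], 0, 0
    while i < len(path):
        b = path[i]
        if is_sjis_lead(b) and i + 1 < len(path):
            i += 2  # consume lead + trail byte as one character
            continue
        if b == SEP:
            parts.append(path[start:i])
            start = i + 1
        i += 1
    parts.append(path[start:])
    return parts


path = "C:\\表\\file.txt".encode("shift_jis")
print(naive_split(path))
print([p.decode("shift_jis") for p in sjis_split(path)])
```

With the naive split, C:\表\file.txt falls apart into [b'C:', b'\x95', b'', b'file.txt'].  The Kanji is cut in half and a spurious empty component appears.  The lead-byte-aware split returns the three expected components.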
	Igor
P.S. We all saw the (identical?) post from two days ago
(<http://cygwin.com/ml/cygwin/2004-05/msg00567.html>), so there was no
need to re-post this.
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha AT cs DOT nyu DOT edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor AT watson DOT ibm DOT com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

"I have since come to realize that being between your mentor and his route
to the bathroom is a major career booster."  -- Patrick Naughton

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
