X-Recipient: archive-cygwin@delorie.com
X-Spam-Check-By: sourceware.org
Date: Sat, 23 Jan 2010 17:45:46 +0100
From: Corinna Vinschen <corinna-cygwin@cygwin.com>
To: cygwin@cygwin.com
Subject: Re: Please support CP932. (I have problem using subversion with  SJIS)
Message-ID: <20100123164546.GZ2402@calimero.vinschen.de>
Reply-To: cygwin@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
References: <e22ab97b1001222149r3c217decmb0da069d7049c896@mail.gmail.com>  <20100123135020.GW2402@calimero.vinschen.de>  <20100123150703.GY2402@calimero.vinschen.de>  <416096c61001230751m308ac854x4f026b1f83b966d0@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <416096c61001230751m308ac854x4f026b1f83b966d0@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Precedence: bulk
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie.com@cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe@cygwin.com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin@cygwin.com>
List-Help: <mailto:cygwin-help@cygwin.com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner@cygwin.com
Delivered-To: mailing list cygwin@cygwin.com

On Jan 23 15:51, Andy Koppe wrote:
> On 23 January 2010 15:07, Corinna Vinschen:
> > Ouch.  I understand now.  Standard SJIS is *really* different from
> > Microsoft CP932 in two code points:
> >
> >  CP932 0x5c == U+005C
> >  SJIS  0x5c == U+00A5
> >
> >  CP932 0x7e == U+007E
> >  SJIS  0x7e == U+203E
> 
> Aargh! I wonder what that would do to DOS paths and stuff like ~username.
> 
> > Would it be a valid help for your case if Cygwin's SJIS conversion would
> > convert 0x5c to U+00A5 and 0x7e to U+203E, so that the SJIS conversion
> > would be really correct *and* bijective?
> 
> I think that's the correct thing to do, but it'll likely break other
> stuff. Seems SJIS really isn't suited for Unix command line use. All
> the more reason to make EUC-JP the default for "ja_JP" I guess.
> 
> >  To me this sounds like the
> > better solution than adding a CP932 charset identifier.
> 
> I agree. Simply aliasing CP932 to SJIS is wrong, because they are
> quite different character sets. Supporting CP932 as a charset in its
> own right might be worth considering though, especially as that's the
> standard charset on Japanese Cygwin 1.5.

I applied a patch which handles the characters 0x5c and 0x7e differently
if the charset is set to "SJIS", and I applied Nayuta's patch to newlib's
loadlocale to allow "CP932" as a charset.  So there will be a choice in
Cygwin 1.7.2.
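For illustration only, here is a minimal sketch of what the charset-dependent
handling of the single-byte range amounts to, assuming only bytes 0x00-0x7f
are of interest.  The function names and the "SJIS"/"CP932" string selectors
are hypothetical and are not taken from the actual Cygwin or newlib patch.
Double-byte characters and half-width katakana are out of scope here.

```python
from typing import Optional

# Hypothetical charset selectors for this sketch; not Cygwin/newlib identifiers.
CHARSETS = ("SJIS", "CP932")

# Under standard SJIS the single-byte range is JIS X 0201 Roman, which differs
# from ASCII in exactly these two positions.  CP932 keeps plain ASCII.
SJIS_SPECIAL = {0x5C: "\u00a5",  # YEN SIGN
                0x7E: "\u203e"}  # OVERLINE
SJIS_SPECIAL_REV = {ch: b for b, ch in SJIS_SPECIAL.items()}


def sb_to_unicode(charset: str, byte: int) -> str:
    """Decode one byte in the range 0x00-0x7f according to the charset."""
    if charset not in CHARSETS:
        raise ValueError(f"unsupported charset: {charset!r}")
    if not 0 <= byte < 0x80:
        raise ValueError("this sketch only handles bytes 0x00-0x7f")
    if charset == "SJIS" and byte in SJIS_SPECIAL:
        return SJIS_SPECIAL[byte]
    return chr(byte)


def unicode_to_sb(charset: str, ch: str) -> Optional[int]:
    """Encode one character to a single byte, or return None if the charset
    has no single-byte code for it (e.g. backslash and tilde under SJIS)."""
    if charset not in CHARSETS:
        raise ValueError(f"unsupported charset: {charset!r}")
    if charset == "SJIS":
        if ch in SJIS_SPECIAL_REV:
            return SJIS_SPECIAL_REV[ch]
        if ord(ch) in SJIS_SPECIAL:  # U+005C / U+007E: not in JIS X 0201 Roman
            return None
    code = ord(ch)
    return code if code < 0x80 else None


if __name__ == "__main__":
    for cs in CHARSETS:
        for b in (0x5C, 0x7E):
            print(f"{cs:5} 0x{b:02x} -> U+{ord(sb_to_unicode(cs, b)):04X}")
```

Because each charset maps every byte in 0x00-0x7f to a distinct character and
the reverse lookup undoes it, decoding followed by encoding returns the
original byte under both charsets.  That is the bijectivity property discussed
above.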


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

