Date: Sat, 23 Jan 2010 17:45:46 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Please support CP932. (I have problem using subversion with SJIS)
Message-ID: <20100123164546.GZ2402@calimero.vinschen.de>
In-Reply-To: <416096c61001230751m308ac854x4f026b1f83b966d0@mail.gmail.com>
References: <20100123135020 DOT GW2402 AT calimero DOT vinschen DOT de> <20100123150703 DOT GY2402 AT calimero DOT vinschen DOT de> <416096c61001230751m308ac854x4f026b1f83b966d0 AT mail DOT gmail DOT com>

On Jan 23 15:51, Andy Koppe wrote:
> On 23 January 2010 15:07, Corinna Vinschen:
> > Ouch.  I understand now.  Standard SJIS is *really* different from
> > Microsoft CP932 in two code points:
> >
> >  CP932 0x5c == U+005C
> >  SJIS  0x5c == U+00A5
> >
> >  CP932 0x7e == U+007E
> >  SJIS  0x7e == U+203E
>
> Aargh! I wonder what that would do to DOS paths and stuff like ~username.
>
> > Would it be a valid help for your case if Cygwin's SJIS conversion would
> > convert 0x5c to U+00A5 and 0x7e to U+203E, so that the SJIS conversion
> > would be really correct *and* bijective?
>
> I think that's the correct thing to do, but it'll likely break other
> stuff. Seems SJIS really isn't suited for Unix command line use. All
> the more reason to make EUC-JP the default for "ja_JP" I guess.
>
> >  To me this sounds like the
> > better solution than adding a CP932 charset identifier.
>
> I agree. Simply aliasing CP932 to SJIS is wrong, because they are
> quite different character sets. Supporting CP932 as a charset in its
> own right might be worth considering though, especially as that's the
> standard charset on Japanese Cygwin 1.5.

I applied a patch which handles the characters 0x5c and 0x7e differently
if the charset is set to "SJIS", and I applied Nayuta's patch to newlib's
loadlocale to allow "CP932" as charset.  So there will be a choice in
Cygwin 1.7.2.


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
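
For readers of the archive, the two-code-point difference discussed in this
thread can be sketched roughly as below.  This is only an illustration, not
the patch that went into Cygwin: the table and function names are made up
for the example, and it covers only single bytes below 0x80 (it ignores
0x5c appearing as the trail byte of a double-byte character).

```python
# Illustrative sketch only; names are hypothetical, not Cygwin or newlib API.
# Standard SJIS uses JIS X 0201 Roman in the low range, so two bytes differ
# from ASCII; CP932 keeps plain ASCII for those bytes.
SJIS_SINGLE_BYTE_OVERRIDES = {
    0x5C: 0x00A5,  # YEN SIGN instead of REVERSE SOLIDUS
    0x7E: 0x203E,  # OVERLINE instead of TILDE
}
SJIS_REVERSE = {cp: b for b, cp in SJIS_SINGLE_BYTE_OVERRIDES.items()}


def byte_to_codepoint(byte, charset):
    """Map a single byte below 0x80 to a Unicode code point."""
    if not 0 <= byte < 0x80:
        raise ValueError("this sketch only handles single bytes below 0x80")
    if charset == "CP932":
        return byte
    if charset == "SJIS":
        return SJIS_SINGLE_BYTE_OVERRIDES.get(byte, byte)
    raise ValueError("unknown charset: %r" % (charset,))


def codepoint_to_byte(codepoint, charset):
    """Inverse of byte_to_codepoint; raises if there is no single-byte form."""
    if charset == "CP932":
        if 0 <= codepoint < 0x80:
            return codepoint
        raise ValueError("no single-byte CP932 form in this sketch")
    if charset == "SJIS":
        if codepoint in SJIS_REVERSE:
            return SJIS_REVERSE[codepoint]
        # U+005C and U+007E are deliberately excluded so the mapping stays
        # one-to-one (bijective) over the single-byte range.
        if 0 <= codepoint < 0x80 and codepoint not in SJIS_SINGLE_BYTE_OVERRIDES:
            return codepoint
        raise ValueError("no single-byte SJIS form in this sketch")
    raise ValueError("unknown charset: %r" % (charset,))


if __name__ == "__main__":
    for b in (0x5C, 0x7E):
        for cs in ("CP932", "SJIS"):
            print("%-5s 0x%02x == U+%04X" % (cs, b, byte_to_codepoint(b, cs)))
```

Under "SJIS" the backslash and tilde of a path such as ~username would have
no single-byte encoding in this model, which is the breakage worried about
above; under "CP932" they round-trip as plain ASCII.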