Mail Archives: cygwin/2010/01/23/00:49:34

Date: Sat, 23 Jan 2010 14:49:21 +0900
Message-ID: <e22ab97b1001222149r3c217decmb0da069d7049c896@mail.gmail.com>
Subject: Please support CP932. (I have problem using subversion with SJIS)
From: Nayuta Taga <ganaware AT gmail DOT com>
To: cygwin AT cygwin DOT com
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

Hi all,

Please support CP932.  Because CP932 is not identical to SJIS, I have
a problem using Subversion when LANG=ja_JP.SJIS.  With the attached
patch and LANG=ja_JP.CP932, Subversion works as expected.

The problem is as follows:

I have the following line in my ~/.subversion/config:
	global-ignores = *~
When LANG=ja_JP.UTF-8, Subversion ignores a file named 'foo~'.
But when LANG=ja_JP.SJIS, it does not.

I looked into Subversion and found a workaround.
I added *[U+203E] to the line:
	global-ignores = *~ *[U+203E]
([U+203E] stands for a single character) and saved the file in UTF-8.
This works fine.

In short, '~' (U+007E TILDE) turns into U+203E (OVERLINE) when
LANG=ja_JP.SJIS.
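
The effect on the ignore pattern can be reproduced in isolation.  The
following Python sketch is only an illustration: it uses fnmatch as a
stand-in for Subversion's own pattern matching (an assumption, not
Subversion's code), and the helper name is_ignored is hypothetical.

```python
from fnmatch import fnmatchcase

TILDE = "\u007e"     # '~', what the file name really ends with
OVERLINE = "\u203e"  # what the last character becomes under LANG=ja_JP.SJIS


def is_ignored(name, patterns):
    """Return True if name matches any of the glob patterns.

    Hypothetical helper; fnmatchcase stands in for Subversion's matcher.
    """
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# The name as Subversion ends up seeing it after the SJIS -> UTF-8 step.
converted_name = "foo" + OVERLINE

original_patterns = ["*" + TILDE]
workaround_patterns = ["*" + TILDE, "*" + OVERLINE]

print(is_ignored("foo" + TILDE, original_patterns))    # True
print(is_ignored(converted_name, original_patterns))   # False
print(is_ignored(converted_name, workaround_patterns)) # True
```

The second result is the reported problem, and the third shows why
adding the extra pattern works around it.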

Then I looked into cygwin and subversion again.
(1) cygwin1.dll converts L"foo~" (UCS-2) to "foo~" (CP932).
(2) Because Subversion uses UTF-8 internally,
    "foo~" (CP932) should be converted to "foo~" (UTF-8).
(3) Subversion uses iconv to convert from *SJIS* to UTF-8,
    because nl_langinfo(CODESET) returns "SJIS" when LANG=ja_JP.SJIS.
(4) The final string is "foo\xe2\x80\xbe".
    (e2 80 be is the UTF-8 representation of U+203E)
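
Steps (2) to (4) can be modelled with a few lines of Python.  This is
a simplified sketch, not the code of cygwin1.dll, iconv or Subversion:
the function decode_single_bytes and its override table are
hypothetical, and only bytes below 0x80 are handled.  The table
assumes that SJIS treats that range as JIS X 0201 Roman (0x5C is YEN
SIGN, 0x7E is OVERLINE), while CP932 treats it as plain ASCII.

```python
# Bytes whose meaning differs from ASCII in JIS X 0201 Roman,
# the single-byte half of SJIS as iconv interprets it.
JISX0201_ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}


def decode_single_bytes(data, codeset):
    """Decode bytes below 0x80 according to the named codeset.

    Hypothetical model: "SJIS" applies the JIS X 0201 Roman overrides,
    any other codeset (for example "CP932") is treated as ASCII here.
    """
    overrides = JISX0201_ROMAN_OVERRIDES if codeset == "SJIS" else {}
    chars = []
    for byte in data:
        if byte >= 0x80:
            raise ValueError("double-byte characters are not modelled")
        chars.append(overrides.get(byte, chr(byte)))
    return "".join(chars)


name_bytes = b"foo~"  # step (1): the bytes handed over by cygwin1.dll

for codeset in ("CP932", "SJIS"):
    # Steps (2)-(3): decode with the codeset, then re-encode as UTF-8.
    utf8 = decode_single_bytes(name_bytes, codeset).encode("utf-8")
    print(codeset, utf8.hex())
# CP932 666f6f7e
# SJIS 666f6fe280be
```

The same four input bytes end in 7e under CP932 but in e2 80 be under
SJIS, which is the difference described in step (4).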

With my patch I can use LANG=ja_JP.CP932, and nl_langinfo(CODESET) then
returns "CP932", so the final string is "foo~".

Supplement (the same conversions run directly with iconv):

$ echo -n foo~ | iconv -f CP932 -t UTF-8 | od -t x1 -t a
0000000    66  6f  6f  7e
           f   o   o   ~
0000004
$ echo -n foo~ | iconv -f SJIS -t UTF-8 | od -t x1 -t a
0000000    66  6f  6f  e2  80  be
           f   o   o   ?  80   ?
0000006

-- 
TAGA Nayuta <ganaware AT gmail DOT com>
