Date: Sat, 23 Jan 2010 14:49:21 +0900
Message-ID: <e22ab97b1001222149r3c217decmb0da069d7049c896@mail.gmail.com>
Subject: Please support CP932. (I have problem using subversion with SJIS)
From: Nayuta Taga <ganaware AT gmail DOT com>
To: cygwin AT cygwin DOT com

Hi all,

Please support CP932.

Because CP932 is not identical to SJIS, I have a problem using subversion when LANG=ja_JP.SJIS. With the attached patch and LANG=ja_JP.CP932, I can use subversion as expected.

The problem is as follows. I have the following line in my ~/.subversion/config:

  global-ignores = *~

When LANG=ja_JP.UTF-8, subversion ignores a file 'foo~'. But when LANG=ja_JP.SJIS, it doesn't.

I looked into subversion and found a workaround. I added *[U+203E] to the line:

  global-ignores = *~ *[U+203E]

([U+203E] is one character) and saved the file in UTF-8. This works fine.

In short, '~' (U+007E TILDE) turns into U+203E (OVERLINE) when LANG=ja_JP.SJIS.

Then I looked into cygwin and subversion again.

(1) cygwin1.dll converts L"foo~" (UCS-2) to "foo~" (CP932).
(2) Because subversion uses UTF-8 internally, "foo~" (CP932) should be converted to "foo~" (UTF-8).
(3) It uses iconv to convert from *SJIS* to UTF-8, because nl_langinfo(CODESET) returns "SJIS" when LANG=ja_JP.SJIS.
(4) The final string is "foo\xe2\x80\xbe".
(e2 80 be is the UTF-8 representation of U+203E.)

With my patch I can use LANG=ja_JP.CP932, and nl_langinfo(CODESET) then returns "CP932". So the final string is "foo~".

Supplement:

$ echo -n foo~ | iconv -f CP932 -t UTF-8 | od -t x1 -t a
0000000  66  6f  6f  7e
          f   o   o   ~
0000004
$ echo -n foo~ | iconv -f SJIS -t UTF-8 | od -t x1 -t a
0000000  66  6f  6f  e2  80  be
          f   o   o   ?  80   ?
0000006

-- 
TAGA Nayuta <ganaware AT gmail DOT com>
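
Steps (1) to (4) above can be modelled in a few lines of Python. This is only a sketch under stated assumptions, not what subversion or iconv actually run: decode_low_bytes and is_ignored are hypothetical helpers, the override table assumes a converter that treats the single-byte half of "SJIS" as JIS X 0201 Roman (0x5C as YEN SIGN, 0x7E as OVERLINE) while "CP932" keeps ASCII there, and Python's fnmatchcase stands in for subversion's own pattern matching.

```python
from fnmatch import fnmatchcase

# Assumption for this sketch: "SJIS" maps these two bytes as JIS X 0201 Roman,
# whereas "CP932" keeps the plain ASCII meaning.
JISX0201_ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}


def decode_low_bytes(data: bytes, codeset: str) -> str:
    """Decode single-byte (< 0x80) input as the named codeset is modelled here.

    Hypothetical helper: multi-byte sequences are deliberately not handled,
    because a name such as b"foo~" does not need them.
    """
    chars = []
    for byte in data:
        if byte >= 0x80:
            raise ValueError("multi-byte sequences are outside this sketch")
        if codeset == "SJIS" and byte in JISX0201_ROMAN_OVERRIDES:
            chars.append(JISX0201_ROMAN_OVERRIDES[byte])
        else:
            chars.append(chr(byte))  # ASCII meaning, as modelled for CP932
    return "".join(chars)


def is_ignored(name_bytes: bytes, codeset: str, pattern: str = "*~") -> bool:
    """Return True if the decoded file name matches the ignore pattern."""
    return fnmatchcase(decode_low_bytes(name_bytes, codeset), pattern)


if __name__ == "__main__":
    for codeset in ("CP932", "SJIS"):
        utf8 = decode_low_bytes(b"foo~", codeset).encode("utf-8")
        hex_bytes = " ".join(f"{b:02x}" for b in utf8)
        print(codeset, hex_bytes, is_ignored(b"foo~", codeset))
```

Under this model the CP932 path keeps the trailing byte 7e, so the pattern *~ matches, while the SJIS path ends in e2 80 be and only a pattern such as *[U+203E] matches, which is consistent with the workaround described above.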