From: Andy Koppe
To: cygwin AT cygwin DOT com
Date: Tue, 29 Sep 2009 05:04:18 +0100
Subject: Re: The C locale
Message-ID: <416096c60909282104o639862d5h29c0c4046949c4aa@mail.gmail.com>
In-Reply-To: <20090929092340.796@binki>

2009/9/29 wynfield:
>
> Though I'm not up on the details involved here, I will give you
> feedback on the request for information about the locale issue, because
> it affects the quick accessibility and usage of Japanese language
> documents.
>
> Either of the following two values would be acceptable, but I feel that
> the UTF-8 charset is becoming more and more widely adopted.
>     LANG=ja -> UTF-8
>     LANG=ja_JP -> UTF-8
>
> Also, the following would be suitable if possible:
>     LANG=ja -> iso-2022-jp
>     LANG=ja_JP -> iso-2022-jp

Thanks for the feedback! Now, Windows knows three different variants
of iso-2022-jp. Do you know which one's the preferred one?

CP50220: ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)
CP50221: ISO 2022 Japanese with halfwidth Katakana;
         Japanese (JIS-Allow 1 byte Kana)
CP50222: ISO 2022 Japanese JIS X 0201-1989;
         Japanese (JIS-Allow 1 byte Kana - SO/SI)

Also, Wikipedia has this to say:

"Since ISO 2022 is a stateful encoding, a program can not jump in the
middle of a block of text to search, insert or delete characters. This
makes manipulation of the text very cumbersome and slow when compared
to non-stateful encodings. Any jump in the middle of the text may
require a back up to the previous escape sequence before the bytes
following the escape sequence can be interpreted."

Doesn't that make it very difficult to use with standard Unix tools?

Andy
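
To make the statefulness concern concrete, here is a minimal Python sketch.
It assumes Python's built-in "iso2022_jp" codec as a stand-in for the
iso-2022-jp charset discussed above (it is not one of the Windows code
pages), and the sample string is made up for illustration. It shows that
the bytes of the Japanese run are only meaningful after the preceding
escape sequence, and that a plain byte search can report an ASCII match
that is not in the text.

```python
# Sketch only: the sample text is hypothetical, not taken from the thread.
text = "abc 日本語 xyz"

iso = text.encode("iso2022_jp")   # stateful: escape sequences switch character sets
utf8 = text.encode("utf-8")       # stateless: non-ASCII bytes never look like ASCII

# The Japanese run sits between ESC $ B (switch to JIS X 0208)
# and ESC ( B (switch back to ASCII).
start = iso.index(b"\x1b$B") + 3
end = iso.index(b"\x1b(B")
kanji_bytes = iso[start:end]

# Decoded without the preceding escape sequence, the same bytes are
# read as ordinary ASCII characters instead of kanji.
out_of_context = kanji_bytes.decode("iso2022_jp")

# A byte-level search, which is roughly what grep-like tools do,
# looks for an ASCII "K" that the text does not contain.
false_hit_iso = b"K" in iso
false_hit_utf8 = b"K" in utf8

print(iso)
print(repr(out_of_context))
print(false_hit_iso, false_hit_utf8)
```

With this sample, the search in the iso-2022-jp bytes reports a match and
the search in the UTF-8 bytes does not, which is the kind of trouble
byte-oriented Unix tools would run into.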