Mail Archives: cygwin/2009/04/03/14:21:16
On Apr 4 02:32, neomjp wrote:
> I used Corinna's tiny program
> (http://sourceware.org/ml/cygwin/2009-04/msg00053.html )
> to create a file with a name containing a CJK character and tested
> how setting LANG works.
>
> I changed 0x20ac to 0x4e00 (<CJK Ideograph, First>). This is one of the
> characters used in all three languages. It is 0xe4 0xb8 0x80 in
> hexadecimal UTF-8. So, without setting LANG, the file name should look
> like "qq\016\344\270\200".
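The byte values quoted above can be double-checked with a short Python sketch. It is not part of the original thread and does not call into Cygwin. The SO-prefixed fallback is assumed from the description in this thread, and the helper name `octal_escaped` is made up for illustration.

```python
# Sketch only: reproduces the byte values quoted above; it does not call Cygwin.
ch = "\u4e00"                  # <CJK Ideograph, First>
utf8 = ch.encode("utf-8")      # the UTF-8 form of the character

# Assumption taken from the thread: a character that cannot be converted
# shows up as ASCII SO (0x0e) followed by its UTF-8 bytes.
fallback_name = b"qq" + b"\x0e" + utf8


def octal_escaped(raw: bytes) -> str:
    """Keep printable ASCII as-is; show every other byte as a 3-digit octal escape."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else f"\\{b:03o}" for b in raw
    )


print(utf8.hex(" "))                  # hex dump of the UTF-8 bytes
print(octal_escaped(fallback_name))   # file name as seen without LANG
```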
> [...]
> But it failed for JIS/ISO-2022-JP and eucJP. (The character was
> represented as ASCII SO (0x0e) followed by the UTF-8 sequence.)
>
> What is going wrong here? What makes the file name conversion from
> UTF-16 to these character sets fail? Or, what am I doing wrong?
> [...]
> LANG=en_US.ISO-2022-JP
> 0000000  71  71  0e  e4  b8  80  0a
>           q   q  so   d   8 nul  nl
> 0000007
> This must be identical to:
> 0000000  71  71  1b  24  42  30  6c  1b  28  42  0a
>           q   q esc   $   B   0   l esc   (   B  nl
> 0000013
Esc? Uh oh. Is that really correct?
[...time passes reading http://en.wikipedia.org/wiki/ISO_2022...]
Oh well, this will not work right now. I hadn't looked into this
before, and I actually thought that JIS was a double-byte charset.
The properties of this charset (it is stateful and switches character
sets with escape sequences) don't allow using the handcrafted
double-byte charset function I created for Cygwin.
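To illustrate why ISO-2022-JP does not fit a fixed double-byte model, here is a minimal Python sketch. It uses Python's own `iso2022_jp` codec, not Cygwin's conversion code, so it only demonstrates the encoding itself. The variable names are made up for the example.

```python
# Sketch only: uses Python's iso2022_jp codec, not Cygwin's conversion code.
name = "qq\u4e00\n"
jis = name.encode("iso2022_jp")
print(jis.hex(" "))

# The encoded length is not a fixed number of bytes per character:
# each switch into or out of JIS X 0208 costs a three-byte escape sequence.
one = "\u4e00".encode("iso2022_jp")         # ESC $ B + 2 bytes + ESC ( B
two = "\u4e00\u4e00".encode("iso2022_jp")   # ESC $ B + 4 bytes + ESC ( B
print(len(one), len(two))

# The escape byte (0x1b) marks every state change in the stream.
escape_count = jis.count(b"\x1b")
print(escape_count)
```

A converter for this charset therefore has to track shift state across characters, which a per-character lead-byte/trail-byte lookup cannot do.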
> LANG=en_US.eucJP
> 0000000  71  71  0e  e4  b8  80  0a
>           q   q  so   d   8 nul  nl
> 0000007
> This must be identical to:
> 0000000  71  71  b0  ec  0a
>           q   q   0   l  nl
> 0000005
Same here, since eucJP characters can apparently be up to three bytes
long. I will have to either rework the double-byte function or create
a special multibyte function for these charsets.
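The three-byte case can be shown with a small Python sketch. It relies on Python's `euc_jp` codec rather than anything in Cygwin, and the helper `first_three_byte_char` is a made-up name that simply searches the CJK range for a character whose EUC-JP form is three bytes long.

```python
# Sketch only: uses Python's euc_jp codec, not Cygwin's conversion code.
two_byte = "\u4e00".encode("euc_jp")   # U+4E00 is in JIS X 0208: two bytes
print(two_byte.hex(" "))


def first_three_byte_char(start=0x4E00, stop=0xA000):
    """Return (char, bytes) for the first ideograph that needs three EUC-JP bytes."""
    for cp in range(start, stop):
        try:
            enc = chr(cp).encode("euc_jp")
        except UnicodeEncodeError:
            continue               # not representable in EUC-JP at all
        if len(enc) == 3:
            return chr(cp), enc
    return None


found = first_three_byte_char()
if found is not None:
    ch, enc = found
    # Three-byte sequences start with the single-shift byte SS3 (0x8f).
    print(f"U+{ord(ch):04X}", enc.hex(" "))
```

So a function that only distinguishes single bytes from lead/trail byte pairs will mishandle any sequence introduced by 0x8f.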
Thanks for the test. I will look into that in the next couple of days.
Stay tuned.
Corinna
P.S.: I'm not well versed in the Japanese charsets and codepages used
on Windows. http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
lists all supported codepages/charsets. If you look for
the codepages 50220-50222, you'll see they are all called ISO 2022
Japanese. In Cygwin I'm using 50220 for JIS. Is that correct?
Or should I rather use one of 50221 or 50222?
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat