Date: Tue, 15 Jun 2004 23:17:19 +0900
From: Jaeho Shin
To: "Pierre A. Humblet"
Cc: cygwin AT cygwin DOT com
Subject: Re: Unable to open files including Korean names
Message-ID: <20040615141718.GD5948@sab.mazic.org>
In-Reply-To: <40CEF62E.1526816A@ieee.org>
User-Agent: Mutt/1.4.1i
Organization: SPARCS, KAIST

On Tue, 2004-06-15 09:14:22 -0400, Pierre A. Humblet wrote:
> Thanks. Nothing conclusive.
> Could you compile and run the following one line program?
>
> #include <stdio.h>
> #include <windows.h>
>
> int main(void)
> {
>   printf("AreFileApisANSI %d\n", AreFileApisANSI());
>   return 0;
> }
>
> Compile it with
> gcc -mno-cygwin try_ansi.c
>
> With the -mno-cygwin, the value of CYGWIN=codepage:oem
> shouldn't matter. When compiled without that switch
> codepage:oem or codepage:ansi should matter.
>
> Running on 1.5.9 is OK.

Here's the result:

    $ gcc -mno-cygwin try_ansi.c
    $ ./a.exe
    AreFileApisANSI 1
    $

> Also, the Korean directory name has numerical value
> ~> od -x xx.txt
> 0000000 d1c7 dbb1
>
> Do you know what encoding that is? Is it Unicode or UTF8?
> If it is UTF8, do you know what the Unicode values should be?

Well, that's in EUC-KR and CP949. CP949 has some more characters
defined in the empty areas of EUC-KR. The directory name I used,
``한글'', which is pronounced ``hangeul'' and means Korean (written
language) in Korean, consists of two characters:
    U+D55C: Hangul syllable Hieuh A Nieun,
    U+AE00: Hangul syllable Kiyeok Eu Rieul.
(Perhaps you can find them in Windows charmap.)
Neither character is in CP949's extension, so they have identical
values in both EUC-KR and CP949 encoding.

Yes, you gave me the identical numerical value I use.
Running `echo -n 한글 | od -x -` tells me:
    0000000 d1c7 dbb1

Now, `echo -n 한글 | iconv -f euc-kr -t utf-8 | od -x -` tells me:
    0000000 95ed ea9c 80b8

So yes, it's in EUC-KR (or, equivalently in this case, CP949). I don't
use a Unicode environment yet. Actually, I don't know how to change
the encoding from Windows. The Korean version of Windows just uses
CP949 as the default.

It looks like od's output is little-endian: it prints 16-bit words, so
the two bytes of each word appear swapped. With that in mind, the
following identifies them as U+D55C and U+AE00,
`echo -n 한글 | iconv -f euc-kr -t ucs-2 | od -x -`:
    0000000 5cd5 00ae

> Thanks for your help

My pleasure. :)

BTW, is there any reason you are not sending your messages to the
cygwin ML? If not, I'll just keep Cc'ing it.

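The byte values quoted above can be cross-checked without iconv or od.
What follows is only a rough sketch in Python, not something taken from
the thread: the names NAME, od_x and dump are made up for illustration,
and od_x assumes a little-endian host, which is how the od output in
this message appears to behave.

```python
# Hypothetical helper names; a cross-check of the od/iconv output above.

# The directory name from the message, written as code points
# U+D55C U+AE00 so the source file encoding does not matter.
NAME = "\ud55c\uae00"


def od_x(data: bytes) -> str:
    """Format bytes the way `od -x` shows them on a little-endian host:
    as 16-bit words, so each pair of bytes appears swapped."""
    if len(data) % 2:
        data += b"\x00"  # assumption: an odd trailing byte is zero-padded
    words = [
        int.from_bytes(data[i:i + 2], "little")
        for i in range(0, len(data), 2)
    ]
    return " ".join(f"{w:04x}" for w in words)


def dump(name: str = NAME) -> dict:
    """Encode the name in each encoding discussed and return od-style words."""
    encodings = ("euc_kr", "cp949", "utf_8", "utf_16_be")
    return {enc: od_x(name.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for enc, words in dump().items():
        print(f"{enc:10s} {words}")
```

Under these assumptions, the euc_kr and cp949 rows both come out as
d1c7 dbb1, the utf_8 row as 95ed ea9c 80b8, and the utf_16_be row (used
here in place of iconv's ucs-2) as 5cd5 00ae, matching the values in
the message.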
-- 
신재호 | Jaeho Shin | http://netj.org/
System Programmers' Association for Researching Computer Systems
Division of Computer Science, Department of EECS, KAIST