Mail Archives: cygwin/2011/11/10/00:19:24

From: "Leon Vanderploeg" <leonv AT vaultnow DOT com>
To: <cygwin AT cygwin DOT com>
References: <135801cc9a69$f73ceaf0$e5b6c0d0$@vaultnow.com> <4EB30DF9 DOT 2080006 AT cwilson DOT fastmail DOT fm> <20111104084619 DOT GM9159 AT calimero DOT vinschen DOT de>
In-Reply-To: <20111104084619.GM9159@calimero.vinschen.de>
Subject: RE: Possible Bug (clarification) in Cygwin 1.7.5 -- findfirstfile (and findnextfile) yield bad cfilename when file names have special characters. Works in cygwin 1.5, fails in 1.7
Date: Wed, 9 Nov 2011 22:18:58 -0700
Message-ID: <029901cc9f68$41108a80$c3319f80$@vaultnow.com>

Many thanks to Charles and Corinna for the help.  I have modified the code to use the POSIX functions.  I still have one problem I cannot seem to conquer.  

I need to be able to read and write the (yes, I know it's evil) archive bit. Unless there is a POSIX function for this (which I seriously doubt), I am locked into the Windows APIs.

I have read and re-read the Cygwin documentation on internationalization at least 6 times, and I cannot figure out what I need to do to get this to work. I have tried numerous combinations of environment variables and locale settings in the code, but none of them work: the Windows API fails to find the specified file. I just want US English that can pass the extended character set to the Windows APIs. In this case, let's use the example of the copyright symbol (©, the small c with a circle around it). What needs to be set in the environment, and what needs to be set in the C code, to handle these characters correctly?
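A minimal sketch of one way to reach the archive bit without going through the ANSI API, assuming Cygwin 1.7's cygwin_conv_path() with CCP_POSIX_TO_WIN_W (declared in <sys/cygwin.h>) plus the wide-character GetFileAttributesW/SetFileAttributesW calls. The helper name archive_bit() is hypothetical. Because the POSIX path is converted straight to a UTF-16 Windows path, no Windows ANSI codepage sits between the program and the file system, which is the failure mode the replies quoted below describe. On non-Cygwin builds the sketch simply fails with ENOSYS.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

#ifdef __CYGWIN__
#include <windows.h>
#include <sys/cygwin.h>
#endif

/* Hypothetical helper (assumes Cygwin 1.7+).
   Returns 1 if FILE_ATTRIBUTE_ARCHIVE is set on posix_path, 0 if it is
   clear, -1 on error.  If set_to is 0 or 1, the bit is first changed to
   that value; any other value just queries it. */
int archive_bit(const char *posix_path, int set_to)
{
#ifdef __CYGWIN__
    /* With a size of 0, cygwin_conv_path returns the buffer size needed
       (in bytes) for the UTF-16 Windows path. */
    ssize_t size = cygwin_conv_path(CCP_POSIX_TO_WIN_W, posix_path, NULL, 0);
    if (size < 0)
        return -1;
    wchar_t *wpath = malloc(size);
    if (wpath == NULL)
        return -1;
    if (cygwin_conv_path(CCP_POSIX_TO_WIN_W, posix_path, wpath, size) != 0) {
        free(wpath);
        return -1;
    }

    /* Use the W (UNICODE) API so the name never passes through a codepage. */
    DWORD attrs = GetFileAttributesW(wpath);
    if (attrs == INVALID_FILE_ATTRIBUTES) {
        free(wpath);
        return -1;
    }
    if (set_to == 0 || set_to == 1) {
        DWORD wanted = set_to ? (attrs | FILE_ATTRIBUTE_ARCHIVE)
                              : (attrs & ~FILE_ATTRIBUTE_ARCHIVE);
        if (wanted != attrs && !SetFileAttributesW(wpath, wanted)) {
            free(wpath);
            return -1;
        }
        attrs = wanted;
    }
    free(wpath);
    return (attrs & FILE_ATTRIBUTE_ARCHIVE) ? 1 : 0;
#else
    (void)posix_path;
    (void)set_to;
    errno = ENOSYS; /* Windows file attributes only exist under Cygwin */
    return -1;
#endif
}
```

For example, archive_bit("/cygdrive/c/Testing/Mañana.docx", 0) would clear the bit and return 0 on success, provided the string literal's bytes are in the charset Cygwin uses for file names (UTF-8 by default in 1.7, per the reply below).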

Your help and assistance is GREATLY appreciated.

Leon

Leon Vanderploeg
Cell   303-877-9654


On Nov  3 17:56, Charles Wilson wrote:
> On 11/3/2011 4:48 PM, Leon Vanderploeg wrote:
> > With cygwin 1.7.5, cFileName values containing special characters such as ñ (n 
> > with a tilde above it) fail to be properly extracted from a 
> > WIN32_FIND_DATA structure with FindFirstFile (or FindNextFile).
> > 
> > To set up a simple test scenario, I created a file in C:\Testing 
> > named Mañana.docx. I compiled the code at the end of this message 
> > on Cygwin 1.7.9 with GCC version 3.4.4 on a 32-bit Server 2008 system.
> > On this system (and on a 32-bit Windows 7 machine), it returns:
> 
> a) Why are you using native Win32 APIs in a cygwin program? You should 
> be using the POSIX interfaces instead -- see /usr/include/dirent.h.
> 
> DIR *opendir (const char *);
> DIR *fdopendir (int);
> struct dirent *readdir (DIR *);
> int readdir_r (DIR *, struct dirent *, struct dirent **);
> void rewinddir (DIR *);
> int closedir (DIR *);

ACK++

> b) What you observe is an artifact of cygwin-1.7's new *support* for 
> i18n.  In cygwin-1.5, it just didn't care and passed all the bytes 
> back exactly as found without transliteration.  In 1.7, it (correctly) 
> transcodes strings into the current locale -- and your current locale 
> does not appear to support ñ -- or, at least, you haven't told cygwin 
> to use the correct one.
> 
> (I'm probably thoroughly botching this explanation, but the point is,

Just a bit.  What you have to keep in mind is that Windows stores all object names, including filenames, as UTF-16 strings, UNICODE in Windows terminology.  When you use the ANSI Win32 API as in this example, the UTF-16 names are converted to the currently defined ANSI charset on output, for instance codepage 1252 for Western European languages.

Cygwin 1.5 either used the ANSI API, or it converted strings from UTF-16 to the current Windows ANSI charset or vice versa.

Cygwin 1.7 doesn't use the ANSI API anymore, rather it uses UNICODE to talk to Windows only, and the multibyte charset is defined through the
environment(*) as defined in POSIX.  UTF-8 is the default now.
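Since the charset comes from the environment, a program can check which one it actually ended up with. A minimal sketch, assuming a C library that provides nl_langinfo(CODESET) (glibc does, and to my knowledge so does Cygwin 1.7); current_charset() is a hypothetical helper name.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: adopt the locale named by LC_ALL / LC_CTYPE / LANG
   and return the name of its multibyte charset (e.g. "UTF-8"), or NULL if
   the environment names a locale the C library does not know. */
const char *current_charset(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Printing current_charset() once with LANG unset and once with, say, LANG=en_US.UTF-8 shows which charset the multibyte conversion will use in each case.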

> you need to check your LC_* and LANG env vars, and maybe call 
> setlocale(LC_ALL, "") in your application.)

And even then the code won't work.  If you don't define UNICODE, FindFirstFile/FindNextFile resolve to the ANSI versions of this API, FindFirstFileA/FindNextFileA.  Unless you set your LANG/LC_CTYPE/LC_ALL variables to your current Windows ANSI charset *and* call setlocale, Cygwin uses UTF-8 by default.  Therefore, the character ñ has a different multibyte encoding, 0xc3 0xb1, rather than, say, 0xf1 in Windows codepage 1252.  To avoid this problem, you can use the UNICODE API FindFirstFileW/FindNextFileW and convert the filename to the current multibyte charset via wcstombs and friends.
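The 0xc3 0xb1 versus 0xf1 difference can be reproduced portably with wcstombs. A minimal sketch: to_multibyte() is a hypothetical helper, the locale name "C.UTF-8" used in the check is an assumption (not every system ships it), and the wide input stands in for the cFileName field of a WIN32_FIND_DATAW filled in by FindFirstFileW.

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper: convert a wide file name into the multibyte charset
   of the given locale.  Note that setlocale changes the process-wide
   LC_CTYPE as a side effect.  Returns the number of bytes written (not
   counting the terminating NUL), or (size_t)-1 if the locale is unavailable,
   a character is not representable, or out is too small. */
size_t to_multibyte(const char *locale, const wchar_t *wide,
                    char *out, size_t out_size)
{
    if (setlocale(LC_CTYPE, locale) == NULL)
        return (size_t)-1;
    size_t n = wcstombs(out, wide, out_size);
    /* n == out_size means the result filled the buffer with no room for NUL. */
    if (n == (size_t)-1 || n == out_size)
        return (size_t)-1;
    return n;
}
```

Converting L"Ma\u00f1ana" under a UTF-8 locale yields 7 bytes, with ñ encoded as 0xc3 0xb1; under a codepage-1252 locale, if one is installed, the same call would produce the single byte 0xf1 instead.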

However, as Chuck has pointed out, the obviously right thing to do is to use the POSIX API opendir/readdir/closedir instead.
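A minimal sketch of the POSIX directory walk recommended here. list_dir() is a hypothetical helper; per the explanation in this thread, Cygwin 1.7 hands back each d_name already converted to the process's multibyte charset, so a program would normally call setlocale(LC_ALL, "") once in main() before using it.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: print every entry in a directory using
   opendir/readdir/closedir.  Returns the number of entries printed,
   or -1 if the directory cannot be opened. */
int list_dir(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* d_name is a multibyte string in the current charset. */
        printf("%s\n", entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

For example, list_dir("/cygdrive/c/Testing") should then list Mañana.docx intact, as long as the terminal uses the same charset as the program.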


Corinna



--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

