Mail Archives: cygwin/2006/03/19/21:12:46

To: cygwin AT cygwin DOT com
From: Ross Smith <rosss AT pharos DOT co DOT nz>
Subject: Re: fopen with UTF-8 chars in filenames
Date: Mon, 20 Mar 2006 14:12:18 +1200

Igor Peshansky wrote:
> On Fri, 17 Mar 2006, Paul J. Lucas wrote:
> 
>> On Fri, 17 Mar 2006, Christopher Faylor wrote:
>>
>>> Cygwin doesn't provide _wfopen.
>> 1. I install Cygwin.
>> 2. It's in stdio.h that gets installed as part of the Cygwin install.
> 
> No, actually it's in stdio.h that's part of MinGW (and is installed as
> part of Cygwin only because Cygwin's gcc has a built-in MinGW
> cross-compiler).

Just in case anyone out there is still interested in the actual 
question... :)

The Right Way to do this (open a file whose name is given in UTF-8), or 
as close to a Right Way as you can get under the circumstances, is to 
use iconv() to convert the file name to the local multibyte character 
set, and then use plain fopen() on the result. In iconv_open(), use 
"utf-8" for the source character set and an empty string for the target 
character set. Note that iconv_open() takes the target first, so the 
call is iconv_open("", "utf-8"). (This isn't 100% portable; the POSIX 
standard doesn't specify the character set names that can be used in 
iconv_open(). Practically every implementation understands "utf-8", but 
some use "char" instead of "" to mean "the local multibyte character 
set, whatever that happens to be".)
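
For anyone who wants to see it spelled out, here is a minimal sketch of 
the iconv() plus fopen() procedure. It is an illustration under stated 
assumptions, not tested against every iconv implementation: the helper 
name fopen_utf8() and the buffer size NAME_BUF_SIZE are made up for this 
example, and the fallback from "" to "char" simply follows the naming 
caveat above.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Arbitrary illustrative limit on the converted name length. */
enum { NAME_BUF_SIZE = 4096 };

/* Hypothetical helper (not part of Cygwin or POSIX): open a file whose
 * name is given in UTF-8 by first converting the name to the local
 * multibyte character set. Returns NULL on any failure, with errno set
 * by iconv_open(), iconv() or fopen(). */
FILE *fopen_utf8(const char *utf8_name, const char *mode)
{
    char local_name[NAME_BUF_SIZE];
    char *in = (char *)utf8_name;   /* iconv() does not modify the input */
    size_t in_left = strlen(utf8_name);
    char *out = local_name;
    size_t out_left = sizeof local_name - 1;  /* keep room for the NUL */
    size_t rc;
    int saved_errno;

    /* Target first, source second. Try "" for the local charset, then
     * fall back to "char" for implementations that use that name. */
    iconv_t cd = iconv_open("", "UTF-8");
    if (cd == (iconv_t)-1)
        cd = iconv_open("char", "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    rc = iconv(cd, &in, &in_left, &out, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out, &out_left);  /* flush shift state */

    saved_errno = errno;
    iconv_close(cd);
    if (rc == (size_t)-1) {
        /* Typically EILSEQ (invalid or unrepresentable input) or
         * E2BIG (name too long for the buffer). */
        errno = saved_errno;
        return NULL;
    }

    *out = '\0';
    return fopen(local_name, mode);
}
```

Call setlocale(LC_ALL, "") once at startup so that the local character 
set reflects the user's environment rather than the default "C" locale. 
Depending on the platform you may also need to link against a separate 
iconv library (for example with -liconv).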

If your file name contains characters that can't be represented in the 
local MBCS, you're out of luck. Cygwin only supports multibyte file 
names (not Cygwin's fault, it's a POSIX limitation); to use the full 
Unicode character set in a file name you have no choice but to fall back 
on the Windows API. (You can use both the Windows and Cygwin APIs in the 
same program. Mixing them doesn't cause any problems, other than to 
portability.)

Technically the above procedure also contains a race condition, since 
it's theoretically possible for the native multibyte character set to 
change (via the system's locale settings) between the calls to iconv() 
and fopen(). Again, this is a problem in the POSIX API and can't be 
portably worked around. If you don't care about portability and just 
want something that works with Cygwin, you might as well just use the 
Windows API.



