From: Ross Smith
To: cygwin AT cygwin DOT com
Subject: Re: fopen with UTF-8 chars in filenames
Date: Mon, 20 Mar 2006 14:12:18 +1200

Igor Peshansky wrote:
> On Fri, 17 Mar 2006, Paul J. Lucas wrote:
>
>> On Fri, 17 Mar 2006, Christopher Faylor wrote:
>>
>>> Cygwin doesn't provide _wfopen.
>> 1. I install Cygwin.
>> 2. It's in stdio.h that gets installed as part of the Cygwin install.
>
> No, actually it's in stdio.h that's part of MinGW (and is installed as
> part of Cygwin only because Cygwin's gcc has a built-in MinGW
> cross-compiler).

Just in case anyone out there is still interested in the actual question... :)

The Right Way to do this (open a file whose name is given in UTF-8), or as close to a Right Way as you can get under the circumstances, is to use iconv() to convert the file name to the local multibyte character set, and then use plain fopen() on that.

In iconv_open(), use "utf-8" for the source character set, and an empty string for the target character set. (This isn't 100% portable; the Posix standard doesn't specify the character set names that can be used in iconv_open(). Every implementation understands "utf-8", but some use "char" instead of "" to mean "the local multibyte character set, whatever that happens to be".)
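A minimal sketch of that procedure in C follows. It assumes an iconv_open() that accepts "" for the locale's charset (glibc does; GNU libiconv may want "char"), and a program that has already called setlocale(LC_ALL, "") so that the locale charset is the user's rather than the "C" locale's. The names utf8_to_local() and fopen_utf8() are hypothetical helpers, not part of any library. Note that iconv_open() takes the target charset first and the source second.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to the
   current locale's multibyte charset. Returns a malloc'd string, or
   NULL with errno set (EILSEQ means invalid UTF-8 or a character the
   local charset cannot represent). */
static char *utf8_to_local(const char *utf8)
{
    /* iconv_open(tocode, fromcode). Assumption: "" names the locale
       charset here; some implementations want "char" instead. */
    iconv_t cd = iconv_open("", "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;

    char *in = (char *)utf8;      /* POSIX iconv() takes char ** */
    size_t inleft = strlen(utf8);
    size_t cap = inleft + 16;     /* initial guess, grown on E2BIG */
    char *buf = malloc(cap);
    char *out = buf;
    size_t outleft = cap - 1;     /* keep one byte for the NUL */
    int flushing = 0;

    while (buf != NULL) {
        size_t r = flushing
            /* NULL input: write any shift-reset sequence a stateful
               target charset needs to return to its initial state. */
            ? iconv(cd, NULL, NULL, &out, &outleft)
            : iconv(cd, &in, &inleft, &out, &outleft);
        if (r != (size_t)-1) {
            if (flushing) {
                *out = '\0';
                iconv_close(cd);
                return buf;
            }
            flushing = 1;
            continue;
        }
        if (errno != E2BIG)
            break;                /* EILSEQ or EINVAL: give up */
        /* Output buffer full: double it and resume where iconv stopped. */
        size_t used = (size_t)(out - buf);
        char *bigger = realloc(buf, cap * 2);
        if (bigger == NULL)
            break;
        buf = bigger;
        cap *= 2;
        out = buf + used;
        outleft = cap - 1 - used;
    }

    int saved = errno;
    free(buf);
    iconv_close(cd);
    errno = saved;
    return NULL;
}

/* Hypothetical wrapper: open a file whose name is given in UTF-8. */
FILE *fopen_utf8(const char *utf8_name, const char *mode)
{
    char *local = utf8_to_local(utf8_name);
    if (local == NULL)
        return NULL;
    FILE *f = fopen(local, mode);
    int saved = errno;            /* preserve fopen's errno across free() */
    free(local);
    errno = saved;
    return f;
}
```

Typical use is to call setlocale(LC_ALL, "") once at startup and then fopen_utf8(name, "r") wherever fopen() would have been used. A NULL return with errno equal to EILSEQ means the name was either not valid UTF-8 or contained a character the local charset can't represent.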
If your file name contains characters that can't be represented in the local multibyte character set, you're out of luck. Cygwin only supports multibyte file names (not Cygwin's fault; it's a Posix limitation), so to use the full Unicode character set in a file name you have no choice but to fall back on the Windows API. (You can use both the Windows and Cygwin APIs in the same program; that doesn't cause any problems, other than for portability.)

Technically the above procedure also contains a race condition, since it's theoretically possible for the native multibyte character set to change (via the system's locale settings) between the calls to iconv() and fopen(). Again, this is a problem in the Posix API and can't be portably worked around.

If you don't care about portability and just want something that works with Cygwin, you might as well just use the Windows API.