Date: Thu, 19 Mar 2009 21:30:46 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Q: Is anybody here using the CYGWIN=codepage:oem setting?
Message-ID: <20090319203046.GF9322@calimero.vinschen.de>
In-Reply-To: <20090319201144.GE9322@calimero.vinschen.de>

On Mar 19 21:11, Corinna Vinschen wrote:
> On Mar 19 19:41, Eric Blake wrote:
> > Corinna Vinschen writes:
> > > ...unless Cygwin itself would call setlocale().
> >
> > I'm not a fan of that.  POSIX is explicit that an application that
> > intentionally avoids calling setlocale() shall behave as though it had called
> > setlocale(LC_ALL,"C").
>
> [...]
> But I admit that I'm not very happy with this idea either.  Still, we
> have to convert from MB to WC and vice-versa independently of the
> application, while other systems based on byte charsets simply don't
> have this problem.

Here's another idea:  If the codeset is not UTF-8, and if a filename
contains wide chars not representable in the current ANSI codeset, use
the good old ASCII "SO/SI" method.  Example:

  Assuming the ANSI codepage is CP1252.
  Assuming the filename is in UTF-16

    /dir/to/foo\x1234bar

  All chars except for \x1234 are convertible to the current ANSI code
  page.  The convertible chars are converted as usual.  The
  non-convertible characters are converted to an ASCII SO/SI sequence:

    /dir/to/foo\x0e\x12\x34\x0fbar

  On the way back, Cygwin converts SO/SI sequences back to their UTF-16
  counterpart and converts everything else using the current codepage
  to UTF-16 conversion.

This would allow manipulating all files on the disk, even those whose
names use characters that are invalid in the current CP.

Does that solution make sense?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
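
The SO/SI round trip proposed above can be sketched in a few lines.
This is only an illustration in Python, not Cygwin code: the function
names encode_filename and decode_filename are made up for the example.
It also assumes that each non-convertible UTF-16 code unit gets its own
SO/SI pair and is written big-endian, as in the \x0e\x12\x34\x0f
example. The mail does not say how runs of several such characters
would be grouped.

```python
SO, SI = 0x0E, 0x0F  # ASCII "shift out" / "shift in" control bytes


def encode_filename(name: str, codepage: str = "cp1252") -> bytes:
    """Convert a filename to the codepage, escaping what does not fit.

    Hypothetical helper: every UTF-16 code unit that has no mapping in
    the codepage is emitted as SO, high byte, low byte, SI.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            # 2 bytes for a BMP character, 4 for a surrogate pair.
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out += bytes([SO]) + units[i:i + 2] + bytes([SI])
    return bytes(out)


def decode_filename(data: bytes, codepage: str = "cp1252") -> str:
    """Reverse encode_filename: unpack SO/SI escapes, decode the rest."""
    utf16 = bytearray()  # result, accumulated as UTF-16 big-endian
    run = bytearray()    # pending bytes to decode with the codepage
    i = 0
    while i < len(data):
        # An escape is exactly four bytes: SO, two payload bytes, SI.
        if data[i] == SO and i + 3 < len(data) and data[i + 3] == SI:
            utf16 += run.decode(codepage).encode("utf-16-be")
            run.clear()
            utf16 += data[i + 1:i + 3]
            i += 4
        else:
            run.append(data[i])
            i += 1
    utf16 += run.decode(codepage).encode("utf-16-be")
    return utf16.decode("utf-16-be")


if __name__ == "__main__":
    encoded = encode_filename("/dir/to/foo\u1234bar")
    print(encoded)
    print(decode_filename(encoded) == "/dir/to/foo\u1234bar")
```

One caveat of this sketch: a name that already contains a literal 0x0e
byte followed three bytes later by 0x0f would be misread as an escape
when decoding. Windows does not allow characters 1 to 31 in filenames,
so this should not come up for names read from disk, but it can for
strings supplied by an application.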