Date: Wed, 23 Sep 2009 15:39:39 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: [1.7] Invalid UTF8 while creating a file -> cannot delete?
Message-ID: <20090923133939.GE20981@calimero.vinschen.de>
In-Reply-To: <20090923124307.GD20981@calimero.vinschen.de>

On Sep 23 14:43, Corinna Vinschen wrote:
> On Sep 23 13:34, Andy Koppe wrote:
> > 2009/9/23 Corinna Vinschen:
> > > I have a local patch ready to use the ANSI codepage by default in the
> > > "C" locale.  It appears to work nicely and has the additional positive
> > > side effect to simplify the code in a few places.
> > >
> > > If I only knew that eastern language users could happily live with
> > > this change as well!
> >
> > Here's an idea to circumvent the DBCS troubles: default to UTF-8 when
> > no charset is specified in the locale and the ANSI charset isn't
> > singlebyte.
> >
> > Based on the following grounds:
> > - Full CJK support (and more) out of the box.
> > - DBCSs can't have worked very well in 1.5 in the first place, because
> >   the shell and most applications weren't aware of double-byte
> >   characters. Hence backward compatibility is less of an issue here.
> > - Applications that don't (yet) work with UTF-8 are also unlikely to
> >   work correctly with DBCSs.
> > - Iwamuro Motonori asked for it.
>
> Yeah, I was tinkering with this idea, too, but it's much more tricky to
> implement.
>
> I'll think about it.

Turns out, it's not complicated at all.

However, if we default to UTF-8 for a subset of languages anyway, it
gets even more interesting to ask, why not for all languages?  Isn't it
better in the long run to have the same default for all Cygwin
installations?

I'm really wondering if we shouldn't simply default to UTF-8 as charset
throughout, in the application, the console, and for the filename
conversion.  Yes, not all applications will work OOTB with chars > 0x7f,
but it was always a bug to make any assumptions for non-ASCII chars in
the C locale.  Applications can be fixed, right?


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple