Message-ID: <4D8761DE.1070300@cwilson.fastmail.fm>
Date: Mon, 21 Mar 2011 10:34:06 -0400
From: Charles Wilson
To: cygwin AT cygwin DOT com
Subject: Re: cygwin + GetConsoleOutputCP
References: <4D8651F2 DOT 3000200 AT cwilson DOT fastmail DOT fm>

On 3/21/2011 3:53 AM, Andy Koppe wrote:
> I think defaulting to the console codepage makes sense for the DOS
> side of the conversion. Having said that, Windows files that aren't
> "Unicode", i.e. UTF-16, are usually encoded in the so-called ANSI
> codepage, e.g. CP1252, so it would make more sense to default to that.
>
> However, the real problem with this feature is that the Unix side of
> the conversion is fixed to ISO-8859-1, which makes it near-useless
> when Cygwin defaults to UTF-8. And it's no use for non-Western
> European languages in any case.

Meh...the same basic set of options/conversions is provided if unix2dos
is compiled on linux.
Only there, the "offending" function is implemented as:

    unsigned short query_con_codepage(void)
    {
        return(0);
    }

However, each time query_con_codepage is called, it is followed by:

    if ([return value of query_con_codepage] < 2)
        pFlag->ConvMode = CONVMODE_437;

IOW, on linux, when using -iso with no specific code page, it acts just
as if you had simply specified -437 for the "dos" side; the "unix" side
is still, as always, iso-8859-1.

> A worthwhile conversion feature would use
> MultiByteToWideChar()/WideCharToMultiByte() defaulting to the system's
> ANSI codepage on the DOS side, and mbstowcs()/wcstombs() defaulting to
> the charset specified by the LC_CTYPE locale category on the Unix
> side.

Well, if you want full-featured charset conversion, then that's what
iconv(1) is for. These added features of dos2unix/unix2dos are, in
reality, quick and dirty approaches to *single byte* charset conversion
for a *limited set* of charsets.

I'm not looking to re-implement the whole thing or modify the semantics
of the options (or even add a new set of options). I'm just trying to
make sure that, given the existing semantics of the options, dos2unix
selects the proper default CP for the "dos" side, using whatever is
considered the definitive source for the current "dosish" active
codepage on the cygwin platform, when the existing options are used.

--
Chuck
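
P.S. To make the fallback described above concrete, here is a minimal sketch of how the stub and the "< 2" check fit together. It is not the actual dos2unix source: the CFlag struct, the helper name apply_default_codepage, and the numeric values of the CONVMODE_* constants are assumptions made only so the snippet compiles on its own.

```c
/* Hypothetical values; the real constants are defined in the dos2unix sources. */
enum { CONVMODE_437 = 1, CONVMODE_850 = 2 };

/* Hypothetical stand-in for the flags structure that pFlag points to. */
typedef struct {
    int ConvMode;
} CFlag;

/* Non-Windows stub as quoted in the message: no console code page is known. */
unsigned short query_con_codepage(void)
{
    return 0;
}

/* Hypothetical helper wrapping the check that follows each call:
 * a reported value below 2 means "nothing usable", so default to CP437.
 * Any other value leaves the previously chosen mode untouched here. */
void apply_default_codepage(CFlag *pFlag, unsigned short cp)
{
    if (cp < 2)
        pFlag->ConvMode = CONVMODE_437;
}
```

Calling apply_default_codepage(&flags, query_con_codepage()) with the stub above always ends up selecting CONVMODE_437, which matches the "acts as if you had specified -437" behaviour described for linux.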