Mail Archives: cygwin/2009/05/29/17:05:07

Message-ID: <4A204DEE.1060004@sidefx.com>
Date: Fri, 29 May 2009 17:04:46 -0400
From: Edward Lam <edward AT sidefx DOT com>
User-Agent: Thunderbird 2.0.0.21 (Windows/20090302)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line
References: <200905281541 DOT 33404 DOT michael DOT renner AT gmx DOT de> <4A1EAD91 DOT 1060701 AT sidefx DOT com> <e2480c70905281131u37651a2eoba946637bd414516 AT mail DOT gmail DOT com> <4A1EF2CE DOT 2060509 AT sidefx DOT com> <3f0ad08d0905290813m39999f81q918e94e3c960eb3f AT mail DOT gmail DOT com> <4A200287 DOT 8030403 AT sidefx DOT com> <3f0ad08d0905290852xe41338alfda89c622f92f677 AT mail DOT gmail DOT com> <4A200BC0 DOT 9010704 AT sidefx DOT com> <e2480c70905291142o2bcc65ccw2287d175dbd09dd5 AT mail DOT gmail DOT com> <4A204149 DOT 2050009 AT sidefx DOT com> <e2480c70905291337g6c8bcca7xd0baba79c84629db AT mail DOT gmail DOT com>
In-Reply-To: <e2480c70905291337g6c8bcca7xd0baba79c84629db@mail.gmail.com>

Alexey Borzenkov wrote:
 > It might be safe for you, but not for other people. If you have a
 > Russian default codepage and ever need to work with Chinese/Japanese
 > filenames, and cygwin uses the default codepage for filesystem operations
 > (as in 1.5 right now), then you are really screwed. In my opinion
 > utf-8 is a silver bullet here, and I'm very glad it went that way.

I must be missing something here. Suppose you have a default Russian
code page, with LANG unset (i.e. cygwin 1.7 uses UTF-8). Now, if you use
a native application that is neither Unicode-aware nor code-page-aware to
create a Russian filename, isn't Windows going to convert the filename
from the Russian code page into UTF-16 for storage in NTFS? If that is
the case, and you then run ls from cygwin 1.7, aren't you going to get
the wrong filename displayed? In other words, interoperability with such
native applications will be broken for you too under the current default
cygwin 1.7 behaviour.

Or is this not a case that you care about, because you *only* use cygwin
applications?
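
To make the scenario concrete, here is a minimal Python sketch of the byte-level conversions involved. It is only an illustration: the sample filename is made up, cp1251 is assumed to be the Russian system code page, and the sketch says nothing about what cygwin 1.7 actually does internally. It only shows that the same name is a different byte sequence in each encoding, and that a mismatch appears when bytes are interpreted under an encoding other than the one that produced them.

```python
# Hypothetical filename used only for illustration.
name = "Привет.txt"

# A non-Unicode native application hands Windows bytes in the ANSI code page
# (assumption: cp1251 is the Russian system default).
ansi_bytes = name.encode("cp1251")

# Windows converts those bytes to UTF-16 for storage in NTFS.
stored_utf16 = ansi_bytes.decode("cp1251").encode("utf-16-le")

# A UTF-8 consumer converts the stored UTF-16 name to UTF-8 bytes.
utf8_bytes = stored_utf16.decode("utf-16-le").encode("utf-8")

# Interpreted as UTF-8, the bytes give back the original name.
as_utf8 = utf8_bytes.decode("utf-8")

# Interpreted as cp1251 (for example by a program or terminal that expects
# the ANSI code page), the very same bytes turn into a different string.
as_cp1251 = utf8_bytes.decode("cp1251", errors="replace")

print(len(ansi_bytes), len(utf8_bytes))
print(as_utf8 == name, as_cp1251 == name)
```

Whether a user sees the right name therefore depends on both ends of each conversion agreeing on the encoding, which is the point under discussion.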

Regards,
-Edward

Alexey Borzenkov wrote:
> On Sat, May 30, 2009 at 12:10 AM, Edward Lam <edward AT sidefx DOT com> wrote:
>> Thanks for explaining the UTF8 changes in cygwin 1.7. However, the decision
>> to use UTF-8 for the C locale is questionable.
> 
> Not at all, because utf-8, as far as I understand, is used for
> communication with the system in this context, and does not force
> anything to the application. Most modern unixes use utf-8 nowadays, it
> means that even if you have a C locale your terminal outputs text in
> utf-8, your input is utf-8, your filenames are utf-8 (well, not
> really, but the rest of the system sees them that way). Same stuff
> here, except that launching non-cygwin processes is communication with
> the system as well, and it needs conversion. And where there is
> conversion, there is always possible loss of data. One way or the other.
> 
>> It seems to me that it would be much safer to use the SYSTEM DEFAULT code
>> page (ie. the return value of the system GetACP() function) for CYGWIN
>> instead, ensuring compatibility for the large class of native Windows
>> applications that are non-Unicode, non-CodePage aware.
> 
> It might be safe for you, but not for other people. If you have a
> Russian default codepage and ever need to work with Chinese/Japanese
> filenames, and cygwin uses the default codepage for filesystem operations
> (as in 1.5 right now), then you are really screwed. In my opinion
> utf-8 is a silver bullet here, and I'm very glad it went that way.
> 
>> I think it's very bad that changing LANG can result in a truncated *command
>> line*, which has nothing to do with printf. The printf in the code was just
>> for testing. The HUGE bug is that the application gets the WRONG NUMBER OF
>> ARGUMENTS.
> 
> No, the bug is not that it gets the wrong number of arguments. In fact,
> Windows has no concept of arguments, only the C runtime does, which parses
> the command line. If the command line is truncated, then the C runtime will
> have missing arguments when it tries to parse it.
> 
> I mentioned wprintf because recently I was wondering why
> mkpasswd/mkgroup had a strange truncating behavior with Russian
> usernames and it turned out that wprintf, when it can't encode some
> characters, stops right there and returns an error code. But, honestly,
> who ever checks return codes from printf?
> 
> Something similar might be happening here. When the command line is
> constructed, some function is called that can't encode some character and
> returns an error status, but the status is never checked, so you get a
> truncated command line.
> 
> And btw, I'm not a cygwin developer, I'm just a speculating user
> right now, because I haven't looked for this problem in the code.
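
The speculation quoted above (a conversion that stops at the first unencodable character, an ignored error status, and a C runtime that then parses a shortened command line) can be sketched in Python. This is a hypothetical illustration, not cygwin's code: `encode_until_failure` and `split_args` are invented names, the command line is a made-up sample, ASCII stands in for whatever target encoding is in effect, and the whitespace-only splitter is a deliberately simplified stand-in for the real C runtime parsing rules.

```python
def encode_until_failure(text, encoding):
    """Hypothetical converter: stop at the first unencodable character.

    Returns (bytes_so_far, ok), loosely mimicking a C routine that
    reports failure through a status value the caller may ignore.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            return bytes(out), False
    return bytes(out), True


def split_args(command_line):
    """Simplified stand-in for the C runtime parser: whitespace only."""
    return command_line.split()


command_line = "prog.exe first sécond third"

# The caller ignores `ok`, just as unchecked return codes are ignored in C.
encoded, ok = encode_until_failure(command_line, "ascii")
truncated = encoded.decode("ascii")

full_args = split_args(command_line)
seen_args = split_args(truncated)

print(ok, repr(truncated))
print(len(full_args), len(seen_args))
```

In this sketch the process never receives a "wrong number of arguments" directly; it receives a shorter string, and the argument count drops only once that string is parsed.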


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
