From: Alexey Borzenkov
To: cygwin AT cygwin DOT com
Date: Sat, 30 May 2009 01:22:07 +0400
Subject: Re: 1.7.0-48: [BUG] Passing characters above 128 from bash command line

On Sat, May 30, 2009 at 1:04 AM, Edward Lam wrote:
> Alexey Borzenkov wrote:
>> It might be safe for you, but not for other people. If you have a
>> Russian default codepage and ever need to work with chineese/japanese
>> filenames and cygwin uses default codepage for filesystem operations
>> (as in 1.5 right now), then you are really screwed. In my opinion
>> utf-8 is a silver bullet here, and I'm very glad it went that way.
>
> I must be missing something here. Suppose you have a default Russian code
> page, with LANG unset (ie. cygwin 1.7 uses UTF-8).
> Now, if you're using any
> non-Unicode, non-CodePage aware, native application to create a Russian
> filename, isn't Windows going to convert the filename from the Russian code
> page into UTF-16 for storage in NTFS? If that is the case, and then you do
> an ls from cygwin 1.7, aren't you going to get the wrong filename displayed?
> ie. interoperability with non-Unicode, non-CodePage aware native
> applications will be broken for you too with the current default cygwin 1.7
> behaviour.
>
> Or is this, not a case that you care about and you *only* use cygwin
> applications?

No, it is precisely that I care about both ends of interoperability.
Here is a hypothetical situation:

    for filename in `ls`; do
      someprogram $filename
    done

Here, when I use Russian Windows and I don't have LANG set (or when I
have LANG=en_US.UTF-8), filename will be a UTF-8 multibyte string. So
Russian as well as European/Chinese/Japanese filenames will all be
valid. Now there are three possibilities:

1) someprogram is a cygwin application: then $filename must be passed
   as is, without any conversions.
2) someprogram is a Unicode application: then it will receive a correct
   Unicode argument.
3) someprogram is an ANSI application: then Windows (cygwin has nothing
   to do with it) will convert its Unicode arguments to the system's
   codepage (cp1251 for Russian), and any character that can't be
   encoded will be replaced with a question mark. This is solely
   someprogram's fault and cygwin has nothing to do with it.

All I'm trying to say is that on Windows (since WinNT) arguments are
always in Unicode. It just so happens that when ANSI applications call
other ANSI applications with a sequence of bytes, it first gets
converted to Unicode, then back to ANSI, and you get the same sequence
of bytes. But the arguments are always characters, not bytes.
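
To make case 3 and the ANSI round trip concrete, here is a rough sketch
in Python. It is only an illustration: Python's codecs stand in for the
conversions Windows performs, the helper names (to_ansi, from_ansi) and
the sample filenames are made up for this example, and cp1251 is assumed
as the system codepage.

```python
# Illustrative sketch only: Python codecs stand in for the conversions
# Windows performs around an ANSI program's argument list.
# Assumption: the system (ANSI) codepage is cp1251, as on Russian Windows.
ANSI_CODEPAGE = "cp1251"


def to_ansi(argument: str, codepage: str = ANSI_CODEPAGE) -> bytes:
    """Convert a Unicode argument to the bytes an ANSI program would see.

    Characters the codepage cannot represent are replaced with b"?".
    """
    return argument.encode(codepage, errors="replace")


def from_ansi(raw: bytes, codepage: str = ANSI_CODEPAGE) -> str:
    """Convert the bytes supplied by an ANSI program to Unicode."""
    return raw.decode(codepage)


# ANSI -> Unicode -> ANSI: a Russian name survives and the bytes are unchanged.
russian_bytes = "файл.txt".encode(ANSI_CODEPAGE)
round_tripped = to_ansi(from_ansi(russian_bytes))

# Unicode -> ANSI: a Japanese name has no cp1251 representation.
japanese_seen_by_ansi = to_ansi("日本語.txt")

if __name__ == "__main__":
    print(round_tripped == russian_bytes)
    print(japanese_seen_by_ansi)
```

The first result is the "same sequence of bytes" situation; the second
is what an ANSI someprogram would be handed for a name outside its
codepage, which is why the loss is on that program's side rather than
in the shell that passed the characters along.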