Mail Archives: cygwin/2020/10/04/07:20:52

Date: Sun, 4 Oct 2020 14:18:09 +0300
To: Jérôme Froissart <software AT froissart DOT eu>,
cygwin AT cygwin DOT com
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted
arguments
From: Andrey Repin via Cygwin <cygwin AT cygwin DOT com>
Cc: Andrey Repin <anrdaemon AT yandex DOT ru>

Greetings, Jérôme Froissart!

> While discussing a merge request on another project [1], I think
> billziss-gh found an oddity in the way Cygwin parses command-line
> arguments when non-ASCII characters come into play.

> EXPECTED BEHAVIOUR:
> cygwin should parse the following command line
>     binary.exe --non-ascii "charaçtérs" --ascii "nothing-fancy-here"
> as
>     argv = ["binary.exe",
>             "--non-ascii",
>             "chara\xXX\xXXt\xXX\xXXrs",
>             "--ascii",
>             "nothing-fancy-here"]
>     // \xXX\xXX being the UTF-8 encoding of the special characters,
>     // but this does not really matter here
> before calling main()

> ACTUAL BEHAVIOUR:
> it parses it as
>     argv = ["binary.exe",
>             "--non-ascii",
>             "\"chara\xXX\xXXt\xXX\xXXrs\"", // mind the unstripped quotes here...
>             "--ascii",
>             "nothing-fancy-here" // ...but not here
>     ]

> It looks like words containing UTF-8 characters do not have their
> surrounding quotes stripped, unlike ASCII words.

> More examples and a better description are available at [1] (thanks to
> billziss-gh for his analysis, much more thorough than mine)
> For the record, we wrote a work-around in our specific program, but
> handling this issue in Cygwin might be a better way to solve it.
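
For illustration only, a workaround of that kind could look like the sketch
below. This is not the code from [1]; strip_outer_quotes is a hypothetical
helper name, and the sketch assumes the only damage is one leftover pair of
double quotes around the argument.

```c
#include <string.h>

/* Hypothetical helper, not taken from [1]: if s begins and ends with a
   double quote, remove that one pair in place.
   Returns 1 if a pair was removed, 0 if s was left untouched. */
int strip_outer_quotes(char *s)
{
    size_t len = strlen(s);

    if (len < 2 || s[0] != '"' || s[len - 1] != '"')
        return 0;

    /* Shift the inner bytes left by one and cut off the closing quote. */
    memmove(s, s + 1, len - 2);
    s[len - 2] = '\0';
    return 1;
}
```

A program would call this on each argv element before using it. It works on
bytes, so the UTF-8 sequences inside the argument are left untouched.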

> [1]: https://github.com/billziss-gh/sshfs-win/pull/208 (Checking for
> quotes around non-ascii usernames passed by Windows)

> Thanks for your help! In case you don't have time, please tell me
> where to look, and I might try to fix it myself and send a patch
> proposal if that is easy enough (I have never read Cygwin's code).

It looks as if the Cygwin program was launched from a non-Cygwin terminal, or
from a terminal whose locale was not set to a Unicode (UTF-8) one.

Please provide the output of the "locale" command, run right before your test
binary.
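
If it helps with the report, a test binary along the following lines would
show both pieces of information at once. It is only a sketch: dump_args and
print_locale are names made up for this example, and the hex dump is just one
way to make a leftover quote character (0x22) visible.

```c
#include <locale.h>
#include <stdio.h>

/* Print every argument as raw bytes in hex, so a stray double quote
   (0x22) at either end is easy to spot.
   Returns how many arguments still begin and end with a double quote. */
int dump_args(FILE *out, int argc, char **argv)
{
    int quoted = 0;

    for (int i = 0; i < argc; i++) {
        const unsigned char *p = (const unsigned char *)argv[i];
        size_t len = 0;

        fprintf(out, "argv[%d] =", i);
        for (; p[len] != '\0'; len++)
            fprintf(out, " %02x", p[len]);
        fprintf(out, "\n");

        if (len >= 2 && p[0] == '"' && p[len - 1] == '"')
            quoted++;
    }
    return quoted;
}

/* Report the locale the process gets from its environment
   (LANG, LC_ALL, and so on). */
void print_locale(FILE *out)
{
    const char *loc = setlocale(LC_ALL, "");

    fprintf(out, "locale = %s\n", loc ? loc : "(setlocale failed)");
}
```

Calling print_locale(stdout) and then dump_args(stdout, argc, argv) from
main() should be enough to compare a run from a Cygwin terminal with a run
from a non-Cygwin one.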


-- 
With best regards,
Andrey Repin
Sunday, October 4, 2020 14:16:17

Sorry for my terrible english...
