From: Andrey Repin via Cygwin
Reply-To: cygwin AT cygwin DOT com
To: Jérôme Froissart, cygwin AT cygwin DOT com
Date: Sun, 4 Oct 2020 14:18:09 +0300
Message-ID: <634821436.20201004141809@yandex.ru>
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments

Greetings, Jérôme Froissart!

> By discussing a merge request on another project [1], I think
> billziss-gh found a weirdness in the way Cygwin parses the command
> line arguments when non-ASCII characters come into play.
>
> EXPECTED BEHAVIOUR:
> cygwin should parse the following command line
>     binary.exe --non-ascii "charaçtérs" --ascii "nothing-fancy-here"
> as
>     argv = ["binary.exe",
>             "--non-ascii",
>             "chara\xXX\xXXt\xXX\xXXrs",
>             "--ascii",
>             "nothing-fancy-here"]
>     // \xXX\xXX being the UTF-8 encoding of the special characters,
>     // but this does not really matter here
> before calling main()
>
> ACTUAL BEHAVIOUR:
> it parses it as
>     argv = ["binary.exe",
>             "--non-ascii",
>             "\"chara\xXX\xXXt\xXX\xXXrs\"", // mind the unstripped quotes here...
>             "--ascii",
>             "nothing-fancy-here"            // ...but not here
>            ]
>
> It looks like the surrounding quotes are not properly stripped from
> words containing UTF-8 characters, unlike ASCII words.
> More examples and a better description are available at [1] (thanks to
> billziss-gh for his analysis, much more thorough than mine).
>
> For the record, we wrote a work-around in our specific program, but
> handling this issue in Cygwin might be a better way to solve it.
>
> [1]: https://github.com/billziss-gh/sshfs-win/pull/208 (Checking for
> quotes around non-ascii usernames passed by Windows)
>
> Thanks for your help! In case you don't have time, please tell me
> where to look, and I might try to fix it myself and send a patch
> proposal if that is easy enough (I have never read Cygwin's code).

This looks as if the Cygwin command was launched from a non-Cygwin
terminal, or from a terminal whose locale was not set to a Unicode
(UTF-8) one.
Please provide the output of the "locale" command, run right before
your test binary.

--
With best regards,
Andrey Repin
Sunday, October 4, 2020 14:16:17

Sorry for my terrible english...
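
For anyone trying to reproduce the report, a small diagnostic along the following lines may help. It is only a sketch and not taken from the thread or from the sshfs-win work-around: the helper names `strip_stray_quotes` and `describe_args` are made up for illustration, and it assumes a Cygwin build of Python 3 so that the arguments pass through Cygwin's own command-line parsing. It prints the locale-related environment variables (the `locale` command remains the authoritative check) and then each argument as bytes, flagging any that still carry surrounding double quotes.

```python
import os
import sys


def strip_stray_quotes(arg: str) -> str:
    """Remove one pair of surrounding double quotes, if both are present.

    Hypothetical helper: a program-side work-around, not a fix in Cygwin.
    """
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        return arg[1:-1]
    return arg


def describe_args(argv):
    """Return (index, bytes, has_surrounding_quotes) for each argument."""
    rows = []
    for i, arg in enumerate(argv):
        # Re-encode as UTF-8; surrogateescape keeps undecodable bytes visible.
        raw = arg.encode("utf-8", errors="surrogateescape")
        rows.append((i, raw, strip_stray_quotes(arg) != arg))
    return rows


if __name__ == "__main__":
    # Show the locale-related environment the process was started with.
    for name in ("LANG", "LC_ALL", "LC_CTYPE"):
        print(f"{name}={os.environ.get(name, '')}")
    for i, raw, quoted in describe_args(sys.argv):
        print(i, raw, "<-- unstripped quotes" if quoted else "")
```

Running it once from a Cygwin terminal and once from a non-Cygwin one, with the same quoted non-ASCII argument, should show whether the unstripped quotes depend on how the process was launched.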

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple