Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
To: cygwin AT cygwin DOT com
References: <634821436 DOT 20201004141809 AT yandex DOT ru>
From: Brian Inglis
Organization: Systematic Software
Message-ID: <94661da7-6f0b-8c7a-0887-5ecffb66b83d@SystematicSw.ab.ca>
Date: Tue, 13 Oct 2020 11:34:30 -0600
List-Id: General Cygwin discussions and problem reports
Reply-To: cygwin AT cygwin DOT com

On 2020-10-06 15:36, Jérôme Froissart wrote:
> Here are the more detailed steps to reproduce the issue (along with
> answers to your requests about `uname`, `locale`, etc.).
> (I mostly reproduced what billziss-gh had done before, I do not take
> all the credits :D)
>
> Here is an example C file
> $ cat example.c
> #include <stdio.h>
>
> const char *GetCommandLineA(void);
>
> int main(int argc, char *argv[])
> {
>     const char *s = GetCommandLineA();
>     printf("C=%s\n", s);
>
>     for (int i = 0; argc > i; i++)
>         printf("%d=%s\n", i, argv[i]);
>
>     return 0;
> }
>
> I have built it with gcc from Cygwin
> $ gcc -o binary example.c
>
> Running it from the same Cygwin bash prompt works as expected
> $ uname -a
> CYGWIN_NT-10.0 XPS 3.1.5(0.340/5/3) 2020-06-01 08:59 x86_64 Cygwin
> # (XPS is my Windows machine name)
>
> $ locale
> LANG=fr_FR.UTF-8
> LC_CTYPE="fr_FR.UTF-8"
> LC_NUMERIC="fr_FR.UTF-8"
> LC_TIME="fr_FR.UTF-8"
> LC_COLLATE="fr_FR.UTF-8"
> LC_MONETARY="fr_FR.UTF-8"
> LC_MESSAGES="fr_FR.UTF-8"
> LC_ALL=
>
> $ which gcc
> /usr/bin/gcc
>
> # The following runs as expected
> $ ./binary.exe "foo bar" "Jérôme"
> C="C:\Users\Public\binary.exe"
> 0=./binary
> 1=foo bar
> 2=Jérôme
>
> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
> C:\Users\Public>binary.exe "foo bar" "Jérôme"
> C=binary.exe  "foo bar" "J□r□me"
> 0=binary
> 1=foo bar
> 2="Jérôme"
>
> This behaviour is not expected and is quite inconsistent with what
> happened through Bash.
> Besides the "strange squares" that appear on the first line, and the
> extra space after binary.exe, I especially did not expect "Jérôme" to
> remain quoted as a second argument.

Don't call inappropriate Windows functions without understanding the
limitations of Windows and its APIs.
Cygwin args are consistent with what you ran and what we would all expect.
I don't see any Cygwin problems here.

-- 
Take care.
Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]