From: Jérôme Froissart
Date: Tue, 6 Oct 2020 23:36:07 +0200
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted arguments
To: cygwin AT cygwin DOT com

Thanks for your replies.

This issue only happens when a program is run from cmd.exe, not from a
Cygwin bash shell. This matters to me because I discovered the bug in a
project that must be run from the Windows graphical shell (i.e. there is no
sensible way to run it through Cygwin and Bash).

> Please show us the output from "uname -a" and "locale" run from the bash prompt.

> Please provide the results of "locale" command right before running your test
> binary.

Here are more detailed steps to reproduce the issue (along with answers to
your requests about `uname`, `locale`, etc.).
(I mostly reproduced what billziss-gh had done before, so I do not take all
the credit :D)

Here is an example C file

    $ cat example.c
    #include <stdio.h>

    const char *GetCommandLineA(void);

    int main(int argc, char *argv[])
    {
        const char *s = GetCommandLineA();
        printf("C=%s\n", s);
        for (int i = 0; argc > i; i++)
            printf("%d=%s\n", i, argv[i]);
        return 0;
    }

I built it with gcc from Cygwin

    $ gcc -o binary example.c

Running it from the same Cygwin bash prompt works as expected

    $ uname -a
    CYGWIN_NT-10.0 XPS 3.1.5(0.340/5/3) 2020-06-01 08:59 x86_64 Cygwin
    # (XPS is my Windows machine name)

    $ locale
    LANG=fr_FR.UTF-8
    LC_CTYPE="fr_FR.UTF-8"
    LC_NUMERIC="fr_FR.UTF-8"
    LC_TIME="fr_FR.UTF-8"
    LC_COLLATE="fr_FR.UTF-8"
    LC_MONETARY="fr_FR.UTF-8"
    LC_MESSAGES="fr_FR.UTF-8"
    LC_ALL=

    $ which gcc
    /usr/bin/gcc

    # The following runs as expected
    $ ./binary.exe "foo bar" "Jérôme"
    C="C:\Users\Public\binary.exe"
    0=./binary
    1=foo bar
    2=Jérôme

Now, let's start a Windows shell (cmd.exe).
Note that I had to copy cygwin1.dll from my Cygwin installation directory,
otherwise binary.exe would not start.
I do not know whether there is a `locale` equivalent in the Windows command
prompt, so I merely ran my program.

    C:\Users\Public>binary.exe "foo bar" "Jérôme"
    C=binary.exe  "foo bar" "J□r□me"
    0=binary
    1=foo bar
    2="Jérôme"

This behaviour is not expected and is inconsistent with what happened
through Bash. Besides the "strange squares" that appear on the first line,
and the extra space after binary.exe, I especially did not expect "Jérôme"
to remain quoted as the second argument.

Sorry for the delay in my answer. I hope this is now clear; please ask me
for more examples or investigation if you need them.

Thanks for your help.
Jérôme

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple