| From: | Jérôme Froissart <software AT froissart DOT eu> |
| Date: | Wed, 14 Oct 2020 23:47:37 +0200 |
| Message-ID: | <CAFC9CLCx3nAQu6aMYTTL1syr9zyXgHYY0vKCKSCXAf=HpYXDiQ@mail.gmail.com> |
| Subject: | Re: Unconsistent command-line parsing in case of UTF-8 quoted |
| arguments | |
| To: | "Kaz Kylheku (Cygwin)" <743-406-3965 AT kylheku DOT com> |
Thank you everyone, I now have a better understanding of how Windows
and Cygwin work (being more of a Linux guy, I was not really aware of
all of this).

However, one question is still puzzling me. I now understand _why_
things happen that way, but I am still wondering whether this is really
what we _want_. I mean, keeping the double quotes around a UTF-8
argument just because the program was not run from Cygwin's bash
sounds like a bug to me, doesn't it? (I do understand the reasons
that explain this behaviour, though.) Since I cannot run my program
from bash, I have to resort to manually trimming the quotes, which I
would have liked to avoid.
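
For illustration, here is a minimal sketch of the kind of manual trimming I mean. The helper name strip_outer_quotes is hypothetical (it is not part of Cygwin or sshfs-win), and it only handles the simple case seen in this thread, where the whole argument is wrapped in one pair of double quotes. It works byte by byte, which is safe for UTF-8 because the byte 0x22 never occurs inside a multi-byte sequence.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: if arg both starts and ends with a double quote,
 * remove that outer pair in place. Inner quotes and backslash escapes
 * are deliberately left untouched. Returns arg for convenience. */
static char *strip_outer_quotes(char *arg)
{
    assert(arg != NULL);
    size_t len = strlen(arg);

    if (len >= 2 && arg[0] == '"' && arg[len - 1] == '"') {
        /* Shift the content left by one byte, dropping the opening quote. */
        memmove(arg, arg + 1, len - 2);
        /* Terminate the string where the shifted content ends. */
        arg[len - 2] = '\0';
    }
    return arg;
}
```

It could be applied to each argument at the start of main, for example with a loop such as `for (int i = 1; i < argc; i++) strip_outer_quotes(argv[i]);`. Arguments that arrive without quotes (as they do when the program is started from bash) are left unchanged.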

I'd like to share a message that the maintainer of sshfs-win posted
on GitHub [1] as a follow-up to our discussion (he did not know
whether he could post to the mailing list without subscribing first).

Besides, I unfortunately don't have much time at the moment to
investigate this issue (for instance, I have not yet managed to repeat
the same experiments with the very latest version of Cygwin), so his
feedback is very valuable.

Here is what he says:
> It seems to me that the list is missing the important point
> about the double quote characters that should NOT be there
> regardless of how the é and ô characters are being interpreted.
> (As evidence of this: the Cygwin command line parser was able
> to break the command line into arguments correctly, but chose
> to retain the double quotes.)
>
> The choice of GetCommandLineA was for illustration purposes;
> had I used GetCommandLineW I would not be able to printf
> using %ls under CMD.EXE, because of code page issues. However
> here is a modified version of the test program that uses
> GetCommandLineW.
>
> #include <stdio.h>
> #include <wchar.h>
>
> wchar_t *GetCommandLineW(void);
>
> int main(int argc, char *argv[])
> {
>     wchar_t *s = GetCommandLineW();
>
>     /* Dump each code unit as hex plus a printable ASCII rendering, 8 per row. */
>     for (wchar_t *p = s; *p; p++)
>         printf("%04x %c%s",
>             (unsigned)*p,
>             32 <= *p && *p < 127 ? *p : '.',
>             (p - s) % 8 + 1 != 8 ? " " : "\n");
>     printf("\n");
>
>     /* Print the arguments as parsed by the Cygwin runtime. */
>     for (int i = 0; argc > i; i++)
>         printf("%d=%s\n", i, argv[i]);
>
>     return 0;
> }
>
> I compiled this program under Cygwin to produce cyg.exe and ran
> it under Cygwin and CMD.EXE.
>
> Cygwin run:
> billziss@xps:~/Projects/t$ locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=
> billziss@xps:~/Projects/t$ ./cyg.exe "foo bar" "Domain\Jérôme"
> 0022 " 0043 C 003a : 005c \ 0055 U 0073 s 0065 e 0072 r
> 0073 s 005c \ 0062 b 0069 i 006c l 006c l 007a z 0069 i
> 0073 s 0073 s 005c \ 0050 P 0072 r 006f o 006a j 0065 e
> 0063 c 0074 t 0073 s 005c \ 0074 t 005c \ 0063 c 0079 y
> 0067 g 002e . 0065 e 0078 x 0065 e 0022 "
> 0=./cyg
> 1=foo bar
> 2=Domain\Jérôme
>
> CMD.EXE run:
>
> C:\Users\billziss\Projects\t>\Windows\System32\chcp.com
> Active code page: 437
>
> C:\Users\billziss\Projects\t>cyg.exe "foo bar" "Domain\Jérôme"
> 0063 c 0079 y 0067 g 002e . 0065 e 0078 x 0065 e 0020
> 0020 0022 " 0066 f 006f o 006f o 0020 0062 b 0061 a
> 0072 r 0022 " 0020 0022 " 0044 D 006f o 006d m 0061 a
> 0069 i 006e n 005c \ 004a J 00e9 . 0072 r 00f4 . 006d m
> 0065 e 0022 "
> 0=cyg
> 1=foo bar
> 2="Domain\Jérôme"
[1] https://github.com/billziss-gh/sshfs-win/pull/208
Thank you very much
Jérôme
On Tue, 13 Oct 2020 at 18:30, Kaz Kylheku (Cygwin)
<743-406-3965 AT kylheku DOT com> wrote:
>
> On 2020-10-06 14:36, Jérôme Froissart wrote:
> > Here is an example C file
> > $ cat example.c
> > #include <stdio.h>
> >
> > const char *GetCommandLineA(void);
> >
> > int main(int argc, char *argv[])
> > {
> >     const char *s = GetCommandLineA();
> >     printf("C=%s\n", s);
> >
> >     for (int i = 0; argc > i; i++)
> >         printf("%d=%s\n", i, argv[i]);
> >
> >     return 0;
> > }
>
> Your program's comparison seems to be based on the
> hypothesis that Cygwin parses the GetCommandLineA() command line.
>
> But this hypothesis is almost certainly wrong.
>
> > Now, let's start a Windows shell (cmd.exe)
> > Note that I had to copy cygwin1.dll from my Cygwin installation
> > directory, otherwise binary.exe would not start.
> > I do not know whether there is a `locale` equivalent in Windows
> > command prompt, so I merely ran my program.
> > C:\Users\Public>binary.exe "foo bar" "Jérôme"
> > C=binary.exe "foo bar" "J□r□me"
> > 0=binary
> > 1=foo bar
> > 2="Jérôme"
>
> The "A" command line from GetCommandLineA has "tofu"
> characters: é and ô were not decoded properly.
>
> The é and ô characters we see in the Cygwin-parsed
> arguments coming into main could not have been recovered
> from these "tofu" replacement characters.
>
> What is actually being parsed must be the WCHAR command line
> corresponding to what comes from GetCommandLineW().
>
> It's necessary to show that one to get a more complete understanding.
>
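
As a toy illustration of the point quoted above (replacement characters cannot be turned back into the original letters), here is a small sketch. It is not the actual Windows conversion: narrow_lossy is a hypothetical stand-in that replaces every non-ASCII character with '?'. Once two different characters collapse to the same replacement, nothing in the narrow string says which one was there.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Toy stand-in for a lossy wide-to-narrow conversion (an assumption made
 * for illustration, not what GetCommandLineA really does): ASCII is copied
 * as-is, everything else becomes '?'. The output is always NUL-terminated
 * and truncated to fit out_size bytes. */
static void narrow_lossy(const wchar_t *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);

    size_t i = 0;
    for (; in[i] != L'\0' && i + 1 < out_size; i++)
        out[i] = ((unsigned long)in[i] < 128) ? (char)in[i] : '?';
    out[i] = '\0';
}
```

With this sketch, the wide strings L"J\u00e9r\u00f4me" (é then ô) and L"J\u00f4r\u00e9me" (ô then é) both narrow to "J?r?me", so the narrow form alone cannot be the source of the correctly spelled argv entries.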