Mail Archives: cygwin/2020/10/13/12:31:37

To: =?UTF-8?Q?J=C3=A9r=C3=B4me_Froissart?= <software AT froissart DOT eu>
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted
arguments
Date: Tue, 13 Oct 2020 09:30:37 -0700
From: "Kaz Kylheku \(Cygwin\) via Cygwin" <cygwin AT cygwin DOT com>

On 2020-10-06 14:36, Jérôme Froissart wrote:
> Here is an example C file
>     $ cat example.c
>     #include <stdio.h>
> 
>     const char *GetCommandLineA(void);
> 
>     int main(int argc, char *argv[])
>     {
>         const char *s = GetCommandLineA();
>         printf("C=%s\n", s);
> 
>         for (int i = 0; argc > i; i++)
>             printf("%d=%s\n", i, argv[i]);
> 
>         return 0;
>     }

Your comparison seems to rest on the hypothesis that Cygwin
builds argv by parsing the string returned by GetCommandLineA().

But this hypothesis is almost certainly wrong.

> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
>     C:\Users\Public>binary.exe "foo bar" "Jérôme"
>     C=binary.exe  "foo bar" "J□r□me"
>     0=binary
>     1=foo bar
>     2="Jérôme"

The "A" command line from GetCommandLineA() has "tofu"
characters in place of é and ô: they were not decoded properly.

The é and ô characters we see in the Cygwin-parsed
arguments coming into main could not have been recovered
from these "tofu" replacement characters.

What is actually being parsed must be the WCHAR command line,
i.e. the one that GetCommandLineW() returns.

To get a more complete understanding, the test program needs to
show that one as well.
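
As a rough sketch of what showing the wide command line could look like,
the helper below formats each UTF-16 code unit as four hex digits, so the
result does not depend on the console code page. The names
format_utf16_units() and dump_wide_command_line() are made up for this
illustration, and the cast from the WCHAR pointer assumes 16-bit wchar_t,
which holds on Windows and Cygwin. Outside Windows/Cygwin it falls back to
a hard-coded sample string so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>   /* GetCommandLineW() */
#endif

/* Format each UTF-16 code unit of the NUL-terminated string s as four
   hex digits separated by spaces.  Stops early if out is too small.
   Returns the number of code units formatted. */
size_t format_utf16_units(const uint16_t *s, char *out, size_t out_size)
{
    size_t n = 0, used = 0;

    assert(s != NULL && out != NULL);
    if (out_size == 0)
        return 0;
    out[0] = '\0';

    /* Each unit needs at most 5 characters (" XXXX") plus the NUL. */
    while (s[n] != 0 && used + 6 <= out_size) {
        used += (size_t)snprintf(out + used, out_size - used,
                                 n ? " %04X" : "%04X", (unsigned)s[n]);
        n++;
    }
    return n;
}

/* Hypothetical helper: print the wide command line as "W=<hex units>".
   Long command lines are truncated to what fits in the buffer. */
void dump_wide_command_line(void)
{
#if defined(_WIN32) || defined(__CYGWIN__)
    /* Assumption: WCHAR is a 16-bit unsigned type here. */
    const uint16_t *w = (const uint16_t *)GetCommandLineW();
#else
    /* Fallback sample so the sketch builds elsewhere: "Jérôme". */
    static const uint16_t sample[] =
        { 'J', 0x00E9, 'r', 0x00F4, 'm', 'e', 0 };
    const uint16_t *w = sample;
#endif
    char buf[1024];

    format_utf16_units(w, buf, sizeof buf);
    printf("W=%s\n", buf);
}
```

Calling dump_wide_command_line() next to the existing printf("C=%s\n", s)
would put the "A" and "W" views side by side. If é and ô are intact in the
wide string, they should appear as the units 00E9 and 00F4.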
