To: =?UTF-8?Q?J=C3=A9r=C3=B4me_Froissart?= <software@froissart.eu>
Subject: Re: Unconsistent command-line parsing in case of UTF-8 quoted
 arguments
Date: Tue, 13 Oct 2020 09:30:37 -0700
From: "Kaz Kylheku \(Cygwin\) via Cygwin" <cygwin@cygwin.com>
Reply-To: "Kaz Kylheku \(Cygwin\)" <743-406-3965@kylheku.com>
Cc: cygwin@cygwin.com

On 2020-10-06 14:36, Jérôme Froissart wrote:
> Here is an example C file
>     $ cat example.c
>     #include <stdio.h>
> 
>     const char *GetCommandLineA(void);
> 
>     int main(int argc, char *argv[])
>     {
>         const char *s = GetCommandLineA();
>         printf("C=%s\n", s);
> 
>         for (int i = 0; argc > i; i++)
>             printf("%d=%s\n", i, argv[i]);
> 
>         return 0;
>     }

Your program's comparison seems to rest on the hypothesis that
Cygwin builds argv by parsing the command line returned by
GetCommandLineA().

That hypothesis is almost certainly wrong.

> Now, let's start a Windows shell (cmd.exe)
> Note that I had to copy cygwin1.dll from my Cygwin installation
> directory, otherwise binary.exe would not start.
> I do not know whether there is a `locale` equivalent in Windows
> command prompt, so I merely ran my program.
>     C:\Users\Public>binary.exe "foo bar" "Jérôme"
>     C=binary.exe  "foo bar" "J□r□me"
>     0=binary
>     1=foo bar
>     2="Jérôme"

The narrow ("A") command line returned by GetCommandLineA() contains
"tofu" characters in place of é and ô: they were not converted properly.

The é and ô characters we see in the Cygwin-parsed
arguments coming into main could not have been recovered
from these "tofu" replacement characters.
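To illustrate why, here is a minimal sketch in portable C. It is not
what Windows or Cygwin actually does: narrow_lossy() and
same_narrow_form() are hypothetical helpers, and replacing every
non-ASCII character with '?' is a simplified stand-in for a conversion
to a narrow encoding that cannot represent the character. It only
applies if the narrow string really holds replacement characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: narrow a wide string, writing '?' for every
 * character outside ASCII. This models a lossy conversion only; it is
 * not the conversion Windows performs. */
static void narrow_lossy(const wchar_t *in, char *out, size_t outsize)
{
    size_t i = 0;

    assert(in != NULL && out != NULL && outsize > 0);
    for (; in[i] != L'\0' && i + 1 < outsize; i++)
        out[i] = ((unsigned long)in[i] < 0x80UL) ? (char)in[i] : '?';
    out[i] = '\0';
}

/* Returns 1 if two wide strings collapse to the same narrow string,
 * i.e. the narrow form can no longer tell them apart. */
static int same_narrow_form(const wchar_t *a, const wchar_t *b)
{
    char na[256], nb[256];

    narrow_lossy(a, na, sizeof na);
    narrow_lossy(b, nb, sizeof nb);
    return strcmp(na, nb) == 0;
}
```

With these helpers, L"J\u00e9r\u00f4me" (Jérôme) and
L"J\u00e8r\u00f5me" (Jèrõme) both narrow to "J?r?me", so nothing that
reads only the narrow string can tell which one was typed.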

What is actually being parsed must be the WCHAR command line
corresponding to what comes from GetCommandLineW().

To get a more complete understanding, the test program needs to
print that one as well.
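A rough sketch of how the wide command line could be shown, assuming
the program is built on Cygwin or native Windows with <windows.h>
available. The helper names format_utf16_hex() and
print_wide_command_line() are made up for this example. The code units
are dumped as hex so that the console's own character conversion cannot
blur the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#if defined(_WIN32) || defined(__CYGWIN__)
#include <windows.h>
#endif

/* Hypothetical helper: write the UTF-16 code units of a NUL-terminated
 * string into out as space-separated 4-digit hex values. Output is
 * truncated silently if out is too small. */
static void format_utf16_hex(const unsigned short *units,
                             char *out, size_t outsize)
{
    size_t used = 0;

    assert(units != NULL && out != NULL && outsize > 0);
    out[0] = '\0';
    for (size_t i = 0; units[i] != 0; i++) {
        /* One unit needs at most a space, 4 hex digits, and the NUL. */
        if (outsize - used < 6)
            break;
        snprintf(out + used, outsize - used, "%s%04X",
                 i ? " " : "", (unsigned)units[i]);
        used = strlen(out);
    }
}

#if defined(_WIN32) || defined(__CYGWIN__)
/* Print the wide command line as hex code units, prefixed with "W=". */
static void print_wide_command_line(void)
{
    char buf[4096];
    const WCHAR *w = GetCommandLineW();

    format_utf16_hex((const unsigned short *)w, buf, sizeof buf);
    printf("W=%s\n", buf);
}
#endif
```

Calling print_wide_command_line() from main() next to the existing
printf of GetCommandLineA() would put both forms side by side. If the
wide command line carries the accented letters intact, the dump
contains 00E9 for é and 00F4 for ô.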


