Mail Archives: cygwin/2020/10/15/01:15:44

Subject: Re: UTF-8 quoted args passed to program include quotes when run from
cmd
To: cygwin AT cygwin DOT com
References: <CAFC9CLCtfMORMxAK6==jdwY5ZbX6jWwo+JCfDwM3njgvGduf0w AT mail DOT gmail DOT com>
<634821436 DOT 20201004141809 AT yandex DOT ru>
<CAFC9CLCHk0WMj935OzZF+HeAdDbv-kGU_SHyi47vohagM+ZmtQ AT mail DOT gmail DOT com>
<d4f283fe85c31be76dcfc01b20bb375e AT mail DOT kylheku DOT com>
<CAFC9CLCx3nAQu6aMYTTL1syr9zyXgHYY0vKCKSCXAf=HpYXDiQ AT mail DOT gmail DOT com>
From: Brian Inglis <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
Organization: Systematic Software
Message-ID: <05455675-7d66-fb00-9973-0a53c33ee796@SystematicSw.ab.ca>
Date: Wed, 14 Oct 2020 23:14:45 -0600
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
Thunderbird/68.12.1
MIME-Version: 1.0
In-Reply-To: <CAFC9CLCx3nAQu6aMYTTL1syr9zyXgHYY0vKCKSCXAf=HpYXDiQ@mail.gmail.com>

[changed subject]

On 2020-10-14 15:47, Jérôme Froissart wrote:
>> (As evidence of this: the Cygwin command line parser was able to break the
>> command line into arguments correctly, but chose to retain the double
>> quotes.)
>>
>>     #include <stdio.h>
>>
>>     int main(int argc, char *argv[])
>>     {
>>         for (int i = 0; argc > i; i++)
>>             printf("%d=%s\n", i, argv[i]);
>>
>>         return 0;
>>     }
>>
>> I compiled this program under Cygwin to produce cyg.exe and ran it under
>> Cygwin and CMD.EXE.

Please post your compile and link command lines: Cygwin can create native Windows
executables as well as its own Unix-like ones, and command line parsing may differ.
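
One way to tell the two kinds of build apart from inside the program is to look
at the compiler's predefined macros. The following is only a sketch: the names
build_flavor and print_build_flavor are made up for illustration, and it assumes
the usual convention that Cygwin's gcc predefines __CYGWIN__ while native Windows
toolchains such as MinGW-w64 predefine _WIN32.

```c
#include <stdio.h>

/* Hypothetical helper: report which runtime this file was compiled for.
 * Assumes __CYGWIN__ is predefined by Cygwin's gcc and _WIN32 by native
 * Windows toolchains; __CYGWIN__ is tested first on purpose. */
const char *build_flavor(void)
{
#if defined(__CYGWIN__)
    return "cygwin";
#elif defined(_WIN32)
    return "native-windows";
#else
    return "other";
#endif
}

/* Hypothetical helper: print the flavor, e.g. as the first line of main(). */
void print_build_flavor(void)
{
    printf("build=%s\n", build_flavor());
}
```

Calling print_build_flavor() before the argv loop would put the build type next
to the argument dump, which makes reports from different shells easier to compare.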

>> Cygwin run:
>>> billziss@xps:~/Projects/t$ locale
>> LANG=en_US.UTF-8
>> LC_CTYPE="en_US.UTF-8"
>> LC_NUMERIC="en_US.UTF-8"
>> LC_TIME="en_US.UTF-8"
>> LC_COLLATE="en_US.UTF-8"
>> LC_MONETARY="en_US.UTF-8"
>> LC_MESSAGES="en_US.UTF-8"
>> LC_ALL=
>> billziss@xps:~/Projects/t$ ./cyg.exe "foo bar" "Domain\Jérôme"
>> 0=./cyg
>> 1=foo bar
>> 2=Domain\Jérôme

>> CMD.EXE run:
>> C:\Users\billziss\Projects\t>cyg.exe "foo bar" "Domain\Jérôme"
>> 0=cyg
>> 1=foo bar
>> 2="Domain\Jérôme"

>>> Now, let's start a Windows shell (cmd.exe)
>>> Note that I had to copy cygwin1.dll from my Cygwin installation
>>> directory, otherwise binary.exe would not start.
>>> I do not know whether there is a `locale` equivalent in Windows
>>> command prompt, so I merely ran my program.
>>>     C:\Users\Public>binary.exe "foo bar" "Jérôme"
>>>     0=binary
>>>     1=foo bar
>>>     2="Jérôme"

Your Windows CommandLineA/W outputs were confusing.

The point is that Cygwin programs run from the cmd shell appear to receive
arguments containing UTF-8 (non-ASCII) characters with the surrounding double
quotes intact, whereas the double quotes are stripped when run from a Cygwin shell.
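
To flag affected arguments mechanically rather than by eye, a test along the
following lines could be used. This is a minimal sketch, not anything Cygwin
provides: has_surrounding_quotes is a hypothetical name, and it only checks for
a literal double quote (0x22) as the first and last byte of the argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if arg begins and ends with a literal
 * double quote, i.e. the quotes were passed through to the program. */
bool has_surrounding_quotes(const char *arg)
{
    assert(arg != NULL);
    size_t len = strlen(arg);

    /* A lone '"' is not a quoted argument, so require at least 2 bytes. */
    return len >= 2 && arg[0] == '"' && arg[len - 1] == '"';
}
```

Applied to each argv[i], this would report true for the third argument in the
cmd runs quoted above and false for every argument in the Cygwin shell runs.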

I think the charset needs to be verified by dumping each arg as hex bytes, e.g.

//!/usr/bin/gcc -g -Og -Wall -Wextra -o quoted-arg-dump quoted-arg-dump.c
// quoted-arg-dump.c - dump quoted args under Cygwin and Windows shells
// outputs:
// $ ./quoted-arg-dump "foo bar" "Jérôme"
// 0 './quoted-arg-dump' 2e 2f 71 75 6f 74 65 64 2d 61 72 67 2d 64 75 6d 70
// 1 'foo bar' 66 6f 6f 20 62 61 72
// 2 'Jérôme' 4a c3 a9 72 c3 b4 6d 65
// >quoted-arg-dump "foo bar" "Jérôme"
// 0 'quoted-arg-dump' 71 75 6f 74 65 64 2d 61 72 67 2d 64 75 6d 70
// 1 'foo bar' 66 6f 6f 20 62 61 72
// 2 '"Jérôme"' 22 4a c3 a9 72 c3 b4 6d 65 22
// checks:
// $ grep -a '[éô]' unicode-symbols.txt
// é  U+00E9  LATIN SMALL LETTER E WITH ACUTE
// ô  U+00F4  LATIN SMALL LETTER O WITH CIRCUMFLEX
// $ grep -a '[éô]' unicode-symbols.txt | od -An -tx1z -w11
// c3 a9 20 20 55 2b 30 30 45 39 20  >..  U+00E9 <
// 20 4c 41 54 49 4e 20 53 4d 41 4c  > LATIN SMAL<
// 4c 20 4c 45 54 54 45 52 20 45 20  >L LETTER E <
// 57 49 54 48 20 41 43 55 54 45 0a  >WITH ACUTE.<
// c3 b4 20 20 55 2b 30 30 46 34 20  >..  U+00F4 <
// 20 4c 41 54 49 4e 20 53 4d 41 4c  > LATIN SMAL<
// 4c 20 4c 45 54 54 45 52 20 4f 20  >L LETTER O <
// 57 49 54 48 20 43 49 52 43 55 4d  >WITH CIRCUM<
// 46 4c 45 58 0a                    >FLEX.<
#include <stdio.h>
int
main(int argc, char *argv[]) {
	for (int a = 0; a < argc; ++a) {
		printf("%d '%s'", a, argv[a]);

		// walk the bytes as unsigned so values >= 0x80 print as two hex digits
		for (unsigned char *p = (unsigned char *)argv[a]; *p; ++p) {
			printf(" %.2hhx", *p);
		} // for chars

		printf("\n");
	} // for args
} // main()

This verifies that Cygwin does not strip double quotes from args containing
non-ASCII UTF-8 characters when run from Windows cmd (the ASCII-only arg
"foo bar" is stripped as expected), and that the args are received and output
as UTF-8 characters.

It might be interesting if you could also run from PowerShell and/or Terminal
for comparison to see if the Windows cmd behaviour is reproduced there.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
