Subject: Re: Trouble with character sets
To: cygwin AT cygwin DOT com
From: Brian Inglis
Organization: Systematic Software
Date: Mon, 3 Aug 2020 10:31:15 -0600

On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the
> fly. It seems to work with Cygwin applications, but not with Win32
> applications.
> I have a Korn shell script:
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than 'WIN32'
> # to run '/bin/echo'.
> case $# in
>     0 ) echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         ;;
>     1 ) echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>     2 ) echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         echo ""
>         echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>     * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
> and a Win32 application (attached file cygtest.cpp)
> I used gdb to see what was happening in child_info_spawn::worker(), when a
> Win32 program is started using:
>   rc = CreateProcessW (runpath,        /* image name w/ full path */
>                        cmd.wcs (wcmd), /* what was passed to exec */
>                        sa,             /* process security attrs */
>                        sa,             /* thread security attrs */
>                        TRUE,           /* inherit handles */
>                        c_flags,
>                        envblock,       /* environment */
>                        NULL,
>                        &si,
>                        &pi);
> Specifically, 'cmd.wcs(wcmd)' invokes:
>   wchar_t *wcs (wchar_t *wbuf, size_t n)
>   {
>     if (n == 1)
>       wbuf[0] = L'\0';
>     else
>       sys_mbstowcs (wbuf, n, buf);
>     return wbuf;
>   }
> and sys_mbstowcs():
>   size_t __reg3
>   sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
>   {
>     mbtowc_p f_mbtowc = __MBTOWC;
>     if (f_mbtowc == __ascii_mbtowc)
>       {
>         f_mbtowc = __utf8_mbtowc;   <<<<< this is ALWAYS done, no matter what charset is in use.
>       }
>     return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
>   }
> Since the CP1252 is an 8-bit single-byte character set with characters >=
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the
> '0xf0' byte indicating an invalid character in the string.
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I
> haven't stepped into the code to see what's happening.
> I do not think this is a Cygwin bug, but since the User's Guide says the
> locale and charset can be changed on the fly, I don't know what's going
> awry.
> Any suggestions? If you need more information, I'm happy to provide it.

Try:

$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001

--
Take care.
Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
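
As an illustration of the conversion problem described in the quoted message, here is a minimal Python sketch (not Cygwin code) of why the argument bytes are rejected when they are interpreted as UTF-8. It assumes the argument "ZÇ" reaches the conversion as CP1252 bytes. The helper name decodes_as is hypothetical, and Python's codecs only stand in for Cygwin's __utf8_mbtowc, whose handling of invalid bytes differs in detail.

```python
# Illustration only: Python codecs stand in for the C conversion routines.

# "ZÇ" as it would be encoded under en_US.CP1252 (Ç is U+00C7).
arg_cp1252 = "Z\u00c7".encode("cp1252")


def decodes_as(data: bytes, codec: str) -> bool:
    """Return True if data is a valid byte sequence in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# The CP1252 form is the two bytes 0x5a 0xc7.
print(arg_cp1252.hex())

# Valid as CP1252, where 0xc7 is a complete single-byte character.
print(decodes_as(arg_cp1252, "cp1252"))

# Not valid as UTF-8: 0xc7 is a lead byte that needs a continuation byte.
print(decodes_as(arg_cp1252, "utf-8"))

# The UTF-8 form of the same string, for comparison.
print("Z\u00c7".encode("utf-8").hex())
```

In UTF-8 the same character is the two-byte sequence 0xc3 0x87, so a lone 0xc7 byte can only be treated as invalid by a UTF-8 decoder.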