Mail Archives: cygwin/2020/08/03/12:32:07

Subject: Re: Trouble with character sets
To: cygwin AT cygwin DOT com
References: <OF3F4D2646 DOT 3A75682C-ON852585B5 DOT 0058983D-852585B9 DOT 0055B758 AT abinitio DOT com>
From: Brian Inglis <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
Organization: Systematic Software
Message-ID: <ae1f8133-948a-4497-049b-b8349a138143@SystematicSw.ab.ca>
Date: Mon, 3 Aug 2020 10:31:15 -0600

On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the 
> fly. It seems to work with Cygwin applications, but not with Win32 
> applications.
> I have a Korn shell script:
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than 'WIN32'
> # to run '/bin/echo'.
> case $# in
>    0 )  echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         ;;
>    1 )  echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>    2 )  echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         echo ""
>         echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>    * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
> and a Win32 application (attached file cygtest.cpp)
> I used gdb to see what was happening in child_info_spawn::worker(), when a 
> Win32 program is started using:
>           rc = CreateProcessW (runpath,   /* image name w/ full path */
>                    cmd.wcs (wcmd),  /* what was passed to exec */
>                    sa,    /* process security attrs */
>                    sa,    /* thread security attrs */
>                    TRUE,    /* inherit handles */
>                    c_flags,
>                    envblock,  /* environment */
>                    NULL,
>                    &si,
>                    &pi);
> Specifically, 'cmd.wcs(wcmd)' invokes:
>   wchar_t *wcs (wchar_t *wbuf, size_t n)
>   {
>     if (n == 1)
>       wbuf[0] = L'\0';
>     else
>         sys_mbstowcs (wbuf, n, buf);
>     return wbuf;
>   }
> and sys_mbstowcs():
> size_t __reg3
> sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
> {
>   mbtowc_p f_mbtowc = __MBTOWC;
>   if (f_mbtowc == __ascii_mbtowc)
>     {
>       f_mbtowc = __utf8_mbtowc;  /* <<<<< this is ALWAYS done, no matter what charset is in use. */
>     }
>   return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
> }
> Since CP1252 is an 8-bit, single-byte character set with characters >= 
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the 
> '0xf0' byte indicating an invalid character in the string.
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I 
> haven't stepped into the code to see what's happening.
> I do not think this is a Cygwin bug, but since the User's Guide says the 
> locale and charset can be changed on the fly, I don't know what's going 
> awry.
> Any suggestions? If you need more information, I'm happy to provide it.

Try switching the console code page to UTF-8 (65001):

$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001
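
For context, here is a minimal sketch of why the 0xc7 byte from the quoted report is charset-sensitive. It is not taken from the thread: it assumes `iconv`, `od`, and `tr` are installed, and `hexbytes` is a hypothetical helper name. The byte 0xc7 is 'Ç' in CP1252, while the UTF-8 encoding of the same character is the two bytes 0xc3 0x87, so a lone 0xc7 is not a complete UTF-8 sequence.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): dump stdin as a compact hex string.
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

# 'Z' followed by the single byte 0xC7 (octal 307), i.e. "ZÇ" encoded in CP1252.
cp1252_bytes=$(printf 'Z\307' | hexbytes)

# The same text re-encoded as UTF-8: 'Ç' (U+00C7) becomes the pair 0xC3 0x87.
utf8_bytes=$(printf 'Z\307' | iconv -f CP1252 -t UTF-8 | hexbytes)

echo "CP1252: $cp1252_bytes"
echo "UTF-8:  $utf8_bytes"

# Feed the raw CP1252 bytes to a strict UTF-8 decoder and report what it does.
# A trailing 0xC7 is an incomplete UTF-8 sequence, so iconv is expected to fail.
if printf 'Z\307' | iconv -f UTF-8 -t UTF-16LE >/dev/null 2>&1; then
    echo "raw 0xC7 accepted as UTF-8"
else
    echo "raw 0xC7 rejected as UTF-8"
fi
```

Comparing the two hex strings shows which encoding a given argument string is actually in before it is handed to a Win32 program.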

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
