Mail Archives: cygwin/2020/08/03/13:11:40

In-Reply-To: <ae1f8133-948a-4497-049b-b8349a138143@SystematicSw.ab.ca>
To: cygwin AT cygwin DOT com
Subject: Re: Trouble with character sets
Message-ID: <OF28060D19.DB6E392B-ON852585B9.005D898D-852585B9.005E6021@abinitio.com>
Date: Mon, 3 Aug 2020 13:10:49 -0400
From: Michael Shay via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Michael Shay <MShay AT ABINITIO DOT COM>

That doesn't help. I tried code page 65001 (UTF-8):

### SET CP TO UTF-8, 65001
$cygwin_charset_test.ksh
Old CP 65001
locale on entry
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

### CP SET TO 65001
Active code page: 65001
locale changed to
LANG=en_US.CP1252
LC_CTYPE="en_US.CP1252"
LC_NUMERIC="en_US.CP1252"
LC_TIME="en_US.CP1252"
LC_COLLATE="en_US.CP1252"
LC_MONETARY="en_US.CP1252"
LC_MESSAGES="en_US.CP1252"
LC_ALL=en_US.CP1252

Running WIN32 pgm
Transcoding using Cygwin codepage: 1252
Input widechar string:
        lpw[0] = Z - 5A
        lpw[1] =  - F0C7
wmain: Z?
Active code page: 65001

and code page 1252:

### SET CP TO 1252
$cygwin_charset_test.ksh
Old CP 65001
locale on entry
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

### CP SET TO 1252
Active code page: 1252
locale changed to
LANG=en_US.CP1252
LC_CTYPE="en_US.CP1252"
LC_NUMERIC="en_US.CP1252"
LC_TIME="en_US.CP1252"
LC_COLLATE="en_US.CP1252"
LC_MONETARY="en_US.CP1252"
LC_MESSAGES="en_US.CP1252"
LC_ALL=en_US.CP1252

Running WIN32 pgm
Transcoding using Cygwin codepage: 1252
Input widechar string:
        lpw[0] = Z - 5A
        lpw[1] =  - F0C7
wmain: Z?
Active code page: 65001
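
A minimal sketch of the conversion that would produce the F0C7 value shown in both runs. It is not Cygwin's code: the function names are hypothetical, the UTF-8 decoder only recognises 1- and 2-byte sequences, and the fallback of mapping an undecodable byte b to 0xF000 | b is an assumption inferred from the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a UTF-8 decoder with a private-use fallback.
// Assumption: an undecodable byte b becomes 0xF000 | b (inferred from the
// F0C7 in the output above). Only 1- and 2-byte sequences are recognised.
std::vector<unsigned> decode_utf8_with_fallback(const std::string &bytes)
{
    std::vector<unsigned> out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) {
            out.push_back(b);                    // plain ASCII
        } else if (b >= 0xC2 && b <= 0xDF && i + 1 < bytes.size() &&
                   (static_cast<unsigned char>(bytes[i + 1]) & 0xC0) == 0x80) {
            unsigned c = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(((b & 0x1F) << 6) | (c & 0x3F));
            ++i;                                 // consumed the continuation byte
        } else {
            out.push_back(0xF000u | b);          // assumed fallback for a bad byte
        }
    }
    return out;
}

// Hypothetical stand-in for a CP1252 decoder. CP1252 agrees with ISO-8859-1
// for 0x00-0x7F and 0xA0-0xFF; the 0x80-0x9F block is not modelled here and
// is returned as U+FFFD.
std::vector<unsigned> decode_cp1252(const std::string &bytes)
{
    std::vector<unsigned> out;
    for (unsigned char b : bytes)
        out.push_back((b >= 0x80 && b < 0xA0) ? 0xFFFDu : b);
    return out;
}

// Usage: the two-byte argument 'Z', 0xC7 passed by the test script.
void demo_decode()
{
    const std::string arg{'Z', static_cast<char>(0xC7)};
    assert(decode_utf8_with_fallback(arg)[1] == 0xF0C7); // matches lpw[1] above
    assert(decode_cp1252(arg)[1] == 0x00C7);             // what CP1252 would give
}
```

If the sketch is right, the byte 0xC7 is being read as an incomplete UTF-8 lead byte rather than as a CP1252 character, which would explain why changing the console code page makes no difference.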




Michael



From:   "Brian Inglis" <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
To:     cygwin AT cygwin DOT com
Date:   08/03/2020 12:31 PM
Subject:        Re: Trouble with character sets
Sent by:        "Cygwin" <cygwin-bounces AT cygwin DOT com>



On 2020-08-03 09:36, Michael Shay via Cygwin wrote:
> I'm having a problem with Cygwin 3.1.4, changing the character set on the
> fly. It seems to work with Cygwin applications, but not with Win32
> applications.
> I have a Korn shell script:
> #!/bin/ksh
> OLD_LANG="$LANG"
> OLD_LC_ALL="$LC_ALL"
> echo "locale on entry"
> locale
> echo ""
> export LANG="en_US.CP1252"
> export LC_ALL=en_US.CP1252
> echo "locale changed to"
> locale
> echo ""
> # Default is to run the Win32 program. Input any argument other than
> # 'WIN32' to run '/bin/echo'.
> case $# in
>    0 )  echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         ;;
>    1 )  echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>    2 )  echo "Running WIN32 pgm"
>         ksh -c 'cygtest.exe ZÇ'
>         echo ""
>         echo "Running Cygwin 'echo'"
>         ksh -c '/bin/echo ZÇ'
>         ;;
>    * ) ;;
> esac
> LC_ALL="$OLD_LC_ALL"
> LANG="$OLD_LANG"
> and a Win32 application (attached file cygtest.cpp)
> I used gdb to see what was happening in child_info_spawn::worker(), when a
> Win32 program is started using:
>           rc = CreateProcessW (runpath,   /* image name w/ full path */
>                    cmd.wcs (wcmd),  /* what was passed to exec */
>                    sa,    /* process security attrs */
>                    sa,    /* thread security attrs */
>                    TRUE,    /* inherit handles */
>                    c_flags,
>                    envblock,  /* environment */
>                    NULL,
>                    &si,
>                    &pi);
> Specifically, 'cmd.wcs(wcmd)' invokes:
>   wchar_t *wcs (wchar_t *wbuf, size_t n)
>   {
>     if (n == 1)
>       wbuf[0] = L'\0';
>     else
>         sys_mbstowcs (wbuf, n, buf);
>     return wbuf;
>   }
> and sys_mbstowcs():
> size_t __reg3
> sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
> {
>   mbtowc_p f_mbtowc = __MBTOWC;
>   if (f_mbtowc == __ascii_mbtowc)
>     {
>       f_mbtowc = __utf8_mbtowc;   <<<<< this is ALWAYS done, no matter what charset is in use.
>     }
>   return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
> }
> Since CP1252 is an 8-bit single-byte character set with characters >=
> 0x80, the '0xc7' character is always translated as '0xc7 0xf0', with the
> '0xf0' byte indicating an invalid character in the string.
> This doesn't seem to happen when e.g. '/bin/echo' is run, although I 
> haven't stepped into the code to see what's happening.
> I do not think this is a Cygwin bug, but since the User's Guide says the
> locale and charset can be changed on the fly, I don't know what's going
> awry.
> Any suggestions? If you need more information, I'm happy to provide it.
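
The converter substitution in the quoted sys_mbstowcs() can be isolated as in the sketch below. All names are hypothetical stand-ins and the converters are simplified to single bytes, so this only illustrates the selection logic, not Cygwin's actual implementation. As quoted, the branch replaces the converter only when the active one is the ASCII converter; any other converter is passed through unchanged.

```cpp
#include <cassert>

// Hypothetical single-byte converter signature (not Cygwin's mbtowc_p).
using converter = unsigned (*)(unsigned char);

// Simplified stand-ins for the real converters.
unsigned ascii_conv(unsigned char b)
{
    return b < 0x80 ? b : 0xFFFDu;              // non-ASCII is not representable
}

unsigned utf8_conv(unsigned char b)
{
    // A lone byte >= 0x80 is never a complete UTF-8 sequence; the
    // 0xF000 | b fallback is an assumption based on the reported F0C7.
    return b < 0x80 ? b : 0xF000u | b;
}

unsigned cp1252_conv(unsigned char b)
{
    // CP1252 matches ISO-8859-1 outside 0x80-0x9F; that block is not modelled.
    return (b >= 0x80 && b < 0xA0) ? 0xFFFDu : b;
}

// Mirrors the quoted check: the ASCII converter is swapped for the UTF-8
// one, every other converter is used as-is.
converter pick_converter(converter current)
{
    return current == ascii_conv ? utf8_conv : current;
}

// Usage: which converter handles the byte 0xC7 in each case.
void demo_pick()
{
    assert(pick_converter(ascii_conv)(0xC7) == 0xF0C7);  // UTF-8 path, bad byte
    assert(pick_converter(cp1252_conv)(0xC7) == 0x00C7); // CP1252 path
}
```

Under these assumptions, seeing F0C7 would mean the spawning process had the ASCII (or UTF-8) converter active at the time of the call, regardless of what LANG and LC_ALL were exported to.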

Try:

$ chcp.com
Active code page: 850
$ chcp.com 65001
Active code page: 65001
$ chcp.com
Active code page: 65001

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
