Mail Archives: cygwin/2020/08/03/18:06:06

In-Reply-To: <d8133245-02f0-71a7-e409-bf3b82fc7756@SystematicSw.ab.ca>
To: cygwin AT cygwin DOT com
Subject: Re: Trouble with output character sets from Win32 applications running
under mintty
Message-ID: <OFE0AAB507.AC9FD3B4-ON852585B9.0076DEA7-852585B9.007955FA@abinitio.com>
Date: Mon, 3 Aug 2020 18:05:18 -0400
References: <OF3F4D2646 DOT 3A75682C-ON852585B5 DOT 0058983D-852585B9 DOT 0055B758 AT abinitio DOT com>
<ae1f8133-948a-4497-049b-b8349a138143 AT SystematicSw DOT ab DOT ca>
<OF28060D19 DOT DB6E392B-ON852585B9 DOT 005D898D-852585B9 DOT 005E6021 AT abinitio DOT com>
<1314865780 DOT 20200803204249 AT yandex DOT ru>
<d8133245-02f0-71a7-e409-bf3b82fc7756 AT SystematicSw DOT ab DOT ca>
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
From: Michael Shay via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Michael Shay <MShay AT ABINITIO DOT COM>

Michael



From:   "Brian Inglis" <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
To:     cygwin AT cygwin DOT com
Date:   08/03/2020 05:23 PM
Subject:        Re: Trouble with output character sets from Win32 
applications running under mintty
Sent by:        "Cygwin" <cygwin-bounces AT cygwin DOT com>



On 2020-08-03 11:42, Andrey Repin wrote:
>> Doesn't help. I tried 65001 (UTF-8):
> 
> Because you're confusing things.
> chcp has nothing to do with LANG or LC_*.
> Et vice versa.
> 
> chcp sets console code page for native console applications. Only for those
> supporting it. Many do not.
> LANG sets output parameters for Cygwin applications (and other programs that
> look for it, but these are few).

You cut the significant statement at the top of the OP:

>> I'm having a problem with Cygwin 3.1.4, changing the character set on the
>> fly. It seems to work with Cygwin applications, but not with Win32
>> applications.

He has problems with invalid characters only running win32 console
applications:
I changed the subject to hopefully better reflect the issue.

I am unsure where Cygwin 3.1.4 comes into Win32 applications - you have to use
the Windows codepage conversion routines.

You can only change input character sets on the fly; output character sets will
depend on mintty support of xterm-compatible character set support and switching
escape sequences; if you set up UCS16LE console output, Windows and mintty
should handle it.

Perhaps a better description of your environment, build tools, what you are
trying to do, what you expect as output, and what you are getting as output,
could help us better understand and help with the issue you see.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

The script I sent changes the locale information, i.e. LANG and LC_ALL are
both set to en_US.CP1252:

export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252

Then it runs a simple Win32 program that takes a single input argument,
ZÇ, the second character being C-cedilla, an 8-bit character with hex
value 0xc7. The Win32 program transcodes the input Unicode argument, using
the Cygwin character set to determine the codepage, 1252. It then prints
the transcoded characters to stdout, and the result should be ZÇ, identical
to the input argument. This works fine using Cygwin 1.7.28.

Cygwin 3.1.4 is launching the Win32 application, and is responsible for
transcoding the arguments passed to it by mksh, in this case the CP1252
characters ZÇ, into Unicode. That means Cygwin has to use the mb-to-uc
function for transcoding codepage 1252 to Unicode. It does not. It uses
the UTF-8 to Unicode function (I've seen this using gdb). That function
flags the Ç as an invalid UTF-8 sequence, not surprisingly, since it's not
a UTF-8 character. No matter what character set I use in 'export LANG...'
and 'export LC_ALL...', Cygwin 3.1.4 always uses the utf8-to-wc
transcoding function in sys_mbstowcs(). 1.7.28 uses the correct function.

I'm not using mintty, I'm using mksh, a requirement since our software
uses lots of shell scripts, and for legacy support, that means using a
Korn shell. I could understand it if 1.7.28 didn't do the proper
transcoding, but it does.
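
The byte-level reason the UTF-8 converter rejects this argument can be
sketched in a few lines of C. This is only an illustration, not Cygwin
code: the function names looks_like_utf8 and cp1252_to_unicode are made up
for the example, the UTF-8 check is structural only (it does not reject
overlong forms or surrogates), and the CP1252 mapping covers only the
ranges where CP1252 and Unicode agree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: simplified structural UTF-8 check.
   Returns 1 if every lead byte is followed by the right number of
   continuation bytes (10xxxxxx), 0 otherwise. */
int looks_like_utf8(const unsigned char *s, size_t n)
{
    assert(s != NULL);
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t follow;
        if (c < 0x80)                follow = 0;  /* ASCII */
        else if ((c & 0xE0) == 0xC0) follow = 1;  /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0) follow = 2;  /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0) follow = 3;  /* 11110xxx */
        else return 0;            /* stray continuation or invalid lead */
        if (follow > n - i - 1)
            return 0;             /* sequence runs past the end */
        for (size_t k = 1; k <= follow; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;         /* expected a continuation byte */
        i += follow + 1;
    }
    return 1;
}

/* Hypothetical helper: CP1252 byte to Unicode code point.
   CP1252 matches Unicode for 0x00-0x7F and 0xA0-0xFF; the 0x80-0x9F
   block needs a lookup table, which this sketch omits (returns -1). */
long cp1252_to_unicode(unsigned char c)
{
    if (c < 0x80 || c >= 0xA0)
        return (long)c;
    return -1;
}
```

With these helpers, the two bytes 'Z' 0xC7 fail the UTF-8 check, because
0xC7 is a two-byte lead with no continuation byte after it, while
cp1252_to_unicode(0xC7) gives U+00C7 (Ç). The UTF-8 form of the same
character would be the two bytes 0xC3 0x87.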

I used:

        gdb mksh

to load mksh into the debugger, then started it with

        start -c 'cygtest.exe ZÇ'

That allowed me to step into child_info_spawn::worker() and stop at the 
call to CreateProcess(), where the command line (cygtest.exe) and argument 
(ZÇ) are translated into Unicode.

This is the code to which I'm referring, in strfuncs.cc, which is supposed 
to translate the command line and arguments from CP 1252 into Unicode.

  size_t __reg3
  sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
  {
    mbtowc_p f_mbtowc = __MBTOWC;
    if (f_mbtowc == __ascii_mbtowc)
      {
        /* <<<< THE CODE CHANGES THE '__ascii_mbtowc' TO '__utf8_mbtowc'
           EVERY TIME, REGARDLESS OF THE CODEPAGE. */
        f_mbtowc = __utf8_mbtowc;
      }
    return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
  }
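
The effect of that fallback can be modeled in a few lines of C. This is
only a sketch under stated assumptions: the enum conv_kind and the function
effective_converter are hypothetical stand-ins, not Cygwin identifiers. It
shows that the substitution fires only when the active converter is the
ASCII one, so seeing the UTF-8 converter in use implies the active
converter was still the ASCII default at that point.

```c
#include <assert.h>

/* Hypothetical stand-ins for the converter function pointers. */
typedef enum { CONV_ASCII, CONV_UTF8, CONV_CP1252 } conv_kind;

/* Models the selection logic only: ASCII is replaced by UTF-8,
   any other active converter is passed through unchanged. */
conv_kind effective_converter(conv_kind active)
{
    assert(active == CONV_ASCII || active == CONV_UTF8
           || active == CONV_CP1252);
    return active == CONV_ASCII ? CONV_UTF8 : active;
}
```

In this model a CP1252 converter would survive untouched, and only the
ASCII default is swapped for UTF-8.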

So 'f_mbtowc' is set to __ascii_mbtowc, the default.

You said:

> You can only change input character sets on the fly;

The input character set to Cygwin should have been changed to CP 1252, as
it was in 1.7.28. At least, that's what I would expect to happen. If it
does not, or if mintty is required, then that's a regression from 1.7.28.

Mike Shay







  