Mail Archives: cygwin/2020/08/03/18:06:06
In-Reply-To: | <d8133245-02f0-71a7-e409-bf3b82fc7756@SystematicSw.ab.ca>
To: | cygwin AT cygwin DOT com
Subject: | Re: Trouble with output character sets from Win32 applications running under mintty
Message-ID: | <OFE0AAB507.AC9FD3B4-ON852585B9.0076DEA7-852585B9.007955FA@abinitio.com>
Date: | Mon, 3 Aug 2020 18:05:18 -0400
References: | <OF3F4D2646 DOT 3A75682C-ON852585B5 DOT 0058983D-852585B9 DOT 0055B758 AT abinitio DOT com>
| <ae1f8133-948a-4497-049b-b8349a138143 AT SystematicSw DOT ab DOT ca>
| <OF28060D19 DOT DB6E392B-ON852585B9 DOT 005D898D-852585B9 DOT 005E6021 AT abinitio DOT com>
| <1314865780 DOT 20200803204249 AT yandex DOT ru>
| <d8133245-02f0-71a7-e409-bf3b82fc7756 AT SystematicSw DOT ab DOT ca>
From: | Michael Shay via Cygwin <cygwin AT cygwin DOT com>
Reply-To: | Michael Shay <MShay AT ABINITIO DOT COM>
Michael
From: "Brian Inglis" <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
To: cygwin AT cygwin DOT com
Date: 08/03/2020 05:23 PM
Subject: Re: Trouble with output character sets from Win32 applications running under mintty
Sent by: "Cygwin" <cygwin-bounces AT cygwin DOT com>
On 2020-08-03 11:42, Andrey Repin wrote:
>> Doesn't help. I tried 65001 (UTF-8):
>
> Because you're confusing things.
> chcp has nothing to do with LANG or LC_*.
> Et vice versa.
>
> chcp sets console code page for native console applications. Only for those
> supporting it. Many do not.
> LANG sets output parameters for Cygwin applications (and other programs that
> look for it, but these are few).
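To illustrate the point that LANG only matters to programs that look for it: in standard C, every program starts in the "C" locale (the C standard specifies the equivalent of setlocale(LC_ALL, "C") at startup), and LANG/LC_ALL are consulted only when the program itself calls setlocale(LC_ALL, ""). The following is a minimal sketch in plain C; the helper names in_startup_c_locale() and adopt_environment_locale() are hypothetical, and whether a particular shell such as mksh makes that call is not something this sketch establishes.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Nonzero if the process is still in the "C" locale that every C program
   starts in, whatever LANG or LC_ALL say in the environment. */
int in_startup_c_locale(void)
{
    const char *name = setlocale(LC_ALL, NULL);  /* query only, changes nothing */
    assert(name != NULL);
    return strcmp(name, "C") == 0;
}

/* Hypothetical opt-in helper: with "" the locale is taken from LC_ALL,
   then LC_* (per category), then LANG. Returns the new locale name, or
   NULL if the requested locale is not available. */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

A program that never calls something like adopt_environment_locale() keeps running in the "C" locale even if `export LANG=en_US.CP1252` was done beforehand.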
You cut the significant statement at the top of the OP:
>> I'm having a problem with Cygwin 3.1.4, changing the character set on the
>> fly. It seems to work with Cygwin applications, but not with Win32
>> applications.
He has problems with invalid characters only when running Win32 console
applications.
I changed the subject to hopefully better reflect the issue.
I am unsure where Cygwin 3.1.4 comes into Win32 applications; you have to use
the Windows codepage conversion routines.
You can only change input character sets on the fly; output character sets will
depend on mintty's support for xterm-compatible character set switching escape
sequences. If you set up UTF-16LE console output, Windows and mintty should
handle it.
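As a rough illustration of what a codepage conversion routine does for CP1252 (on Windows this would typically be MultiByteToWideChar with code page 1252), here is a portable sketch. The function name cp1252_to_utf16() is hypothetical, and mapping the five unassigned CP1252 bytes to U+FFFD is a choice made for this sketch, not necessarily what Windows does. Bytes 0x00-0x7F and 0xA0-0xFF map to the identical Unicode code point, so only 0x80-0x9F need a table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CP1252 bytes 0x80-0x9F; U+FFFD marks the five unassigned bytes
   (a choice made for this sketch). */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, /* 0x80-0x87 */
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD, /* 0x88-0x8F */
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014, /* 0x90-0x97 */
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178  /* 0x98-0x9F */
};

/* Hypothetical portable stand-in for a CP1252 -> UTF-16 conversion.
   Every CP1252 byte maps to exactly one BMP code point, so dst must
   have room for n code units. Returns the number of units written. */
size_t cp1252_to_utf16(uint16_t *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char b = src[i];
        dst[i] = (b >= 0x80 && b <= 0x9F) ? cp1252_high[b - 0x80] : (uint16_t)b;
    }
    return n;
}
```

With this mapping the CP1252 bytes 5A C7 ("ZÇ") become the UTF-16 code units U+005A U+00C7; written out as UTF-16LE, the Ç is the byte pair C7 00.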
Perhaps a better description of your environment, build tools, what you are
trying to do, what you expect as output, and what you are getting as output
could help us better understand and help with the issue you see.
--
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
[Data in IEC units and prefixes, physical quantities in SI.]
--
Problem reports: https://cygwin.com/problems.html
FAQ: https://cygwin.com/faq/
Documentation: https://cygwin.com/docs.html
Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple
The script I sent changes the locale information, i.e. LANG and LC_ALL are
set to en_US.CP1252:
export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252
Then it runs a simple Win32 program that takes a single input argument, ZÇ,
the second character being C-cedilla, an 8-bit character with hex value 0xC7.
The Win32 program transcodes the input Unicode argument, using the Cygwin
character set to determine the codepage, 1252. It then prints the transcoded
characters to stdout, and the result should be ZÇ, identical to the input
argument. This works fine using Cygwin 1.7.28. Cygwin 3.1.4 launches the
Win32 application and is responsible for transcoding the arguments passed to
it by mksh (in this case the CP1252 characters ZÇ) into Unicode. That means
Cygwin has to use the mb-to-uc function for transcoding codepage 1252 to
Unicode. It does not. It uses the UTF-8 to Unicode function (I've seen this
using gdb). That function flags the Ç as an invalid UTF-8 sequence, not
surprisingly, since it's not a UTF-8 character. No matter what character set
I use in 'export LANG...' and 'export LC_ALL...', Cygwin 3.1.4 always uses the
utf8-to-wc transcoding function in sys_mbstowcs(); 1.7.28 uses the correct
function. I'm not using mintty; I'm using mksh, a requirement since our
software uses lots of shell scripts, and for legacy support that means using
a Korn shell. I could understand it if 1.7.28 didn't do the proper
transcoding, but it does.
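Why a lone 0xC7 is rejected by a UTF-8 decoder can be shown with a small validator. In UTF-8, 0xC7 is the lead byte of a two-byte sequence and must be followed by a continuation byte of the form 10xxxxxx; in the CP1252 string "ZÇ" it is either the last byte or followed by an ordinary character, so decoding fails. The sketch below is a hypothetical, self-contained checker (utf8_decode_one() and is_valid_utf8() are illustrative names, not Cygwin's __utf8_mbtowc), and it assumes the argument arrives as raw CP1252 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available). Returns the number
   of bytes consumed and stores the code point in *cp, or returns 0 if the
   bytes do not form a valid UTF-8 sequence. */
size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                      /* ASCII: a single byte */
        *cp = s[0];
        return 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2; c = s[0] & 0x1F;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3; c = s[0] & 0x0F;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4; c = s[0] & 0x07;
    } else {
        return 0;   /* 0x80-0xC1 and 0xF5-0xFF never start a valid sequence */
    }
    if (n < len)
        return 0;   /* truncated, e.g. a lone 0xC7 at the end of the string */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;   /* continuation bytes must look like 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if ((len == 3 && (c < 0x800 || (c >= 0xD800 && c <= 0xDFFF))) ||
        (len == 4 && (c < 0x10000 || c > 0x10FFFF)))
        return 0;   /* overlong form, surrogate, or beyond U+10FFFF */
    *cp = c;
    return len;
}

/* Nonzero if all n bytes of s form valid UTF-8. */
int is_valid_utf8(const unsigned char *s, size_t n)
{
    uint32_t cp;
    size_t i = 0;
    while (i < n) {
        size_t k = utf8_decode_one(s + i, n - i, &cp);
        if (k == 0)
            return 0;
        i += k;
    }
    return 1;
}
```

For the CP1252 bytes 5A C7, is_valid_utf8() returns 0, while the UTF-8 encoding of the same text, 5A C3 87, is accepted and the Ç decodes to U+00C7.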
I used:
  gdb mksh
to load mksh into the debugger, then started it with
  start -c 'cygtest.exe ZÇ'
That allowed me to step into child_info_spawn::worker() and stop at the
call to CreateProcess(), where the command line (cygtest.exe) and argument
(ZÇ) are translated into Unicode.
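A lighter-weight check than stepping through the debugger is to have the test program report the exact bytes it receives. The sketch below is hypothetical (hex_bytes() and dump_args() are illustrative names, and this is not the cygtest.exe from the thread); it is plain C and prints the narrow argv bytes. For a native Win32 program those bytes come from the C runtime's conversion of the Unicode command line, so they show the end result rather than what the shell passed directly. Compare the output with 5a c7 (CP1252) and 5a c3 87 (UTF-8).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the bytes of s as space-separated lowercase hex into out,
   e.g. "Z\xC7" -> "5a c7". out needs at least 3*strlen(s)+1 bytes. */
void hex_bytes(char *out, size_t outlen, const char *s)
{
    size_t n = strlen(s);
    assert(out != NULL && outlen >= 3 * n + 1);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* cast via unsigned char so bytes >= 0x80 are not sign-extended */
        sprintf(out + 3 * i, "%02x", (unsigned int)(unsigned char)s[i]);
        out[3 * i + 2] = (i + 1 < n) ? ' ' : '\0';
    }
}

/* Print each argument's bytes; intended to be called from a test main(). */
void dump_args(int argc, char **argv)
{
    char buf[3 * 256 + 1];
    for (int i = 0; i < argc; i++) {
        if (strlen(argv[i]) > 256) {
            printf("argv[%d]: (too long)\n", i);
            continue;
        }
        hex_bytes(buf, sizeof buf, argv[i]);
        printf("argv[%d]: %s\n", i, buf);
    }
}
```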
This is the code to which I'm referring, in strfuncs.cc, which is supposed
to translate the command line and arguments from CP 1252 into Unicode.
size_t __reg3
sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
{
  mbtowc_p f_mbtowc = __MBTOWC;
  if (f_mbtowc == __ascii_mbtowc)
    {
      f_mbtowc = __utf8_mbtowc;  /* <<<< THE CODE CHANGES THE '__ascii_mbtowc'
                                    TO '__utf8_mbtowc' EVERY TIME, REGARDLESS
                                    OF THE CODEPAGE. */
    }
  return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
}
So 'f_mbtowc' is set to '__ascii_mbtowc', the default. You said:
> You can only change input character sets on the fly;
The input character set to Cygwin should have been changed to CP 1252, as
it was in 1.7.28. At least, that's what I would expect to happen. If it
does not, or if mintty is required, then that's a regression from 1.7.28.
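The selection in the sys_mbstowcs() excerpt above can be modeled in a few lines. This is a hypothetical simplification (the converter enum and pick_converter() stand in for Cygwin's mbtowc function pointers) that makes one detail explicit: the UTF-8 substitution happens only when the current converter is the ASCII one, so whether ZÇ is decoded as CP1252 depends on whether __MBTOWC reflects the CP1252 locale at the time of the call.

```c
#include <assert.h>

/* Hypothetical stand-ins for Cygwin's mbtowc function pointers. */
typedef enum { CONV_ASCII, CONV_UTF8, CONV_CP1252 } converter;

/* Models: if (f_mbtowc == __ascii_mbtowc) f_mbtowc = __utf8_mbtowc;
   any other locale-derived converter is passed through unchanged. */
converter pick_converter(converter current)
{
    assert(current == CONV_ASCII || current == CONV_UTF8 || current == CONV_CP1252);
    return current == CONV_ASCII ? CONV_UTF8 : current;
}
```

In this model, the behavior observed in gdb corresponds to __MBTOWC still being the default ASCII converter when mksh spawns cygtest.exe, as stated above.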
Mike Shay
NOTICE from Ab Initio: This email (including any attachments) may contain information that is subject to confidentiality obligations or is legally privileged, and sender does not waive confidentiality or privilege. If received in error, please notify the sender, delete this email, and make no further use, disclosure, or distribution.