Mail Archives: cygwin/2020/08/03/11:37:05
To: cygwin AT cygwin DOT com
Subject: Trouble with character sets
Message-ID: <OF3F4D2646.3A75682C-ON852585B5.0058983D-852585B9.0055B758@abinitio.com>
Date: Mon, 3 Aug 2020 11:36:14 -0400
From: Michael Shay via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Michael Shay <MShay AT ABINITIO DOT COM>

I'm having a problem with Cygwin 3.1.4 when changing the character set on the
fly. It seems to work with Cygwin applications, but not with Win32
applications.

I have a Korn shell script:

#!/bin/ksh
OLD_LANG="$LANG"
OLD_LC_ALL="$LC_ALL"
echo "locale on entry"
locale
echo ""
export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252
echo "locale changed to"
locale
echo ""
# With no arguments, run the Win32 program; with one argument, run
# '/bin/echo'; with two arguments, run both.
case $# in
0 ) echo "Running WIN32 pgm"
    ksh -c 'cygtest.exe ZÇ'
    ;;
1 ) echo "Running Cygwin 'echo'"
    ksh -c '/bin/echo ZÇ'
    ;;
2 ) echo "Running WIN32 pgm"
    ksh -c 'cygtest.exe ZÇ'
    echo ""
    echo "Running Cygwin 'echo'"
    ksh -c '/bin/echo ZÇ'
    ;;
* ) ;;
esac
LC_ALL="$OLD_LC_ALL"
LANG="$OLD_LANG"

and a Win32 application (attached file cygtest.cpp).

I used gdb to see what was happening in child_info_spawn::worker(), when a
Win32 program is started using:

rc = CreateProcessW (runpath,        /* image name w/ full path */
                     cmd.wcs (wcmd), /* what was passed to exec */
                     sa,             /* process security attrs */
                     sa,             /* thread security attrs */
                     TRUE,           /* inherit handles */
                     c_flags,
                     envblock,       /* environment */
                     NULL,
                     &si,
                     &pi);

Specifically, 'cmd.wcs(wcmd)' invokes:

wchar_t *wcs (wchar_t *wbuf, size_t n)
{
  if (n == 1)
    wbuf[0] = L'\0';
  else
    sys_mbstowcs (wbuf, n, buf);
  return wbuf;
}

and sys_mbstowcs():

size_t __reg3
sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
{
  mbtowc_p f_mbtowc = __MBTOWC;
  if (f_mbtowc == __ascii_mbtowc)
    {
      f_mbtowc = __utf8_mbtowc;   <<<<< this is ALWAYS done, no matter what charset is in use.
    }
  return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
}

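Read literally, the quoted code swaps in the UTF-8 converter only when the active converter is the ASCII one. The toy model below isolates just that selection step. It is not Cygwin code: the `Decoder` enum and `pick_decoder` are hypothetical names made up for illustration. If the substitution really fires on every spawn, that would suggest the spawning process still has the ASCII converter selected, but this is an inference from the snippet, not something confirmed here.

```cpp
// Hypothetical stand-ins for the converter functions named in the snippet
// (__ascii_mbtowc, __utf8_mbtowc, and a CP1252 converter).
enum class Decoder { Ascii, Utf8, Cp1252 };

// Mirrors only the selection step of the quoted sys_mbstowcs():
// an ASCII converter is replaced by UTF-8, anything else is used unchanged.
static Decoder pick_decoder(Decoder current)
{
  return current == Decoder::Ascii ? Decoder::Utf8 : current;
}
```

Under this model, `pick_decoder(Decoder::Cp1252)` stays `Decoder::Cp1252`, so a process whose own locale had actually switched to CP1252 would not take the UTF-8 branch.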
Since CP1252 is an 8-bit, single-byte character set that uses byte values
>= 0x80, and the string is being decoded as UTF-8, the lone '0xc7' byte is
always translated as '0xc7 0xf0', with the '0xf0' byte indicating an invalid
character in the string.

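A minimal, portable sketch of why the byte cannot survive a UTF-8 decode: 0xC7 has the bit pattern of a two-byte lead byte, so on its own it is an incomplete sequence, while as a single CP1252 byte it is simply U+00C7. Both function names are hypothetical, the validity check is deliberately simplified, and the single-byte decoder covers only the range where CP1252 and ISO-8859-1 agree.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check: every lead byte must be followed by the right
// number of 10xxxxxx continuation bytes. Simplified on purpose: it does not
// reject overlong encodings or surrogate code points.
static bool is_structurally_valid_utf8(const std::string &s)
{
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t extra;
    if (c < 0x80)                extra = 0;  // ASCII byte
    else if ((c & 0xE0) == 0xC0) extra = 1;  // lead byte of a 2-byte sequence
    else if ((c & 0xF0) == 0xE0) extra = 2;  // lead byte of a 3-byte sequence
    else if ((c & 0xF8) == 0xF0) extra = 3;  // lead byte of a 4-byte sequence
    else return false;                       // stray continuation or invalid byte
    if (extra > s.size() - i - 1)
      return false;                          // sequence is cut off
    for (std::size_t k = 1; k <= extra; ++k)
      if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
        return false;                        // not a continuation byte
    i += extra + 1;
  }
  return true;
}

// Single-byte decode for the part of CP1252 that coincides with ISO-8859-1:
// ASCII and 0xA0..0xFF map to the code point with the same value.
// Bytes 0x80..0x9F differ in CP1252 and are not handled here (returns 0).
static unsigned cp1252_to_codepoint(unsigned char b)
{
  if (b < 0x80 || b >= 0xA0)
    return b;
  return 0;
}
```

With these, the script's argument "Z\xC7" fails the UTF-8 check, its UTF-8 form "Z\xC3\x87" passes, and `cp1252_to_codepoint(0xC7)` yields 0xC7.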
This doesn't seem to happen when e.g. '/bin/echo' is run, although I
haven't stepped into the code to see what's happening.

I do not think this is a Cygwin bug, but since the User's Guide says the
locale and charset can be changed on the fly, I don't know what's going
awry.

Any suggestions? If you need more information, I'm happy to provide it.
Mike Shay

Here's the source for the Win32 program. I built it with Visual Studio
2015, to get something running quickly.

Attachment: cygtest.cpp

// cygtest.cpp : Defines the entry point for the console application.
//


#include <SDKDDKVer.h>
#include <stdio.h>
#include <windows.h>
#include <string>
using namespace std;

LPSTR __stdcall UnicodeToMByteHelper(LPSTR lpa, int nBytes, LPCWSTR lpw, int nChars, int codepage);

static UINT cyg_codepage_string_to_CP(const string &cp)
{
  const string UTF8 = "UTF-8";
  const string utf8 = "utf-8";
  const string ANSI = "ANSI";
  const string ansi = "ansi";
  const string ISO88591 = "ISO-8859-1";
  const string iso88591 = "iso-8859-1";
  const string OEM = "OEM";
  const string oem = "oem";
  const string WINDOWS = "WINDOWS";
  const string windows = "windows";
  const string CODEPAGE = "CP";
  const string codepage = "cp";
  UINT shell_cp{ 0 };

  if (NULL == cp.c_str() || cp.length() == 0)
    return 0;

  if ((cp.compare(utf8) == 0) || (cp.compare(UTF8) == 0))
    shell_cp = 65001;
  else if ((cp.compare(ansi) == 0) || (cp.compare(ANSI) == 0)
    || (cp.compare(ISO88591) == 0) || (cp.compare(iso88591) == 0))
    shell_cp = 1252;
  // oem is also standard cygwin nomenclature
  else if ((cp.compare(oem) == 0) || (cp.compare(OEM) == 0))
    shell_cp = 437;
  // cpXXX, windows-XXX and windows_XXX are all recognized by
  // the Ab Initio extensions to cygwin.  Not sure if they are
  // known to standard cygwin, but I don't think they are.
  else if ((cp.compare(0, 2, codepage) == 0) ||
    (cp.compare(0, 2, CODEPAGE) == 0) ||
    (cp.compare(0, 7, windows) == 0) ||
    (cp.compare(0, 7, WINDOWS) == 0)) {
    // If the prefix is "CP" or "cp" then get the number after that
    // else it's "WINDOWS{-,_}" or "WINDOWS{-,_}"
    int offset = ((cp.compare(0, 2, codepage) == 0) || (cp.compare(0, 2, CODEPAGE) == 0)) ? 2 : 8;
    shell_cp = atoi(cp.substr(offset).c_str());
  }
  return shell_cp;
}

static UINT get_cygwin_codepage()
{
  string default_cyg_charset = "C.UTF-8";        // Cygwin default character set
  string cyg_locale;
  UINT shell_cp{ 0 };
  UINT default_cp{ 65001 };
  char *envptr = ::getenv("LANG");

  if (NULL == envptr)
    envptr = ::getenv("LC_ALL");

  cyg_locale = (NULL == envptr ? default_cyg_charset : envptr);
  // The 'value' field of the environment string "var_name=value"
  // will be of the form: <language ID>.<codepage ID>
  // We want the substring after the '.'
  int dotPos = cyg_locale.find_first_of('.');
  if (dotPos >= 0) {
    // The character set string, if specified, starts AFTER  the '.'.
    // If NOT specified, return the input default.
    string page = cyg_locale.substr(++dotPos);
    if (0 <= (shell_cp = cyg_codepage_string_to_CP(page))) {
      return shell_cp;
    }  // end SHELL_CP
  }    // end EQPOS
  return default_cp;
}


LPSTR __stdcall UnicodeToMByteHelper(LPSTR lpa, int nBytes, LPCWSTR lpw, int nChars, int codepage)
{
  static int printInfo = 0;
  int nOut = 0;

  if (NULL == lpa) {
    printf("NULL input string\n");
    return NULL;
  }

  if (printInfo) {
    printf("Transcoding using Cygwin codepage: %d\nInput widechar string:\n", codepage);
    for (int i = 0; i < nChars; i++)
      printf("\tlpw[%d] = %C - %02X\n", i, lpw[i], lpw[i]);
  }
  ++printInfo;

  if (nChars > 0) {
    if (0 == (nOut = WideCharToMultiByte(codepage, 0, lpw, nChars, lpa, nBytes, NULL, NULL))) {
      DWORD dwErr = GetLastError();
      printf("WideCharToMultiByte(%d, %S) failed, error %d\n", codepage, lpw, dwErr);
      return NULL;
    }
  }
  lpa[nOut] = '\0';
  return lpa;
}

int wmain(int argc, wchar_t** wargv)
{
  try {
    char *pNull = "NULL";
    char** argv = new char*[(argc)+1];
    int _argi;
    int codepage = get_cygwin_codepage();
    for (_argi = 0; _argi < (argc); _argi++) {
      if (wargv[_argi]) {
        LPWSTR utf_lpw  = wargv[_argi];
        int utf_len     = lstrlenW(utf_lpw);
        int utf_convert = utf_len * 3 + 1;
        LPSTR utf_lpa   = (LPSTR)_alloca(utf_convert);
        argv[_argi]     = UnicodeToMByteHelper(utf_lpa, utf_convert, utf_lpw, utf_len, codepage);
      }
      else {
        argv[_argi] = pNull;
      }
    }
    argv[(argc)] = NULL;

    // Now print the transcoded string.

    for (int i = 1; i < argc; i++)
      printf("%s: %s\n", __FUNCTION__, argv[i]);

    return 0;
  }
  catch (...) {
    printf("Caught unhandled exception\n");
  }
}
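
For readers without a Windows toolchain, the locale-to-code-page step that cygtest.cpp performs can be sketched in portable C++. This is a cut-down illustration, not the attached code: `charset_of` and `codepage_of` are hypothetical names, and only the "CP<n>" and "UTF-8" spellings are handled.

```cpp
#include <cstdlib>
#include <string>

// Returns the part of a locale name after the first '.', or "" if there is
// no '.' (e.g. "en_US.CP1252" -> "CP1252", "C" -> "").
static std::string charset_of(const std::string &locale)
{
  const std::string::size_type dot = locale.find('.');
  return dot == std::string::npos ? std::string() : locale.substr(dot + 1);
}

// Maps "CP<n>"/"cp<n>" to n and "UTF-8"/"utf-8" to 65001.
// Returns 0 for anything this sketch does not recognize.
static unsigned codepage_of(const std::string &charset)
{
  if (charset == "UTF-8" || charset == "utf-8")
    return 65001;
  if (charset.compare(0, 2, "CP") == 0 || charset.compare(0, 2, "cp") == 0)
    return static_cast<unsigned>(std::atoi(charset.c_str() + 2));
  return 0;
}
```

For the locale exported by the script, `codepage_of(charset_of("en_US.CP1252"))` gives 1252. The sketch compares the `find` result against `std::string::npos` rather than storing it in an `int`, which avoids relying on a narrowing conversion to detect "not found".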