Mail Archives: cygwin/2020/08/03/11:37:05

To: cygwin AT cygwin DOT com
Subject: Trouble with character sets
Message-ID: <OF3F4D2646.3A75682C-ON852585B5.0058983D-852585B9.0055B758@abinitio.com>
Date: Mon, 3 Aug 2020 11:36:14 -0400
From: Michael Shay via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Michael Shay <MShay AT ABINITIO DOT COM>
Sender: "Cygwin" <cygwin-bounces AT cygwin DOT com>


I'm having a problem with Cygwin 3.1.4 when changing the character set on the
fly. It seems to work with Cygwin applications, but not with Win32
applications.

I have a Korn shell script:
#!/bin/ksh

OLD_LANG="$LANG"
OLD_LC_ALL="$LC_ALL"

echo "locale on entry"
locale
echo ""

export LANG="en_US.CP1252"
export LC_ALL=en_US.CP1252

echo "locale changed to"
locale
echo ""

# With no arguments, run the Win32 program; with one argument, run
# '/bin/echo'; with two arguments, run both.
# The 'Ç' below is the single byte 0xC7 in the CP1252-encoded script.

case $# in
   0 )  echo "Running WIN32 pgm"
        ksh -c 'cygtest.exe ZÇ'
        ;;
   1 )  echo "Running Cygwin 'echo'"
        ksh -c '/bin/echo ZÇ'
        ;;
   2 )  echo "Running WIN32 pgm"
        ksh -c 'cygtest.exe ZÇ'
        echo ""
        echo "Running Cygwin 'echo'"
        ksh -c '/bin/echo ZÇ'
        ;;
   * ) ;;
esac

LC_ALL="$OLD_LC_ALL"
LANG="$OLD_LANG"

and a Win32 application (attached file cygtest.cpp)
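
As background for the script's two exports: POSIX resolves the character-type locale from LC_ALL first, then LC_CTYPE, then LANG, so both settings end up selecting CP1252. A minimal sketch of that lookup in portable shell follows. The function names effective_ctype and charset_of are made up for illustration, and the fallback to "C" when nothing is set is an assumption of this sketch, not a statement about Cygwin's default.

```sh
# Print the locale name that governs character handling, using the POSIX
# precedence LC_ALL > LC_CTYPE > LANG. Falls back to "C" if none is set
# (an assumption of this sketch).
effective_ctype() {
  printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Print the charset part of a locale name such as en_US.CP1252 (the text
# after the '.', minus any @modifier). Prints an empty line if there is
# no '.' in the name.
charset_of() {
  case $1 in
    *.*) cs=${1#*.}; printf '%s\n' "${cs%%@*}" ;;
    *)   printf '\n' ;;
  esac
}

# Show the charset selected by the current environment.
charset_of "$(effective_ctype)"
```

Run after the script's two export lines, the last command prints CP1252.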

I used gdb to see what was happening in child_info_spawn::worker(), when a
Win32 program is started using:

  rc = CreateProcessW (runpath,         /* image name w/ full path */
                       cmd.wcs (wcmd),  /* what was passed to exec */
                       sa,              /* process security attrs */
                       sa,              /* thread security attrs */
                       TRUE,            /* inherit handles */
                       c_flags,
                       envblock,        /* environment */
                       NULL,
                       &si,
                       &pi);

Specifically, 'cmd.wcs(wcmd)' invokes:

  wchar_t *wcs (wchar_t *wbuf, size_t n)
  {
    if (n == 1)
      wbuf[0] = L'\0';
    else
      sys_mbstowcs (wbuf, n, buf);
    return wbuf;
  }

and sys_mbstowcs():

size_t __reg3
sys_mbstowcs (wchar_t * dst, size_t dlen, const char *src, size_t nms)
{
  mbtowc_p f_mbtowc = __MBTOWC;
  if (f_mbtowc == __ascii_mbtowc)
    {
      /* <<<<< this is ALWAYS done, no matter what charset is in use. */
      f_mbtowc = __utf8_mbtowc;
    }
  return sys_cp_mbstowcs (f_mbtowc, dst, dlen, src, nms);
}

Since CP1252 is a single-byte character set that uses byte values >= 0x80,
and a lone 0xc7 byte is not a valid UTF-8 sequence, the '0xc7' character is
always translated as '0xc7 0xf0', with the '0xf0' byte indicating an invalid
character in the string.

This doesn't seem to happen when e.g. '/bin/echo' is run, although I
haven't stepped into the code to see what's happening.
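
One way to compare the two cases without a debugger is to dump each program's output as hex. A small sketch, assuming the standard od and tr tools are available; hex_bytes is a name chosen here, and the two commented commands only indicate where the filter could be attached.

```sh
# Print stdin as one continuous line of lowercase hex digits.
hex_bytes() {
  od -An -v -tx1 | tr -d ' \n'
  echo
}

# 'Z' followed by the CP1252 byte for 'Ç' (octal 307 = 0xC7) and a newline,
# which is what an echo that passes the byte through unchanged would emit.
printf 'Z\307\n' | hex_bytes    # prints 5ac70a

# In the Cygwin session the same filter could be applied to each program:
#   /bin/echo "$arg" | hex_bytes
#   cygtest.exe "$arg" | hex_bytes
```

If the character had been converted to UTF-8 along the way, the dump would show c3 87 in place of c7.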

I do not think this is a Cygwin bug, but since the User's Guide says the
locale and charset can be changed on the fly, I don't know what's going
awry.

Any suggestions? If you need more information, I'm happy to provide it.

Mike Shay

Here's the source for the Win32 program. I built it with Visual Studio
2015, to get something running quickly.



Attachment: cygtest.cpp

// cygtest.cpp : Defines the entry point for the console application.
//


#include <SDKDDKVer.h>
#include <stdio.h>
#include <windows.h>
#include <string>
using namespace std;

LPSTR __stdcall UnicodeToMByteHelper(LPSTR lpa, int nBytes, LPCWSTR lpw, int nChars, int codepage);

static UINT cyg_codepage_string_to_CP(const string &cp)
{
  const string UTF8 = "UTF-8";
  const string utf8 = "utf-8";
  const string ANSI = "ANSI";
  const string ansi = "ansi";
  const string ISO88591 = "ISO-8859-1";
  const string iso88591 = "iso-8859-1";
  const string OEM = "OEM";
  const string oem = "oem";
  const string WINDOWS = "WINDOWS";
  const string windows = "windows";
  const string CODEPAGE = "CP";
  const string codepage = "cp";
  UINT shell_cp{ 0 };

  if (NULL == cp.c_str() || cp.length() == 0)
    return 0;

  if ((cp.compare(utf8) == 0) || (cp.compare(UTF8) == 0))
    shell_cp = 65001;
  else if ((cp.compare(ansi) == 0) || (cp.compare(ANSI) == 0)
    || (cp.compare(ISO88591) == 0) || (cp.compare(iso88591) == 0))
    shell_cp = 1252;
  // oem is also standard cygwin nomenclature
  else if ((cp.compare(oem) == 0) || (cp.compare(OEM) == 0))
    shell_cp = 437;
  // cpXXX, windows-XXX and windows_XXX are all recognized by
  // the Ab Initio extensions to cygwin.  Not sure if they are
  // known to standard cygwin, but I don't think they are.
  else if ((cp.compare(0, 2, codepage) == 0) ||
    (cp.compare(0, 2, CODEPAGE) == 0) ||
    (cp.compare(0, 7, windows) == 0) ||
    (cp.compare(0, 7, WINDOWS) == 0)) {
    // If the prefix is "CP" or "cp" then get the number after that
    // else it's "WINDOWS{-,_}" or "WINDOWS{-,_}"
    int offset = ((cp.compare(0, 2, codepage) == 0) || (cp.compare(0, 2, CODEPAGE) == 0)) ? 2 : 8;
    shell_cp = atoi(cp.substr(offset).c_str());
  }
  return shell_cp;
}

static UINT get_cygwin_codepage()
{
  string default_cyg_charset = "C.UTF-8";        // Cygwin default character set
  string cyg_locale;
  UINT shell_cp{ 0 };
  UINT default_cp{ 65001 };
  char *envptr = ::getenv("LANG");

  if (NULL == envptr)
    envptr = ::getenv("LC_ALL");

  cyg_locale = (NULL == envptr ? default_cyg_charset : envptr);
  // The 'value' field of the environment string "var_name=value"
  // will be of the form: <language ID>.<codepage ID>
  // We want the substring after the '.'
  int dotPos = cyg_locale.find_first_of('.');
  if (dotPos >= 0) {
    // The character set string, if specified, starts AFTER  the '.'.
    // If NOT specified, return the input default.
    string page = cyg_locale.substr(++dotPos);
    if (0 <= (shell_cp = cyg_codepage_string_to_CP(page))) {
      return shell_cp;
    }  // end SHELL_CP
  }    // end EQPOS
  return default_cp;
}


LPSTR __stdcall UnicodeToMByteHelper(LPSTR lpa, int nBytes, LPCWSTR lpw, int nChars, int codepage)
{
  static int printInfo = 0;
  int nOut = 0;

  if (NULL == lpa) {
    printf("NULL input string\n");
    return NULL;
  }

  if (printInfo) {
    printf("Transcoding using Cygwin codepage: %d\nInput widechar string:\n", codepage);
    for (int i = 0; i < nChars; i++)
      printf("\tlpw[%d] = %C - %02X\n", i, lpw[i], lpw[i]);
  }
  ++printInfo;

  if (nChars > 0) {
    if (0 == (nOut = WideCharToMultiByte(codepage, 0, lpw, nChars, lpa, nBytes, NULL, NULL))) {
      DWORD dwErr = GetLastError();
      printf("WideCharToMultiByte(%d, %S) failed, error %d\n", codepage, lpw, dwErr);
      return NULL;
    }
  }
  lpa[nOut] = '\0';
  return lpa;
}

int wmain(int argc, wchar_t** wargv)
{
  try {
    char *pNull = "NULL";
    char** argv = new char*[(argc)+1];
    int _argi;
    int codepage = get_cygwin_codepage();
    for (_argi = 0; _argi < (argc); _argi++) {
      if (wargv[_argi]) {
        LPWSTR utf_lpw  = wargv[_argi];
        int utf_len     = lstrlenW(utf_lpw);
        int utf_convert = utf_len * 3 + 1;
        LPSTR utf_lpa   = (LPSTR)_alloca(utf_convert);
        argv[_argi]     = UnicodeToMByteHelper(utf_lpa, utf_convert, utf_lpw, utf_len, codepage);
      }
      else {
        argv[_argi] = pNull;
      }
    }
    argv[(argc)] = NULL;

    // Now print the transcoded string.

    for (int i = 1; i < argc; i++)
      printf("%s: %s\n", __FUNCTION__, argv[i]);

    return 0;
  }
  catch (...) {
    printf("Caught unhandled exception\n");
  }
}

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
