Mail Archives: cygwin/2017/07/27/17:38:04

X-Recipient: archive-cygwin AT delorie DOT com
DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
:list-unsubscribe:list-subscribe:list-archive:list-post
:list-help:sender:subject:to:references:from:message-id:date
:mime-version:in-reply-to:content-type; q=dns; s=default; b=b/nh
BiTF/h3nZ6cgnh/HyLiqFHjv0O3f1nFUaSeL7pB5DmfRaGAn6uIYN725f7RyVnm6
17mhb0+3i251pT5nqexqzg3plCagko0E2MO3kfAV+xDH9cCSzrcZQLPOmGPkq/P/
x+voDKX4PdPBYH1iYNWEqfLEfLdfFmPaRq11KHA=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id
:list-unsubscribe:list-subscribe:list-archive:list-post
:list-help:sender:subject:to:references:from:message-id:date
:mime-version:in-reply-to:content-type; s=default; bh=nsC59vaKmq
NZstC3OMPtMbTOD9g=; b=a+mkPkho4zlOc4kwYGVw4c0DnFRijx0/tJD6dQw/qh
+cLCiap5ume7x9AkUzvYu75f6UJbv3mk8gWKKDQTLYigPwB6sqOWVOzQjjPp7aJ5
YG5UajMayVYMLA5ZJjNX4ZsZtWZeb6vCZ8TZ9jPzjvB8fmN+grL1wsBQjlYKD/t1
4=
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS autolearn=ham version=3.3.2 spammy=transcription, SMALL, overwhelmed
X-HELO: mx1.redhat.com
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 12BDCEB2F6
Authentication-Results: ext-mx03.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx03.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=eblake AT redhat DOT com
Subject: Re: [ANNOUNCEMENT] Updated: libreadline7-7.0.3-3
To: cygwin AT cygwin DOT com
References: <5dbbf0e4-6374-a9bb-21e5-dd5537e0e19a AT redhat DOT com> <597a3771 DOT 4305ca0a DOT 32253 DOT e788 AT mx DOT google DOT com>
From: Eric Blake <eblake AT redhat DOT com>
Openpgp: url=http://people.redhat.com/eblake/eblake.gpg
Message-ID: <e8a9e633-586c-2ce4-3e89-41e020f0e923@redhat.com>
Date: Thu, 27 Jul 2017 16:37:45 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.1
MIME-Version: 1.0
In-Reply-To: <597a3771.4305ca0a.32253.e788@mx.google.com>
X-IsSubscribed: yes

--JpJqicUqdpA8c6bO4NpXMC3xQcb0Ju1lF
Content-Type: multipart/mixed; boundary="b4QhtDrbUXEqKuh1dKHO6gHWss1HwThX7";
 protected-headers="v1"
From: Eric Blake <eblake AT redhat DOT com>
To: cygwin AT cygwin DOT com
Message-ID: <e8a9e633-586c-2ce4-3e89-41e020f0e923 AT redhat DOT com>
Subject: Re: [ANNOUNCEMENT] Updated: libreadline7-7.0.3-3
References: <5dbbf0e4-6374-a9bb-21e5-dd5537e0e19a AT redhat DOT com>
 <597a3771 DOT 4305ca0a DOT 32253 DOT e788 AT mx DOT google DOT com>
In-Reply-To: <597a3771 DOT 4305ca0a DOT 32253 DOT e788 AT mx DOT google DOT com>


--b4QhtDrbUXEqKuh1dKHO6gHWss1HwThX7
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable

On 07/27/2017 01:56 PM, Steven Penny wrote:
> On Thu, 27 Jul 2017 12:08:53, Eric Blake wrote:
>> I've got some time today to look at building readline, but for the life
>> of me, I can't figure out what I'm supposed to be debugging.  You have
>> so many emails saying "see this earlier URL" that I am lost in what you
>> are saying is wrong or how to reproduce it.
>
> Thanks for this. Between your 2 emails, you've put a lot on the table.
> Instead of getting overwhelmed, I will just start my side of the convo
> by replaying the problem. Then if you need more from me I am happy to
> help. So, here is an example problem using 'LATIN SMALL LETTER O WITH
> DIAERESIS' (U+00F6):
>
>    $ chcp.com 65001

I still don't know your environment (it's really hard to reproduce
issues if I don't know the steps to reproduce them).  This looks like a
bash prompt, but are you running bash inside mintty, or directly in a
cmd window?

When I first open a mintty window to get bash, I see:

$ chcp.com
Active code page: 437

and in that environment, typing <alt-1-4-8> displays nothing, but
hitting <enter> then displays:
-bash: $'\302\224': command not found

which maps to \xc2\x94; I can confirm that with 'od -tx1'.  Trying
<alt-0-2-4-6> gives a different character (¦), as \xc2\xa6.
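
For illustration, the octal-to-hex mapping above can be checked with a
short Python sketch. It is not part of the original session; the variable
names are made up for the example. It converts the octal escapes bash
printed into bytes, shows them the way 'od -tx1' would, and decodes both
byte pairs as UTF-8 to see which code points mintty delivered.

```python
import unicodedata

# Bytes that bash reported as $'\302\224' (octal escapes).
raw_148 = bytes([0o302, 0o224])
# Bytes observed for <alt-0-2-4-6> in the same mintty window.
raw_0246 = bytes.fromhex("c2a6")

# Same view that 'od -tx1' gives: one hex pair per byte.
hex_view = " ".join(f"{b:02x}" for b in raw_148)

# Decode each pair as UTF-8 to recover the code point.
cp_148 = ord(raw_148.decode("utf-8"))
cp_0246 = ord(raw_0246.decode("utf-8"))

print(hex_view)                                   # c2 94
print(f"U+{cp_148:04X}")                          # U+0094
print(f"U+{cp_0246:04X}", unicodedata.name(chr(cp_0246)))
```

Neither result is U+00F6: the first is the number 148 (0x94) taken
directly as a code point, and the second is U+00A6.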

When I then do

$ chcp.com 65001
Active code page: 65001

I don't see any change in behavior.

But if I first open a cmd window, with NO bash in the mix, I see:

c:\cygwin\bin> chcp
Active code page: 437

where both <alt-1-4-8> and <alt-0-2-4-6> output ö, and where 'od -tx1'
confirms both sequences produce \xc3\xb6.

Then switching code pages:

c:\cygwin\bin> chcp 65001
Active code page: 65001

directly typing <alt-0-2-4-6> prints nothing, while 'od -tx1' still
shows that it received \xc3\xb6.

I have no idea how alt- sequences are mapped to code points (it is not
as trivial as a base conversion that yields either the Unicode code point
0xf6 or the UTF-8 encoding), but it appears that the input within
cmd is the same, while the choice of code page determines what the
output will be.  I also have no idea why the alt- sequences produce
different inputs under cmd than under mintty.  So knowing WHAT
environment you are using is VITAL to me understanding the results you
are seeing.

At any rate, I definitely know that U+00F6 is encoded as \xc3\xb6 in
UTF-8 (I confirmed that on Linux, with echo $'\xc3\xb6').  I _don't_
know what it is encoded as in Windows code page 437 or 65001.  But a
quick google later, and I see that for code page 437
(https://en.wikipedia.org/wiki/Code_page_437), ö is at codepoint 0x94
(decimal 148, octal 0224); meanwhile, 0xf6 is equal to decimal 246.  Aha
- maybe that explains the two alt- sequences under codepage 437: without
a leading zero, you are typing the decimal position which looks up the
character from the current code page; WITH a leading zero you are
directly requesting the decimal encoding of a Unicode character.  And
trying some other sequences, I note that õ ('LATIN SMALL LETTER O WITH
TILDE' (U+00F5)) is not part of code page 437; so there is nothing I can
type without a leading 0 to print one; conversely, trying <alt-0-2-4-5>
which requests the same Unicode character displays merely 'o'
(apparently U+006f), which, when you lack o-with-tilde, is a reasonable
fallback compared to printing nothing at all.
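
The hypothesis in the paragraph above can be written down as a small
Python sketch. This only models the guess made here; it is not a
description of what Windows actually does internally, and the helper
names alt_without_zero, alt_with_zero and in_cp437 are invented for the
example. The NFD step at the end is likewise just one plausible way an
'o' fallback could be derived.

```python
import unicodedata

def alt_without_zero(n: int) -> str:
    """Hypothetical model: look up decimal position n in code page 437."""
    return bytes([n]).decode("cp437")

def alt_with_zero(n: int) -> str:
    """Hypothetical model: treat n as a decimal Unicode code point."""
    return chr(n)

def in_cp437(ch: str) -> bool:
    """True if ch has an encoding in code page 437."""
    try:
        ch.encode("cp437")
        return True
    except UnicodeEncodeError:
        return False

# Both alt- sequences reach the same character by different routes.
o_diaeresis = alt_without_zero(148)
same_char = o_diaeresis == alt_with_zero(246)
utf8_bytes = o_diaeresis.encode("utf-8")

# o-with-tilde exists in Unicode but has no slot in code page 437.
o_tilde = alt_with_zero(245)
tilde_in_437 = in_cp437(o_tilde)

# Assumed fallback: keep only the base letter of the decomposed form.
fallback = unicodedata.normalize("NFD", o_tilde)[0]

print(unicodedata.name(o_diaeresis), same_char, utf8_bytes.hex(" "))
print(unicodedata.name(o_tilde), tilde_in_437, fallback)
```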

Either way, the character requested by the alt-sequence in the cmd
window is then transformed by Cygwin into the appropriate UTF-8 input
for the tty stdin of the Cygwin child process.  Hmm; repeating those
sequences under 'od -tx1', when I try <alt-0-2-4-5>, I see something
interesting: the moment I press 5 (while still holding alt), the display
prints [G; then releasing alt prints o; the transcription is then

0000000 1b 1b 5b 47 c3 b5 0a

which is ESC ESC [ G (hmm - that's the ANSI terminal escape sequence for
moving the cursor to the first column), followed by the actual Unicode õ,
before my ending newline.  No idea why that is leaking through to Cygwin
to pick up as input.  Is Windows trying to beep at me to tell me my
Unicode request
doesn't exist in the current code page?  Except that beep is Ctrl-G
(U+0007).
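
As a cross-check of that reading of the dump, here is a minimal Python
sketch that takes the hex bytes exactly as 'od -tx1' printed them and
splits them into the escape prefix and the UTF-8 text. The variable names
are illustrative only.

```python
# Hex bytes copied from the 'od -tx1' transcription above.
dump = "1b 1b 5b 47 c3 b5 0a"
data = bytes.fromhex(dump)

# First four bytes: ESC ESC [ G
prefix = data[:4]
# Remainder: UTF-8 text (the character plus the trailing newline).
text = data[4:].decode("utf-8")

# A beep would be the control character Ctrl-G (0x07), which is the
# letter 'G' (0x47) with the upper bits masked off.
ctrl_g = ord("G") & 0x1F
has_bell = ctrl_g in data

print(prefix)                       # b'\x1b\x1b[G'
print(f"U+{ord(text[0]):04X}")      # U+00F5
print(has_bell)                     # False
```

So the 'G' in the dump is the literal letter 0x47 that ends the escape
sequence, and no 0x07 byte is present.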

But when I switch to code page 65001, Wikipedia redirects me to UTF-8.
So in that code page, presumably all ALT sequences represent themselves,
whether or not there is a leading 0?  No, experimentation shows
otherwise: <alt-2> shows nothing (and not the smiley face from codepage
437); while <alt-0-2> shows ^B (where ctrl-b really is code point 2). I
have no idea WHAT sequence would thus give you ö.


> Now you might say, why not just use codepage 437? Which is exactly what
> Corinna did say:
>
> http://cygwin.com/ml/cygwin/2017-03/msg00193.html

Well, obviously, the code page matters to cmd; and I have no idea what
alt- sequences do (or are supposed to do) under mintty.  So there may
STILL be some lingering craziness on what Cygwin itself should do when
it recognizes an alt- sequence coming in (if cygwin translates from the
current code page to Unicode, where the current code page definitely
affects which character is desired); and that's _in addition_ to what
appears to be the craziness in bash when reconstructing the UTF-8
sequence for omega Ω as mentioned in my other mail.

--=20
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


--b4QhtDrbUXEqKuh1dKHO6gHWss1HwThX7--

--JpJqicUqdpA8c6bO4NpXMC3xQcb0Ju1lF
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Comment: Public key at http://people.redhat.com/eblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEzBAEBCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAll6XSkACgkQp6FrSiUn
Q2o8wQgAmmEzVhc4X2WZQDJdnxwI522X6WAcuqk28ueBr1jxpdZZYFbf5+wvhnru
KtOppdaZ8s7UbglE/3GPxpUavPNwF/Oiq6Lm7n2w09BhqYb9pmU6/3V/G4t0mP5b
oKE+rB6dgt079Vn+GPD2UpXNLROJlQOihfB/9YOHKnpus0j3FcHUPf4p5dAWCBE6
6pxmieEFJk2n1FqAtyxSP1sthVf4ySK1s57Rmo2dqc3XQGh3JSu6lu8AGT2F3MSQ
JiYQ2Csv4uyu4SoT//mZHT2SnIMHV3z54yyLXw+6a0Hy5xhw2DfNFDziEqljlZeg
+qyYIP4s8ck3sAC8AcnfIHz+16lk5w==
=tab3
-----END PGP SIGNATURE-----

--JpJqicUqdpA8c6bO4NpXMC3xQcb0Ju1lF--
