Mail Archives: cygwin/2021/01/25/15:22:40

Message-ID: <600F2804.6000401@tlinx.org>
Date: Mon, 25 Jan 2021 12:20:20 -0800
From: L A Walsh <cygwin AT tlinx DOT org>
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
To: Ariel Burbaickij <ariel DOT burbaickij AT gmail DOT com>
Subject: Re: switching to any other than English keyboard layout is not handled
correctly anymore on the prompt at minimum
References: <CANeJNHoujwZWP9kSKY7dLTUkFxeg3nTx=8bdgmdYqFrOSxmU7g AT mail DOT gmail DOT com>
<20210125222916 DOT b1fa2ddfb60088112f17eb2c AT nifty DOT ne DOT jp>
<CANeJNHqY5eVF0R68znhc=rYu8oAcnSKbarMpt_CksgXN5_s5Dw AT mail DOT gmail DOT com>
In-Reply-To: <CANeJNHqY5eVF0R68znhc=rYu8oAcnSKbarMpt_CksgXN5_s5Dw@mail.gmail.com>
Cc: cygwin AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces AT cygwin DOT com>

On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
> It says following:
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=
>
> but why would it matter in the scenario where the user switches the layout
> explicitly him-/herself?
>   
----
    Because the OS (the keyboard driver) needs to know what mapping
is used on the keyboard, so that when you press a key,
the keyboard driver sends the keycode with the correct meaning to
programs.

    The keys on your keyboard _inherently_ have no meaning.  Their
meaning is assigned by the locale settings, so that they can
send those characters to a program.

    If you create your own layout, you need to create a *custom*
mapping in POSIX.  Cygwin didn't create its own system: it just uses
the POSIX standard, so it doesn't create the mapping or the meanings.

> On Mon, 25 Jan 2021 13:46:48 +0100
> Ariel Burbaickij wrote:
>   
>> Hello Cygwin,
>> I tried to find some files from the command line prompt which are
>> named using various non-Latin (Russian, Hebrew, Arabic) and
>> non-default Latin (German) layouts under Windows 10 Enterprise using
>> recent cygwin version and the outcome is that instead of representing
>> letters I see control characters of the type: \263\320\321  (Unicode
>> numeric value of the letters?). Any ideas what happens here and how
>> correct functionality can be restored?
>>     
---
    Note that the characters you type are one thing.  How a program
interprets those characters is another, and that is determined by the
"locale" settings.

    The locale is using UTF-8.  So you need to set your terminal
to interpret Unicode.  I don't know much about Win10, but in the Microsoft
cmd.exe program, "chcp" changes the code page.  The code page for UTF-8 is
65001, so in such a terminal you could type:

chcp<Enter>                # this should say something like:
Active code page: 801      # your number may be different

# Remember that number so you can switch back to your initial code
#  page later (or just close the cmd window).

To switch to UTF-8, type:

chcp 65001

That makes the console interpret output as UTF-8 in that window.
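
As a rough illustration of why the code page matters (a sketch in Python,
not something from the original thread): the same bytes come out as
different text depending on which encoding the reader assumes.  The sample
word and the code pages compared here (437 and 1251) are assumptions chosen
only for illustration.

```python
# Sketch: one byte sequence, interpreted under several encodings.
# The sample text and the code pages compared are illustrative assumptions.
sample = "файл"                  # Russian for "file"
raw = sample.encode("utf-8")     # the bytes a UTF-8 locale would produce


def interpret(data: bytes, encoding: str) -> str:
    """Decode data with the given encoding, marking undecodable bytes."""
    return data.decode(encoding, errors="replace")


def main() -> None:
    # Only the "utf-8" line reproduces the original word; the others
    # show what a terminal using a different code page would display.
    for enc in ("utf-8", "cp437", "cp1251"):
        print(f"{enc:>7}: {interpret(raw, enc)}")


if __name__ == "__main__":
    main()
```

If the terminal and the program disagree about the encoding, the text is
not lost, only misread: decoding the same bytes as UTF-8 recovers it.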

Note, I'm not sure that will solve all of your problems.
"\263" (octal, 0xb3 in hex) is not valid as the first byte of a UTF-8
character.  Valid first bytes of a single UTF-8 character (in hex):
00-7f, c2-df, e0-ef, f0-f4.
Bytes in the range 80-bf, such as 0xb3, are continuation bytes and can
only follow a first byte.  So if you see something like 0xb3 as the
first byte of a Unicode character, you know it can't be valid UTF-8
(this is part of UTF-8's self-synchronizing feature).
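
The byte ranges above can be checked with a small Python sketch.  The
function name utf8_byte_role is made up for this example, and the three
bytes are the ones quoted in the original report; this only classifies
single bytes, it is not a full UTF-8 validator.

```python
# Sketch: classify a byte by the role it can play in a UTF-8 stream.
# utf8_byte_role is a hypothetical helper name, not a library function.
def utf8_byte_role(b: int) -> str:
    """Return the role of byte value b (0-255) in UTF-8."""
    if b <= 0x7F:
        return "single-byte (ASCII)"
    if 0x80 <= b <= 0xBF:
        return "continuation byte (never first)"
    if 0xC2 <= b <= 0xDF:
        return "lead byte of a 2-byte sequence"
    if 0xE0 <= b <= 0xEF:
        return "lead byte of a 3-byte sequence"
    if 0xF0 <= b <= 0xF4:
        return "lead byte of a 4-byte sequence"
    return "invalid in UTF-8"    # c0, c1, f5-ff


# The bytes from the original report, written as octal escapes.
reported = b"\263\320\321"

for b in reported:
    print(f"\\{b:03o} = 0x{b:02x}: {utf8_byte_role(b)}")

# Cross-check against Python's own strict UTF-8 decoder.
try:
    reported.decode("utf-8")
    decodes = True
except UnicodeDecodeError:
    decodes = False
print("decodes as UTF-8:", decodes)
```

Here \263 is a continuation byte in first position, and \320 and \321 are
both lead bytes with nothing valid following them, so the three bytes
taken together are not well-formed UTF-8.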

A very useful utility for displaying all Unicode characters,
and which fonts you have that can display them, can be
found at:

https://www.babelstone.co.uk/Software/BabelMap.html

Unzip it into a folder and put a link to it somewhere that is
easy to access.


Hope this helps.

