| Message-ID: | <600F2804.6000401@tlinx.org> |
| Date: | Mon, 25 Jan 2021 12:20:20 -0800 |
| From: | L A Walsh <cygwin AT tlinx DOT org> |
| User-Agent: | Thunderbird 2.0.0.24 (Windows/20100228) |
| MIME-Version: | 1.0 |
| To: | Ariel Burbaickij <ariel DOT burbaickij AT gmail DOT com> |
| Subject: | Re: switching to any other than English keyboard layout is not handled |
| correctly anymore on the prompt at minimum | |
| References: | <CANeJNHoujwZWP9kSKY7dLTUkFxeg3nTx=8bdgmdYqFrOSxmU7g AT mail DOT gmail DOT com> |
| <20210125222916 DOT b1fa2ddfb60088112f17eb2c AT nifty DOT ne DOT jp> | |
| <CANeJNHqY5eVF0R68znhc=rYu8oAcnSKbarMpt_CksgXN5_s5Dw AT mail DOT gmail DOT com> | |
| In-Reply-To: | <CANeJNHqY5eVF0R68znhc=rYu8oAcnSKbarMpt_CksgXN5_s5Dw@mail.gmail.com> |
| Cc: | cygwin AT cygwin DOT com |
| Sender: | "Cygwin" <cygwin-bounces AT cygwin DOT com> |
On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
> It says following:
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=
>
> but why would it matter in the scenario where the user switches the layout
> explicitly him-/herself?
>
----
Because the OS (the keyboard driver) needs to know which mapping
is in use on the keyboard, so that when you press a key, the driver
sends programs the keycode with the correct meaning.

The keys on your keyboard _inherently_ have no meaning. They have
an "assigned" meaning, assigned by the locale settings, so that they
can send those characters to a program.

If you create your own layout, you need to create a *custom*
mapping in POSIX. Cygwin didn't create its own system: it just uses
the POSIX standard, and it doesn't create the mapping or the meanings.
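
As a rough illustration of "keys have no inherent meaning", here is a toy
sketch in Python. It is not how any real keyboard driver is written: the
LAYOUTS table and the translate() helper are made up for this example and
cover a single key. The only real-world fact assumed is that scancode 0x10
(PC scancode set 1) is the key just right of Tab, which is Q on a US
layout, A on a French AZERTY layout, and Й on a Russian layout.

```python
# Toy model: one physical key, three layouts.
# LAYOUTS and translate() are hypothetical names used only for illustration.
LAYOUTS = {
    "us": {0x10: "q"},   # US QWERTY
    "fr": {0x10: "a"},   # French AZERTY
    "ru": {0x10: "й"},   # Russian ЙЦУКЕН
}


def translate(scancode: int, layout: str) -> str:
    """Return the character the given layout assigns to a key position."""
    return LAYOUTS[layout][scancode]


if __name__ == "__main__":
    # Same key press, different character depending on the active layout.
    for name in LAYOUTS:
        print(name, translate(0x10, name))
```

The key press itself is identical in all three cases; only the table
consulted afterwards differs.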
> On Mon, 25 Jan 2021 13:46:48 +0100
> Ariel Burbaickij wrote:
>
>> Hello Cygwin,
>> I tried to find some files from the command line prompt which are
>> named using various non-Latin (Russian, Hebrew, Arabic) and
>> non-default Latin (German) layouts under Windows 10 Enterprise using
>> recent cygwin version and the outcome is that instead of representing
>> letters I see control characters of the type: \263\320\321 (Unicode
>> numeric value of the letters?). Any ideas what happens here and how
>> correct functionality can be restored?
>>
---
Note that the characters you type are one thing. How a program
interprets those characters is determined by the "locale" settings.
The locale is using UTF-8, so you need to set your terminal
to interpret Unicode. I don't know much about Win10, but in the Microsoft
cmd.exe program, "chcp" changes the code page. The code page for UTF-8 is
65001, so in such a terminal you could type:

    chcp<Enter>              # this should say something like:
    Active code page: 437    # your number may be different

Remember that number so you can switch back to your initial code page
(or just close the cmd window).

To switch to UTF-8, type:

    chcp 65001

That will make that program interpret output as UTF-8.
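
To see why the code page matters, here is a small Python sketch (my own
illustration, not something specific to Cygwin or cmd.exe). It takes the
UTF-8 bytes of one Cyrillic letter and decodes them twice: once as UTF-8
and once as code page 437, which I am assuming as a typical non-UTF-8
console code page. The decode_as() helper is a made-up name.

```python
# The same two bytes, read under two different code pages.
def decode_as(raw: bytes, encoding: str) -> str:
    """Interpret raw bytes the way a terminal using `encoding` would."""
    return raw.decode(encoding)


text = "й"                   # CYRILLIC SMALL LETTER SHORT I
raw = text.encode("utf-8")   # two bytes: 0xd0 0xb9

as_utf8 = decode_as(raw, "utf-8")    # the original letter comes back
as_cp437 = decode_as(raw, "cp437")   # two unrelated characters instead

if __name__ == "__main__":
    print("bytes:   ", raw)
    print("as UTF-8:", as_utf8)
    print("as cp437:", as_cp437)
```

The bytes never change; only the table used to read them does, which is
what "chcp" switches.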

Note, I'm not sure that will solve all of your problems.
"\263" (octal, 0xb3 in hex) is not valid as the 1st byte of a UTF-8
character. Valid first bytes of a single UTF-8 char (in hex):
00-7f, c2-df, e0-ef, f0-f4.
Bytes in the range 80-bf only ever appear as continuation bytes.
So if you see something like 0xb3 as the 1st byte of a Unicode
character, you know it can't be the start of one (part of UTF-8's
self-synchronizing feature).
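
The first-byte ranges above can be checked mechanically. This is a minimal
Python sketch that classifies a single byte by those ranges and applies it
to the three octal escapes from the original report (\263, \320, \321).
The function name utf8_lead_kind() is my own, and it only looks at one
byte; it does not validate whole sequences.

```python
def utf8_lead_kind(byte: int) -> str:
    """Classify one byte by the role it can play in UTF-8."""
    if byte <= 0x7F:
        return "single-byte character (ASCII)"
    if byte <= 0xBF:
        return "continuation byte, cannot start a character"
    if 0xC2 <= byte <= 0xDF:
        return "lead byte of a 2-byte sequence"
    if 0xE0 <= byte <= 0xEF:
        return "lead byte of a 3-byte sequence"
    if 0xF0 <= byte <= 0xF4:
        return "lead byte of a 4-byte sequence"
    return "never valid in UTF-8"   # c0, c1, f5-ff


if __name__ == "__main__":
    for octal in ("263", "320", "321"):
        value = int(octal, 8)       # the shell shows the bytes in octal
        print(f"\\{octal} = 0x{value:02x}: {utf8_lead_kind(value)}")
```

Run on the reported bytes, this separates the one that cannot begin a
character (0xb3) from the two that can (0xd0 and 0xd1).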

A very useful utility for displaying all Unicode characters,
and which character sets you have that can display them, can be
found at:
https://www.babelstone.co.uk/Software/BabelMap.html
Unzip it into a folder and put a link to it somewhere that is
easy to access.
Hope this helps.