Mail Archives: cygwin/2021/01/25/15:52:29

References: <CANeJNHoujwZWP9kSKY7dLTUkFxeg3nTx=8bdgmdYqFrOSxmU7g AT mail DOT gmail DOT com>
<20210125222916 DOT b1fa2ddfb60088112f17eb2c AT nifty DOT ne DOT jp>
<CANeJNHqY5eVF0R68znhc=rYu8oAcnSKbarMpt_CksgXN5_s5Dw AT mail DOT gmail DOT com>
<600F2804 DOT 6000401 AT tlinx DOT org>
In-Reply-To: <600F2804.6000401@tlinx.org>
Date: Mon, 25 Jan 2021 21:50:59 +0100
Message-ID: <CANeJNHrxyDg1q-XtqcBrt1qo8Lam1Nsb9ZBpOwRq7piRB=6Wig@mail.gmail.com>
Subject: Re: switching to any other than English keyboard layout is not
handled correctly anymore on the prompt at minimum
To: L A Walsh <cygwin AT tlinx DOT org>
From: Ariel Burbaickij via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Ariel Burbaickij <ariel DOT burbaickij AT gmail DOT com>
Cc: cygwin AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces AT cygwin DOT com>

Wait a sec, what specifically do you mean by "... Cygwin just uses the
POSIX standard..."? The POSIX standard for what, and how does it interfere
with getting the current layout and mapping from the OS?
And what do you mean by "... So you need to set your terminal to
interpret unicode..."? My terminal here is Cygwin Terminal. cmd.exe at
least handles Russian and German just fine, though not Arabic and Hebrew,
but that, I am pretty sure, is because of the additional handling that
right-to-left writing needs. Notepad++(!) already handles all input types
just fine, as do all the other programs tested so far. So what specifically
are these supposed big OS-side secrets that Cygwin cannot get to here?

Best Regards
Ariel Burbaickij


On Mon, Jan 25, 2021 at 9:21 PM L A Walsh <cygwin AT tlinx DOT org> wrote:

> On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
> > It says following:
> > LANG=en_US.UTF-8
> > LC_CTYPE="en_US.UTF-8"
> > LC_NUMERIC="en_US.UTF-8"
> > LC_TIME="en_US.UTF-8"
> > LC_COLLATE="en_US.UTF-8"
> > LC_MONETARY="en_US.UTF-8"
> > LC_MESSAGES="en_US.UTF-8"
> > LC_ALL=
> >
> > but why would it matter in the scenario where the user switches the
> > layout explicitly him-/herself?
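
For reference, a minimal Python sketch of how the character set is derived from the variables listed above, assuming the usual POSIX precedence (a non-empty LC_ALL wins, then LC_CTYPE, then LANG). The helper names effective_ctype and charset_of are made up for this illustration and are not part of Cygwin:

```python
def effective_ctype(env):
    """Return the locale name that governs character handling (LC_CTYPE)."""
    # POSIX precedence: LC_ALL, then the category variable, then LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # an empty value counts as unset
            return value
    return "C"  # POSIX default when nothing is set


def charset_of(locale_name):
    """Extract the codeset part of a name like 'en_US.UTF-8' (None if absent)."""
    # Strip an optional @modifier, then take whatever follows the dot.
    base = locale_name.split("@", 1)[0]
    return base.split(".", 1)[1] if "." in base else None


# Environment matching the quoted output: only LANG is really set.
env = {"LANG": "en_US.UTF-8", "LC_ALL": ""}
name = effective_ctype(env)
print(name, charset_of(name))
```

This only shows which encoding programs are told to expect; it says nothing about the keyboard layout itself.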
> >
> ----
>     Because the OS (the keyboard driver) needs to know what mapping
> is used on the keyboard, so that when you press a key,
> the keyboard driver sends the keycode with the correct meaning to
> programs.
>
>     The keys on your keyboard _inherently_ have no meaning.  They have
> an "assigned" meaning, as assigned by the locale settings, so they can
> send those characters to a program.
>
>     If you create your own layout, you need to create a *custom*
> mapping in POSIX.  Cygwin just uses the POSIX standard, it doesn't
> create the mapping or the meanings (cygwin didn't create its own
> system).
>
> > On Mon, 25 Jan 2021 13:46:48 +0100
> > Ariel Burbaickij wrote:
> >
> >> Hello Cygwin,
> >> I tried to find some files from the command line prompt which are
> >> named using various non-Latin (Russian, Hebrew, Arabic) and
> >> non-default Latin (German) layouts under Windows 10 Enterprise using
> >> a recent Cygwin version, and the outcome is that instead of the
> >> letters I see control characters of the type: \263\320\321  (Unicode
> >> numeric value of the letters?). Any ideas what happens here and how
> >> correct functionality can be restored?
> >>
> ---
>     Note that the characters you type are one thing.  How a program
> interprets those characters is determined by the "locale" settings.
>
>     The locale is using UTF-8.  So you need to set your terminal
> to interpret unicode.  I don't know much about Win10, but in the Microsoft
> cmd.exe program, "chcp" changes the code page.  The code page for UTF-8 is
> 65001, so in such a terminal you could type:
>
> chcp<Enter>                # this should say something like:
> Active code page: 437      # your number may be different
>
> # Remember it to switch back to your initial code page (or just
> #  close the cmd window).
>
> To switch to UTF-8, type:
>
> chcp 65001
>
> That will interpret output as UTF-8 in that program.
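
To illustrate why the active code page matters, here is a small Python sketch that decodes one and the same byte sequence under several encodings. The sample word and the choice of cp866 and cp1251 as comparison code pages are assumptions made for this example; the function name interpretations is hypothetical:

```python
def interpretations(raw, codecs=("utf-8", "cp866", "cp1251")):
    """Decode the same bytes under each codec; undecodable bytes become U+FFFD."""
    return {codec: raw.decode(codec, errors="replace") for codec in codecs}


text = "Привет"             # sample Russian word, chosen for illustration
raw = text.encode("utf-8")  # the bytes a UTF-8 program would emit

for codec, shown in interpretations(raw).items():
    print(f"{codec:7} -> {shown}")
```

Only the UTF-8 interpretation gives back the original word; the other code pages render the same bytes as unrelated characters.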
>
> Note, I'm not sure that will solve all of your problems.
> "\263" (octal for 0xb3) is not valid as the 1st byte of a UTF-8
> sequence. Valid first bytes of a single UTF-8 char (in hex):
> 00-7f, c2-df, e0-ef, f0-f4.
> So if you see something like 0xb3 in the 1st byte of a unicode
> character, you know it can't be valid: bytes 80-bf only occur as
> continuation bytes (part of UTF-8's self-synchronizing feature).
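
The byte ranges above can be checked with a short Python sketch. It assumes the \263\320\321 sequences in the original report are octal byte escapes; the function name classify is made up for this illustration:

```python
def classify(byte):
    """Classify one byte by the role it can play in well-formed UTF-8."""
    if byte <= 0x7F:
        return "ascii"
    if byte <= 0xBF:
        return "continuation"  # 80-bf: never the first byte of a character
    if 0xC2 <= byte <= 0xDF:
        return "lead-2"        # starts a 2-byte sequence
    if 0xE0 <= byte <= 0xEF:
        return "lead-3"        # starts a 3-byte sequence
    if 0xF0 <= byte <= 0xF4:
        return "lead-4"        # starts a 4-byte sequence
    return "invalid"           # c0, c1, f5-ff never appear in UTF-8


# The escapes from the original report, read as octal byte values.
for b in (0o263, 0o320, 0o321):
    print(f"\\{b:o} = 0x{b:02x}: {classify(b)}")
```

Under that reading, \263 (0xb3) is a continuation byte, while \320 and \321 (0xd0, 0xd1) are lead bytes of 2-byte sequences, the range that covers most Cyrillic letters.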
>
> A very useful utility for displaying all unicode characters
> and what character sets you have that can display them can be
> found at:
>
> https://www.babelstone.co.uk/Software/BabelMap.html
>
> Unzip it into a folder and put a link to it where it is
> easy to access.
>
>
> Hope this helps.
>
>
