Message-ID: <600F2804.6000401@tlinx.org>
Date: Mon, 25 Jan 2021 12:20:20 -0800
From: L A Walsh
To: Ariel Burbaickij
Cc: cygwin AT cygwin DOT com
Subject: Re: switching to any other than English keyboard layout is not handled correctly anymore on the prompt at minimum
References: <20210125222916 DOT b1fa2ddfb60088112f17eb2c AT nifty DOT ne DOT jp>

On 2021/01/25 06:03, Ariel Burbaickij via Cygwin wrote:
> It says following:
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_ALL=
>
> but why would it matter in the scenario where the user switches the layout
> explicitly him-/herself?
>
----
Because the OS (the keyboard driver) needs to know what mapping is used
on the keyboard, so that when you press a key, the keyboard driver sends
the keycode with the correct meaning to programs.

The keys on your keyboard _inherently_ have no meaning.
They have an "assigned" meaning, given by the locale settings, so that
they can send those characters to a program. If you create your own
layout, you need to create a *custom* mapping in POSIX. Cygwin didn't
create its own system: it uses the POSIX standard, and it doesn't create
the mapping or the meanings.

> On Mon, 25 Jan 2021 13:46:48 +0100
> Ariel Burbaickij wrote:
>
>> Hello Cygwin,
>> I tried to find some files from the command line prompt which are
>> named using various non-Latin (Russian, Hebrew, Arabic) and
>> non-default Latin (German) layouts under Windows 10 Enterprise using
>> recent cygwin version and the outcome is that instead of representing
>> letters I see control characters of the type: \263\320\321 (Unicode
>> numeric value of the letters?). Any ideas what happens here and how
>> correct functionality can be restored?
>>
---
Note that the characters you type are one thing. How a program
interprets those characters is determined by the "locale" settings.
The locale is using UTF-8, so you need to set your terminal to
interpret Unicode.

I don't know much about Win10, but in the Microsoft cmd.exe prog,
"chcp" changes the code page. The code page for UTF-8 is 65001, so in
such a terminal you could type:

  chcp
  # this should say something like:
  Active code page: 801
  # your number may be different
  # Remember it to switch back to your initial code page (or just
  # close the cmd window).

To switch to UTF-8, type:

  chcp 65001

That will interpret output as UTF-8 in that program.

Note, I'm not sure that will be all of your problems. "\263" is not
valid for the 1st byte of a UTF-8 string. Valid first bytes of a single
UTF-8 char (in hex):

  00-7f, c2-cf, d0-df, e0-ef, f0-f4

So if you see something like 0xb3 (octal 263) in the 1st byte of a
Unicode character, you know it can't be the start of one: bytes in the
range 80-bf only ever appear as continuation bytes (part of UTF-8's
self-synchronizing feature).
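To check bytes like the ones in your output yourself, here is a minimal
sketch in Python. It assumes the escapes shown at the prompt are
three-digit octal byte values, and the function names
(classify_utf8_byte, parse_octal_escapes) are made up for this
illustration, not part of any Cygwin tool. It applies the first-byte
ranges listed above, nothing more:

```python
import re


def classify_utf8_byte(b: int) -> str:
    """Classify one byte value (0-255) by the role it can play in UTF-8."""
    if b <= 0x7F:
        return "ascii"         # a complete one-byte character
    if b <= 0xBF:
        return "continuation"  # 80-bf: may only follow a lead byte
    if 0xC2 <= b <= 0xDF:
        return "lead-2"        # starts a two-byte character
    if 0xE0 <= b <= 0xEF:
        return "lead-3"        # starts a three-byte character
    if 0xF0 <= b <= 0xF4:
        return "lead-4"        # starts a four-byte character
    return "invalid"           # c0, c1, f5-ff never occur in UTF-8


def parse_octal_escapes(text: str) -> bytes:
    r"""Turn a string such as r'\263\320\321' into the bytes it denotes."""
    # Only \000-\377 fit in a single byte, so the first digit is 0-3.
    return bytes(int(m, 8) for m in re.findall(r"\\([0-3][0-7]{2})", text))


if __name__ == "__main__":
    for b in parse_octal_escapes(r"\263\320\321"):
        print(f"0x{b:02x} (octal {b:03o}): {classify_utf8_byte(b)}")
```

For the sequence from the original report this classifies 0xb3 as a
continuation byte and 0xd0 and 0xd1 as two-byte lead bytes, so the
sequence cannot begin at a character boundary.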
A very useful utility for displaying all Unicode characters and what
character sets you have that can display them can be found at:

  https://www.babelstone.co.uk/Software/BabelMap.html

Unzip it into a folder and put a link to it where it is easy to access.

Hope this helps.

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple