Mail Archives: cygwin/2022/02/03/03:54:12

Date: Thu, 3 Feb 2022 09:53:01 +0100
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Removing ^X in paths
Message-ID: <YfuX7V8jCxGhnR/d@calimero.vinschen.de>
Mail-Followup-To: cygwin AT cygwin DOT com
References: <0255429a-409d-c17a-7b4d-8cbbfbea7255 AT ucar DOT edu>
<61FB3CA1 DOT 8000001 AT tlinx DOT org>
<214212b2-270b-ad62-837b-fb34697a2f33 AT ucar DOT edu>
MIME-Version: 1.0
In-Reply-To: <214212b2-270b-ad62-837b-fb34697a2f33@ucar.edu>

On Feb  2 21:12, Dennis Heimbigner wrote:
> I am using 64bit.
> And it has nothing to do with misreading characters.
> 
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html,
> 
> There you will see this text:
> 
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence. The
> sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
> 
> There is no obvious good reason to continue this convention.
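
The round trip described in the quoted documentation can be sketched in a few lines of Python. This is not Cygwin's actual implementation. It assumes a single-byte target charset such as ISO-8859-1 (`latin-1` in Python) and works on Python code points rather than raw UTF-16 units. The function names `to_local_bytes` and `from_local_bytes` are hypothetical.

```python
CAN = 0x18  # ASCII CAN (Control-X), the escape marker described in the Cygwin docs


def to_local_bytes(name: str, charset: str) -> bytes:
    """Encode name in charset; escape each unconvertible char as CAN + its UTF-8 bytes.

    Assumption: charset is a single-byte encoding such as latin-1.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(CAN)
            out += ch.encode("utf-8")
    return bytes(out)


def from_local_bytes(data: bytes, charset: str) -> str:
    """Reverse the escaping: skip a CAN and decode the next UTF-8 character."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] == CAN and i + 1 < len(data):
            # The UTF-8 lead byte tells us how many bytes the character spans.
            lead = data[i + 1]
            if lead < 0x80:
                n = 1
            elif lead >= 0xF0:
                n = 4
            elif lead >= 0xE0:
                n = 3
            else:
                n = 2
            chars.append(data[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            # Single-byte charset assumption: one byte is one character.
            chars.append(data[i:i + 1].decode(charset))
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    local = to_local_bytes("café_€.txt", "latin-1")
    print(local)  # 'é' fits in latin-1, '€' is escaped as CAN + UTF-8
    print(from_local_bytes(local, "latin-1"))
```

In this sketch a literal CAN byte in a local filename is indistinguishable from the escape marker, so it illustrates the idea without resolving that ambiguity.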

You're probably using a non-UTF-8 locale, e.g., LANG=en_US using
ISO-8859-1 as charset.  See the output of `locale -av' to learn what
charset your locale uses.  Either way, converting the UTF-16 filenames
to a non-UTF charset is not lossless.  That's what the ASCII CAN stuff
is for.  If you want to avoid that, use a UTF-8 locale, e.g.,
en_US.UTF-8.
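
A program can check the charset it actually ends up with, much as `locale -av' shows it interactively. Below is a minimal Python sketch. It assumes a POSIX-style Python build, such as Cygwin's python3 package, where `locale.nl_langinfo` is available. It adopts the locale from the environment the way a C program would with `setlocale(LC_ALL, "")`. The helpers `current_charset` and `is_utf8` are hypothetical names. The sketch only reports the charset. The fix is still to run under a UTF-8 locale, e.g. `export LANG=en_US.UTF-8`.

```python
import locale


def current_charset() -> str:
    """Return the charset of the locale selected by LANG / LC_ALL / LC_CTYPE."""
    try:
        # Same effect as setlocale(LC_ALL, "") in C: take the locale from the environment.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Unknown locale name in the environment; keep the current (C) locale.
        pass
    return locale.nl_langinfo(locale.CODESET)


def is_utf8(charset: str) -> bool:
    """True for spellings such as 'UTF-8' or 'utf8'."""
    return charset.replace("-", "").upper() == "UTF8"


if __name__ == "__main__":
    cs = current_charset()
    if is_utf8(cs):
        print(cs, "- filenames convert losslessly")
    else:
        print(cs, "- unconvertible characters will be escaped with ASCII CAN")
```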


Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple

  Copyright © 2019   by DJ Delorie     Updated Jul 2019