Date: Thu, 3 Feb 2022 09:53:01 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: Removing ^X in paths

On Feb  2 21:12, Dennis Heimbigner wrote:
> I am using 64bit.
> And it has nothing to do misreading characters.
>
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html,
>
> There you will see this text:
>
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence.
> The sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
>
> There is no obvious good reason to continue this convention.

You're probably using a non-UTF-8 locale, e.g., LANG=en_US using
ISO-8859-1 as charset.  See the output of `locale -av' to learn what
charset your locale uses.

Either way, converting the UTF-16 filenames to a non-UTF charset is not
lossless.  That's what the ASCII CAN stuff is for.  If you want to avoid
that, use a UTF-8 locale, e.g. en_US.UTF-8.


Corinna

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple