X-Recipient: | archive-cygwin AT delorie DOT com |
X-Original-To: | cygwin AT cygwin DOT com |
Delivered-To: | cygwin AT cygwin DOT com |
DMARC-Filter: | OpenDMARC Filter v1.4.1 sourceware.org C49A43858D37 |
Authentication-Results: | sourceware.org; dmarc=none (p=none dis=none) header.from=SystematicSw.ab.ca |
Authentication-Results: | sourceware.org; spf=none smtp.mailfrom=systematicsw.ab.ca |
Message-ID: | <4eda7d19-d469-6819-36ba-7116f630be42@SystematicSw.ab.ca> |
Date: | Wed, 2 Feb 2022 22:09:03 -0700 |
MIME-Version: | 1.0 |
User-Agent: | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.5.1 |
Subject: | Re: Removing ^X in paths |
To: | cygwin AT cygwin DOT com |
References: | <0255429a-409d-c17a-7b4d-8cbbfbea7255 AT ucar DOT edu> <61FB3CA1 DOT 8000001 AT tlinx DOT org> <214212b2-270b-ad62-837b-fb34697a2f33 AT ucar DOT edu> |
From: | Brian Inglis <Brian DOT Inglis AT SystematicSw DOT ab DOT ca> |
Organization: | Systematic Software |
In-Reply-To: | <214212b2-270b-ad62-837b-fb34697a2f33@ucar.edu> |
X-Spam-Status: | No, score=-1160.7 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, NICE_REPLY_A, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.4 |
X-Spam-Checker-Version: | SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org |
X-BeenThere: | cygwin AT cygwin DOT com |
X-Mailman-Version: | 2.1.29 |
List-Id: | General Cygwin discussions and problem reports <cygwin.cygwin.com> |
List-Unsubscribe: | <https://cygwin.com/mailman/options/cygwin>, <mailto:cygwin-request AT cygwin DOT com?subject=unsubscribe> |
List-Archive: | <https://cygwin.com/pipermail/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-request AT cygwin DOT com?subject=help> |
List-Subscribe: | <https://cygwin.com/mailman/listinfo/cygwin>, <mailto:cygwin-request AT cygwin DOT com?subject=subscribe> |
Reply-To: | cygwin AT cygwin DOT com |
Errors-To: | cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com |
Sender: | "Cygwin" <cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com> |
X-MIME-Autoconverted: | from base64 to 8bit by delorie.com id 21359NFk009181 |
On 2022-02-02 21:12, Dennis Heimbigner wrote:
> On 2/2/2022 7:23 PM, L A Walsh wrote:
>> On 2022/02/02 12:40, Dennis Heimbigner wrote:
>>> It appears that Windows now supports the UTF-8 codepage.
>> It has since the early 2000s.
>>> In light of this, it seems time to change Cygwin so it no longer adds
>>> those control-X (^X) characters in, e.g., path names.
>> ^X is ASCII. Cygwin doesn't insert ^X characters in paths.
>> Perhaps you are thinking of '\', which can look like ¥ (a capital 'Y'
>> with 2 horizontal lines, Yen Sign U+00A5). If that's the case, some
>> 8-bit font displayed that sign instead of a backslash in non-Unicode
>> locales.
>> Are you using a 32-bit or 64-bit version of Cygwin? On what version
>> of Windows?
>> If you still use a 32-bit version, you might need to move to a 64-bit
>> version. I know the 32-bit version sometimes had the problem because
>> it supported fewer fonts and fewer characters at the same time.
>> You might check your locale (if in English, try setting
>> LC_CTYPE="en_US.UTF-8" in your shell) and also check that your font
>> has a backslash in the 0x5C position.
>> But in the shell, ^X is usually a character to erase the whole line,
>> so it really wouldn't do to have it in a PATH.
>> Hope this helps, and sorry if this is completely off base.
> I am using 64-bit.
> And it has nothing to do with misreading characters.
> The ^X is described in this document:
> https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
> There you will see this text:
> "If you don't want or can't use UTF-8 as character set for
> whatever reason, you will nevertheless be able to access the
> file. How does that work? When Cygwin converts the filename from
> UTF-16 to your character set, it recognizes characters which
> can't be converted. If that occurs, Cygwin replaces the
> non-convertible character with a special character sequence. The
> sequence starts with an ASCII CAN character (hex code 0x18,
> equivalent Control-X), followed by the UTF-8 representation of
> the character. The result is a filename containing some ugly
> looking characters. While it doesn't look nice, it is nice,
> because Cygwin knows how to convert this filename back to
> UTF-16. The filename will be converted using your usual
> character set. However, when Cygwin recognizes an ASCII CAN
> character, it skips over the ASCII CAN and handles the following
> bytes as a UTF-8 character. Thus, the filename is symmetrically
> converted back to UTF-16 and you can access the file."
> There is no obvious good reason to continue this convention.

This is not a convention; it is an interoperability feature that allows
unsupported characters to be used in filenames. Otherwise, Cygwin would
have to fail the file open in locales where those characters are
unsupported.

I have always used ASCII, ISO-8859-1/15, or UTF-8 and have never seen a
^X in any filename, although I have produced many other control and
special characters in filenames by mistake. ;^>

If you never use a limited-character-set locale with filenames in
extended character sets, you will never see this either.

This feature is for those who may be importing files with names in
extended character sets while their selected locale only supports a
limited character set. Some users and nationalities still prefer
locales with limited character sets, perhaps because their important
apps still use them and they are familiar with the related keyboard
mappings and font glyphs.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains too much
technical detail. Reader discretion is advised.
[Data in binary units and prefixes, physical quantities in SI.]
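The CAN-escape round trip described in the quoted Cygwin User's Guide text can be illustrated with a short sketch. This is a minimal Python model under stated assumptions. It is not Cygwin's actual C implementation. The function names `to_locale_bytes` and `from_locale_bytes` are hypothetical. The sketch assumes a single-byte locale charset such as ISO-8859-1, and it assumes that filenames contain no literal CAN characters.

```python
CAN = 0x18  # ASCII CAN (Control-X), the escape byte named in the Cygwin docs


def to_locale_bytes(name: str, charset: str) -> bytes:
    """Hypothetical model: convert a filename to the locale charset.

    Characters the charset cannot represent are written as CAN followed
    by their UTF-8 bytes.
    """
    out = bytearray()
    for ch in name:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out.append(CAN)                # escape marker
            out += ch.encode("utf-8")     # then the character as UTF-8
    return bytes(out)


def _utf8_len(lead: int) -> int:
    """Return the length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02x}")


def from_locale_bytes(data: bytes, charset: str) -> str:
    """Hypothetical model of the reverse conversion.

    On CAN, skip it and decode one UTF-8 character. Otherwise decode the
    byte with the locale charset, which is assumed to be single-byte.
    """
    chars = []
    i = 0
    while i < len(data):
        if data[i] == CAN:
            if i + 1 >= len(data):
                raise ValueError("dangling CAN at end of name")
            n = _utf8_len(data[i + 1])
            chars.append(data[i + 1 : i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            chars.append(data[i : i + 1].decode(charset))
            i += 1
    return "".join(chars)
```

Consider `"résumé_日本.txt"` under ISO-8859-1. Here "é" encodes directly, while "日" and "本" each become CAN plus three UTF-8 bytes. Decoding the result reproduces the original name. This symmetry is why the escaped name still opens the right file even though it looks "ugly" in a limited locale.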
-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |