Subject: Re: Need help with multibyte UTF-8 characters
To: cygwin AT cygwin DOT com
From: Thomas Taylor
Date: Mon, 11 Dec 2017 18:36:09 -0500
Message-ID: <9d3b73ff-f596-51a2-909a-30a767e3e9b3@gmail.com>
In-Reply-To: <626a3c06-e9f2-1932-f1f3-47ddb2051215@gmail.com>

Thank you for your advice on setting my locale to en_US.UTF-8. Unfortunately, Cygwin still seems to have trouble displaying some three-byte UTF-8 encoded characters correctly. For example, see the following snippet from a "sed" file. This file attempts to convert XML-encoded filenames to UTF-8. As you can see, it converts one- and two-byte encodings correctly, but fails on some three-byte encodings (the en dash, the em dash, and the ellipsis, all of which are displayed as a filled-in rectangle):

# Match longest strings first

# Three-byte encodings:
# En dash
s/%[Ee]2%80%93/–/g
# Em dash
s/%[Ee]2%80%94/—/g
# Horizontal ellipsis
s/%[Ee]2%80%[Aa]6/…/g
# Less-than-or-equal sign
s/%[Ee]2%89%[Aa]4/≤/g
# Euro symbol
s/%[Ee]2%82%[Aa][Cc]/€/g

# Two-byte encodings:
# Non-break space
#s/%[Cc]2%[Aa]0/⎵/g
# Lowercase a with acute accent
s/%[Cc]3%[Aa]1/á/g
# Lowercase a with umlaut (a.k.a. diaeresis)
s/%[Cc]3%[Aa]4/ä/g
# Lowercase e with acute accent
s/%[Cc]3%[Aa]9/é/g
# Lowercase i with acute accent
s/%[Cc]3%[Aa][Dd]/í/g
# Lowercase o with acute accent
s/%[Cc]3%[Bb]3/ó/g
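
As a cross-check on the byte sequences used in the sed rules above, here is a minimal sketch in Python 3. It is not part of the original sed file. It relies only on the standard library's `urllib.parse.unquote`; the names `SAMPLES`, `decode`, and `mismatches` are made up for this illustration. It decodes each percent-encoded sequence as UTF-8 and compares the result with the Unicode code point the corresponding comment names.

```python
from urllib.parse import unquote

# Percent-encoded sequences copied from the sed rules, paired with the
# Unicode code point each one should decode to (illustrative table).
SAMPLES = {
    "%E2%80%93": 0x2013,  # en dash
    "%E2%80%94": 0x2014,  # em dash
    "%E2%80%A6": 0x2026,  # horizontal ellipsis
    "%E2%89%A4": 0x2264,  # less-than-or-equal sign
    "%E2%82%AC": 0x20AC,  # euro sign
    "%C2%A0": 0x00A0,     # no-break space
    "%C3%A1": 0x00E1,     # a with acute accent
    "%C3%A4": 0x00E4,     # a with diaeresis
    "%C3%A9": 0x00E9,     # e with acute accent
    "%C3%AD": 0x00ED,     # i with acute accent
    "%C3%B3": 0x00F3,     # o with acute accent
}


def decode(name: str) -> str:
    """Decode %XX escapes as UTF-8; hex digits may be upper or lower case."""
    return unquote(name, encoding="utf-8", errors="strict")


def mismatches() -> list:
    """Return the encoded sequences that do not decode to the expected code point."""
    return [enc for enc, cp in SAMPLES.items() if decode(enc) != chr(cp)]


if __name__ == "__main__":
    # Print code points in ASCII only, so the output does not depend on
    # how the terminal renders the characters themselves.
    for enc in SAMPLES:
        print(f"{enc} -> U+{ord(decode(enc)):04X}")
    print("mismatches:", mismatches())
```

If `mismatches()` comes back empty, the byte sequences in the patterns agree with the characters named in the comments. That says nothing about how a particular terminal or font draws those characters.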