From: Doug Henderson
Date: Tue, 12 Dec 2017 09:40:47 -0700
Subject: Re: Need help with multibyte UTF-8 characters
To: cygwin

On 11 December 2017 at 16:36, Thomas Taylor wrote:
> Thank you for your advice on setting my locale to en_US.UTF-8.
> Unfortunately, Cygwin still seems to have trouble displaying some three-byte
> UTF-8 encoded characters correctly. For example, see the following snippet
> from a "sed" file. This file attempts to convert XML-encoded filenames to
> UTF-8. As you can see, it converts one- and two-byte encodings correctly,
> but fails on some three-byte encodings (the en dash, the em dash, and the
> ellipsis, all of which are displayed as a filled-in rectangle):
>

Your sed script works for me. I copied and pasted your sample script into
"cvt_script.sed" and also into "cvt_input.txt". My sed command looks like:

    sed --file=cvt_script.sed < cvt_input.txt > cvt_output.txt

It correctly translates all the encoded UTF-8 strings.

Your display may look different if you are using different fonts in mintty
or the Windows console. I am using Lucida Console, 10pt, in mintty and
Consolas 16 in the Windows console. They display different glyphs for the
non-breaking space, but are otherwise identical.
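One way to tell a conversion problem from a font problem is to look at the bytes in the output rather than at the glyphs. The following is a minimal sketch, assuming Python 3 is available in the Cygwin installation; the function name utf8_report is made up for illustration and is not part of sed or Cygwin. It lists every non-ASCII character in a byte string with its code point, its UTF-8 length, and its Unicode name. If the en dash, em dash, and ellipsis show up here as valid three-byte characters, the conversion worked and the filled-in rectangle most likely comes from the terminal font.

```python
import unicodedata


def utf8_report(data: bytes):
    """Return (code point, UTF-8 byte length, Unicode name) for each
    non-ASCII character in data.

    Hypothetical helper for illustration only.
    """
    # decode() raises UnicodeDecodeError if the bytes are not valid UTF-8,
    # which would point at the conversion step rather than the font.
    text = data.decode("utf-8")
    report = []
    for ch in text:
        if ord(ch) > 0x7F:
            report.append(
                (
                    f"U+{ord(ch):04X}",
                    len(ch.encode("utf-8")),
                    unicodedata.name(ch, "UNKNOWN"),
                )
            )
    return report


if __name__ == "__main__":
    # Sample input: non-breaking space, en dash, em dash, ellipsis.
    sample = "a\u00a0b \u2013 \u2014 \u2026".encode("utf-8")
    for row in utf8_report(sample):
        print(row)
```

To check real output, read the file in binary mode, for example open("cvt_output.txt", "rb").read(), and pass the result to utf8_report.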
In mintty, I have LANG and all the LC_* variables set to en_CA.UTF-8, and
in the Windows console, to en_US.UTF-8. I am running Windows 10, and my
Cygwin setup was last updated two or three days ago.

Check the output of the "locale" command. All variables should have the
same value.

Is your Cygwin installation up to date, or fairly close to current? What
Windows version are you using?

HTH,
Doug

--
Doug Henderson, Calgary, Alberta, Canada - from gmail.com

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple