Mail Archives: cygwin/2017/12/14/13:09:27

Reply-To: Brian DOT Inglis AT SystematicSw DOT ab DOT ca
Subject: Re: Need help with multibyte UTF-8 characters
To: cygwin AT cygwin DOT com
References: <626a3c06-e9f2-1932-f1f3-47ddb2051215 AT gmail DOT com> <9d3b73ff-f596-51a2-909a-30a767e3e9b3 AT gmail DOT com>
From: Brian Inglis <Brian DOT Inglis AT SystematicSw DOT ab DOT ca>
Message-ID: <4f67d273-61f1-29d0-433a-d519e70bf912@SystematicSw.ab.ca>
Date: Thu, 14 Dec 2017 11:09:06 -0700

On 2017-12-11 16:36, Thomas Taylor wrote:
> Thank you for your advice on setting my locale to en_US.UTF-8.  Unfortunately,
> Cygwin still seems to have trouble displaying some three-byte UTF-8 encoded
> characters correctly.  For example, see the following snippet from a "sed"
> file.  This file attempts to convert XML-encoded filenames to UTF-8.  As you can
> see, it converts one- and two-byte encodings correctly, but fails on some
> three-byte encodings (the en dash, the em dash, and the ellipsis, all of which
> are displayed as a filled-in rectangle):

Going back to first principles: how is your script encoded, and under which
locale is it run?
What characters are in your script?
	$ wc -lwmc ...
What does vim say for that script:
	:set enc? tenc? fenc? fencs? eol? bomb?
What does locale say sed runs as:
	$ locale
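
As a rough illustration of what those checks should show (a sketch only, not
taken from the original script): the sample string below is an assumption,
built from the three characters mentioned in the quoted message (en dash
U+2013, em dash U+2014, ellipsis U+2026) written as raw UTF-8 bytes. The
variable names are hypothetical.

```shell
# Hypothetical sample: en dash, em dash and ellipsis as raw UTF-8 bytes,
# written with octal escapes so the shell script itself stays pure ASCII.
sample=$(printf '\342\200\223\342\200\224\342\200\246')

# Byte count does not depend on the locale: three 3-byte sequences.
bytes=$(printf '%s' "$sample" | wc -c | tr -d ' ')

# Character count does depend on the locale that wc runs under.
chars=$(printf '%s' "$sample" | wc -m | tr -d ' ')

# Hex dump of the raw bytes, for comparison with the bytes in the script file.
hex=$(printf '%s' "$sample" | od -An -tx1 | tr -d ' \n')

echo "bytes=$bytes chars=$chars hex=$hex"

# Environment variables that select the character encoding for wc and sed.
echo "LANG=${LANG:-} LC_ALL=${LC_ALL:-} LC_CTYPE=${LC_CTYPE:-}"
```

If the character count equals the byte count, the tools are most likely not
running in a UTF-8 locale. If the hex dump of the real script differs from the
bytes above for those characters, the file is probably not saved as UTF-8.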

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
