Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Subject: Re: Need help with multibyte UTF-8 characters
References: <626a3c06-e9f2-1932-f1f3-47ddb2051215 AT gmail DOT com> <9d3b73ff-f596-51a2-909a-30a767e3e9b3 AT gmail DOT com>
From: Thomas Taylor
To: cygwin AT cygwin DOT com
Message-ID: <1909177a-3f35-52d5-1717-9007d6efaa71@gmail.com>
Date: Tue, 12 Dec 2017 14:42:38 -0500
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <9d3b73ff-f596-51a2-909a-30a767e3e9b3@gmail.com>
Content-Type: multipart/mixed; boundary="------------F45A94644063078C0C6E8549"

--------------F45A94644063078C0C6E8549
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

I believe that Cygwin displays certain UTF-8 characters incorrectly. To see the problem, first save the attached "utf-8_test.sed" text file to your desktop. Then run mintty and open its options by right-clicking its title bar and selecting "Options" and then "Text". On the Text page, set "Locale" to "en_US" and "Character set" to "UTF-8", then click "Save". Now exit and restart mintty.

Change directory to your desktop and open the utf-8_test.sed file in vim. Once inside vim, enter ":set fileencoding=utf-8". You should now see that vim correctly displays a sample of one-, two-, and three-byte UTF-8 character encodings in the test file. Vim fails, however, on the three-byte encodings for the en dash, the em dash, and the ellipsis, each of which displays as a filled-in rectangle.
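For reference, the byte sequences involved can be listed independently of any terminal or font. The following is only a minimal Python sketch, not part of the original test: the names `utf8_hex`, `describe`, and `SAMPLES` are hypothetical, and the characters are the ones the attached sed script maps from %E2%80%93, %E2%80%94, %E2%80%A6, and %E2%82%AC.

```python
# Hypothetical helpers (not from the original report): show the code point
# and UTF-8 bytes of each sample character without relying on the terminal.

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated uppercase hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

# Characters taken from the attached utf-8_test.sed mappings.
SAMPLES = {
    "en dash": "\u2013",
    "em dash": "\u2014",
    "horizontal ellipsis": "\u2026",
    "euro sign": "\u20ac",
}

def describe(samples):
    """Map each name to a (code point, UTF-8 hex bytes) pair."""
    return {
        name: (f"U+{ord(ch):04X}", utf8_hex(ch))
        for name, ch in samples.items()
    }

if __name__ == "__main__":
    for name, (code_point, hex_bytes) in describe(SAMPLES).items():
        print(f"{name:20} {code_point}  {hex_bytes}")
```

If a hex dump of the saved file (for example from `xxd` or `od -An -tx1`) shows the same three-byte sequences, the file content itself is intact, and what remains to be explained is how those bytes are rendered.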
Now exit vim and run "less" or "cat" on the utf-8_test.sed file. You should see most of the sample UTF-8 encoded characters displayed correctly, except once again for the en dash, em dash, and ellipsis. So it looks like a problem in the underlying Cygwin run-time libraries rather than in vim, less, or cat. I haven't tested this on four-byte UTF-8 character encodings, but I assume Cygwin will have similar problems.

--------------F45A94644063078C0C6E8549
Content-Type: text/plain; charset=UTF-8; name="utf-8_test.sed"
Content-Transfer-Encoding: 8bit
Content-Disposition: attachment; filename="utf-8_test.sed"

# This is file "utf-8_test.sed"
#
# It's used by the "sed" utility program
# to convert XML-encoded filenames to UTF-8

# Match longest strings first

# Three-byte encodings:

# En dash
s/%[Ee]2%80%93/–/g

# Em dash
s/%[Ee]2%80%94/—/g

# Horizontal ellipsis
s/%[Ee]2%80%[Aa]6/…/g

# Less-than-or-equal sign
s/%[Ee]2%89%[Aa]4/≤/g

# Euro symbol
s/%[Ee]2%82%[Aa][Cc]/€/g

# Two-byte encodings:

# Non-break space
s/%[Cc]2%[Aa]0/⎵/g

# Lowercase a with acute accent
s/%[Cc]3%[Aa]1/á/g

# Lowercase a with umlaut (a.k.a. diaeresis)
s/%[Cc]3%[Aa]4/ä/g

# Lowercase e with acute accent
s/%[Cc]3%[Aa]9/é/g

# Lowercase i with acute accent
s/%[Cc]3%[Aa]D/í/g

# Lowercase o with acute accent
s/%[Cc]3%[Bb]3/ó/g

# Lowercase n with tilde
s/%[Cc]3%[Bb]1/ñ/g

# Lowercase c with acute accent
s/%[Cc]4%87/ć/g

# Lowercase o with long accent (a.k.a. macron)
s/%[Cc]5%8[Dd]/ō/g

# One-byte encodings:

# "And" sign (a.k.a. ampersand)
s/&#38;/\&/g

# Space
s/%20/ /g

# Sharp (or pound) sign
s/%23/#/g

# Percent sign
s/%25/%/g

# Left square bracket
s/%5[Bb]/[/g

# Right square bracket
s/%5[Dd]/]/g

# End of file "utf-8_test.sed"

--------------F45A94644063078C0C6E8549
Content-Type: text/plain; charset=us-ascii

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

--------------F45A94644063078C0C6E8549--