Message-ID: | <606A2017.2040405@tlinx.org> |
Date: | Sun, 04 Apr 2021 13:22:47 -0700 |
From: | L A Walsh <cygwin AT tlinx DOT org> |
To: | Mark Aitchison <M DOT Aitchison AT cyberXpress DOT co DOT nz> |
Subject: | Re: Perl Unidecode modules - which to use (if not Text::Unidecode)? |
References: | <d3342ff4-f717-f882-5c41-b27ab272dc03 AT cyberXpress DOT co DOT nz> |
In-Reply-To: | <d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz> |
List-Id: | General Cygwin discussions and problem reports <cygwin.cygwin.com> |
List-Archive: | <https://cygwin.com/pipermail/cygwin/> |
Cc: | cygwin AT cygwin DOT com |

On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ)
> that are basically ASCII with accent marks to their closest ASCII equivalents,
---
Hmm... have you tried installing from CPAN? I just tried it
and it seems to work. (It spewed some warnings, but seemed to
test out OK, so I tried it.)

Put your characters in a file, "/tmp/in" (I know, not very
creative), then:

    $ cpan -i Text::Unidecode
    $ cat /tmp/in
    ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ
    $ cat /tmp/in | perl -e '
      use Text::Unidecode;
      while (<>) { print unidecode($_); }'
    CCCCcccGGGGggggEIIIIOOOO
---
I.e. it stripped off all the accent marks. Is that what you want?

Where are you seeing those characters, and how do you know they are
not already in Unicode? That I'm seeing the characters "CcGgEIO" with
accents indicates they are already in Unicode.

What are you wanting to do: just convert them to the ASCII characters
with the accent marks stripped off?

> but I'd
> like to do more with Unicode in the future, without going down any dead-ends as far as
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but
> cannot see anything relating to perl modules, and I don't see any easy way to search many
> months of the mailing list for a keyword... is there any information I should know about?
>
> Thanks,
>
> Mark Aitchison
>
> --
> Problem reports:      https://cygwin.com/problems.html
> FAQ:                  https://cygwin.com/faq/
> Documentation:        https://cygwin.com/docs.html
> Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple