| Message-ID: | <606A2017.2040405@tlinx.org> |
| Date: | Sun, 04 Apr 2021 13:22:47 -0700 |
| From: | L A Walsh <cygwin AT tlinx DOT org> |
| To: | Mark Aitchison <M DOT Aitchison AT cyberXpress DOT co DOT nz> |
| Subject: | Re: Perl Unidecode modules - which to use (if not Text::Unidecode)? |
| References: | <d3342ff4-f717-f882-5c41-b27ab272dc03 AT cyberXpress DOT co DOT nz> |
| In-Reply-To: | <d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz> |
| Cc: | cygwin AT cygwin DOT com |
On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ)
> that are basically ASCII with accent marks to their closest ASCII equivalents,
---
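Hmm... have you tried installing it from CPAN?
I just tried it and it seems to work:

  $ cpan -i Text::Unidecode
  $ cat /tmp/in
  ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ
  $ cat /tmp/in | perl -e '
  use Text::Unidecode;
  # Decode input as UTF-8; not needed if -C or PERL_UNICODE already does this.
  binmode(STDIN, ":encoding(UTF-8)");
  while (<>) {
      print unidecode($_);   # transliterate each line to plain ASCII
  }'
  CCCCcccGGGGggggEIIIIOOOO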
---
I.e., it stripped off all the accent marks. Is that what you want?
(The install spewed some warnings, but it seemed to test out OK, so I tried it.)
To try it yourself, put your characters in a file such as /tmp/in
(not very creative, I know) and pipe it through the same perl command.
Where are you seeing those characters, and how do you know they are not
already in Unicode? That I'm seeing the characters "CcGgEIO" with
accents indicates they are already in Unicode.
What are you wanting to do: just convert them to the ASCII characters
with the accent marks stripped off?
> but I'd
> like to do more with Unicode in the future, without going down any dead-ends as far as
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but
> cannot see anything relating to perl modules, and I don't see any easy way to search many
> months of the mailing list for a keyword... is there any information I should know about?
>
>
> Thanks,
>
> Mark Aitchison
>
> --
> Problem reports: https://cygwin.com/problems.html
> FAQ: https://cygwin.com/faq/
> Documentation: https://cygwin.com/docs.html
> Unsubscribe info: https://cygwin.com/ml/#unsubscribe-simple
>
>