Mail Archives: cygwin/2021/04/04/16:23:09

Message-ID: <606A2017.2040405@tlinx.org>
Date: Sun, 04 Apr 2021 13:22:47 -0700
From: L A Walsh <cygwin AT tlinx DOT org>
User-Agent: Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version: 1.0
To: Mark Aitchison <M DOT Aitchison AT cyberXpress DOT co DOT nz>
Subject: Re: Perl Unidecode modules - which to use (if not Text::Unidecode)?
References: <d3342ff4-f717-f882-5c41-b27ab272dc03 AT cyberXpress DOT co DOT nz>
In-Reply-To: <d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz>
List-Id: General Cygwin discussions and problem reports <cygwin.cygwin.com>
List-Archive: <https://cygwin.com/pipermail/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-request AT cygwin DOT com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin>,
<mailto:cygwin-request AT cygwin DOT com?subject=subscribe>
Cc: cygwin AT cygwin DOT com

On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need 
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ) 
> that are basically ASCII with accent marks to their closest ASCII equivalents, 
---
Hmm... have you tried installing it from CPAN?

I just tried it and it seems to work:

>  cpan -i Text::Unidecode
>  cat /tmp/in

ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ

>  cat /tmp/in | perl -CS -e '
use Text::Unidecode;
# -CS treats STDIN and STDOUT as UTF-8, so unidecode() sees characters, not bytes
while (<>) {
    print unidecode($_);
}'

CCCCcccGGGGggggEIIIIOOOO

---
I.e., it stripped off all the accent marks.  Is that what you
want?


 

(The cpan install spewed some warnings, but the module seemed to test
out OK, so I tried it.)

    Where are you seeing those characters, and how do you know they are not
already in Unicode?  That I'm seeing the characters "CcGgEIO" with
accents indicates they are already in Unicode.
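
One way to check whether the input really is UTF-8 is to try a strict decode of the raw bytes. This is only a sketch using the core Encode module. The helper name is_valid_utf8 is made up for illustration. A successful decode is good evidence rather than proof, since plain ASCII also passes.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# is_valid_utf8 is a hypothetical helper name used only for this sketch.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC keeps decode() from modifying $bytes.
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# The two bytes that encode "Ç" in UTF-8.
print is_valid_utf8("\xC3\x87") ? "looks like UTF-8\n" : "not UTF-8\n";

# "Ça" as Latin-1 bytes: 0xC7 is not followed by a continuation byte.
print is_valid_utf8("\xC7\x61") ? "looks like UTF-8\n" : "not UTF-8\n";
```

If the check fails, the file is probably in a legacy 8-bit encoding and needs decoding from that encoding before any transliteration.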

What do you want to do: just convert them to the ASCII characters
with the accent marks stripped off?
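
If stripping accent marks is all that is needed, a possible alternative to Text::Unidecode is the core Unicode::Normalize module: decompose each character, then delete the combining marks. This is a different technique from the one-liner above, offered as a sketch. The helper name strip_accents is an assumption for illustration. Letters with no canonical decomposition, such as "ß" or "ø", pass through unchanged, whereas Text::Unidecode would transliterate them.

```perl
use strict;
use warnings;
use utf8;                          # this source contains UTF-8 string literals
use Unicode::Normalize qw(NFD);    # core module, no CPAN install needed

# strip_accents is a hypothetical helper name, not part of any module.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);   # e.g. "Ç" becomes "C" + U+0327 COMBINING CEDILLA
    $decomposed =~ s/\p{Mn}+//g;   # delete the nonspacing combining marks
    return $decomposed;
}

binmode(STDOUT, ':encoding(UTF-8)');
print strip_accents("ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ"), "\n";
```

The input must already be decoded to Perl characters (here via "use utf8" for the literal) for NFD to work on it. For data read from a file, that means opening it with an ":encoding(UTF-8)" layer.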


> but I'd 
> like to do more with Unicode in the future, without going down any dead-ends as far as 
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but 
> cannot see anything relating to perl modules, and I don't see any easy way to search many 
> months of the mailing list for a keyword... is there any information I should know about?
>
>
> Thanks,
>
> Mark Aitchison
>
> --
> Problem reports:      https://cygwin.com/problems.html
> FAQ:                  https://cygwin.com/faq/
> Documentation:        https://cygwin.com/docs.html
> Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
>
>   

