Mail Archives: cygwin/2021/04/04/16:23:09

L A Walsh <cygwin AT tlinx DOT org> · Sun, 04 Apr 2021 13:22:47 -0700

On 2021/04/01 13:35, Mark Aitchison wrote:
> 1. What perl Unicode modules should I consider, if not Text::Unidecode? The present need 
> is to be able to convert those few "foreign" characters (like ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ) 
> that are basically ASCII with accent marks to their closest ASCII equivalents, 
---
    Hmm... have you tried installing it from CPAN?

I just tried it and it seems to work.

>  cpan -i Text::Unidecode;
>  cat /tmp/in

ÇĆĈĊçĉċĜĞĠĢĝģğġËÌÍÎÏÒÓÔÕ

>  cat /tmp/in | perl -CS -e '
# -CS marks STDIN/STDOUT as UTF-8, so unidecode() sees characters, not raw bytes
use Text::Unidecode;
while (<>) {
    print unidecode($_);
}'

CCCCcccGGGGggggEIIIIOOOO

---
I.e. it stripped off all the accent marks.  Is that what you
want?

    (The install spewed some warnings, but it seemed to test out OK, so I tried
it: I put your characters in a file, "/tmp/in" (not very creative, I know),
and piped that file through the Text::Unidecode one-liner shown above.)

    Where are you seeing those characters, and how do you know they are not
already in Unicode?  That I'm seeing the characters "CcGgEIO" with
accents indicates they are already in Unicode.
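
One rough way to check whether a file is already valid UTF-8 is to ask the core Encode module to decode it strictly and see whether it complains. This is a sketch under assumptions: the function name is_utf8 is invented here, and passing the check only means the bytes are well-formed UTF-8, not that the text is what you intended.

```shell
# is_utf8 is a hypothetical helper name: exit status 0 if stdin is well-formed UTF-8.
is_utf8() {
    perl -MEncode -e '
        local $/;                      # slurp all of stdin as raw bytes
        my $bytes = <STDIN>;
        # FB_CROAK makes decode() die on the first malformed sequence
        my $ok = eval { decode("UTF-8", $bytes, Encode::FB_CROAK); 1 };
        exit($ok ? 0 : 1);
    '
}

# Example with a few of the sample characters
printf '%s\n' 'ÇĆĈĊ' | is_utf8 && echo "valid UTF-8"
```

A file in a legacy 8-bit encoding such as ISO-8859-1 will usually fail this check as soon as it contains an accented character.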

What do you want to do: just convert them to the ASCII characters
with the accent marks stripped off?

> but I'd 
> like to do more with Unicode in the future, without going down any dead-ends as far as 
> being able to run under cygwin is concerned.
>
> 2. I see some talk of Internationalization in Chapter 2 of "Setting up Cygwin", but 
> cannot see anything relating to perl modules, and I don't see any easy way to search many 
> months of the mailing list for a keyword... is there any information I should know about?
>
>
> Thanks,
>
> Mark Aitchison
>
> --
> Problem reports:      https://cygwin.com/problems.html
> FAQ:                  https://cygwin.com/faq/
> Documentation:        https://cygwin.com/docs.html
> Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
>
>   


Message-ID:	<606A2017.2040405@tlinx.org>
Date:	Sun, 04 Apr 2021 13:22:47 -0700
From:	L A Walsh <cygwin AT tlinx DOT org>
User-Agent:	Thunderbird 2.0.0.24 (Windows/20100228)
MIME-Version:	1.0
To:	Mark Aitchison <M DOT Aitchison AT cyberXpress DOT co DOT nz>
Subject:	Re: Perl Unidecode modules - which to use (if not Text::Unidecode)?
References:	<d3342ff4-f717-f882-5c41-b27ab272dc03 AT cyberXpress DOT co DOT nz>
In-Reply-To:	<d3342ff4-f717-f882-5c41-b27ab272dc03@cyberXpress.co.nz>