Mail Archives: cygwin/2009/12/28/02:12:37

In-Reply-To: <4B383DCD.80907@monai.ca>
References: <4B3828E1 DOT 4090004 AT rosi-kessel DOT org> <4B382C69 DOT 20706 AT rosi-kessel DOT org> <4B383DCD DOT 80907 AT monai DOT ca>
Date: Mon, 28 Dec 2009 07:12:25 +0000
Message-ID: <416096c60912272312p59392560j297422825c5720a3@mail.gmail.com>
Subject: Re: rsync no longer preserves extended ASCII characters after 1.7 upgrade
From: Andy Koppe <andy DOT koppe AT gmail DOT com>
To: cygwin AT cygwin DOT com

> On 2009/12/27 7:56 PM, Adam Rosi-Kessel wrote:
>> But when I
>> view them from the linux box, they have scrambled accents -- either just
>> ?'s if I use ls (must be a terminal issue)

You probably haven't got a charset configured on the Linux box. In
that case, ASCII is assumed, and when writing to a terminal 'ls'
prints any byte outside that charset as a '?'. You can force 'ls' to
print all characters anyway using the '--show-control-chars' option.
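
As a rough illustration of why the names show up as question marks, here is a small Python sketch. It is only a model of the behaviour described above, not how 'ls' is implemented, and the function name show_like_ascii_ls is made up for this example.

```python
def show_like_ascii_ls(raw_name: bytes) -> str:
    """Model of an ASCII-only listing: every byte that is not a
    printable ASCII character is shown as '?'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw_name)


# "café" stored as UTF-8 has two bytes for the accented letter,
# so it shows up as two question marks.
utf8_name = "caf\u00e9".encode("utf-8")
print(show_like_ascii_ls(utf8_name))

# The same name stored as ISO-8859-1 has a single non-ASCII byte.
latin1_name = "caf\u00e9".encode("iso-8859-1")
print(show_like_ascii_ls(latin1_name))
```

The underlying bytes are untouched in both cases; only the display is lossy.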

>> or different nonstandard
>> characters that aren't the right extended characters if I redirect the
>> output to a file and then view that.

That means you've got a character set mismatch between Cygwin (UTF-8)
and Linux (presumably ISO-8859-1). Please note that the ext3
filesystem and Unix filesystems in general have no concept of
character sets: filenames are just bytes. The interpretation of those
bytes is entirely up to applications, and you tell them what character
set to assume using the LANG or LC_CTYPE variables. So one way to fix
your mismatch is to specify e.g. LANG=en_US.UTF-8 on the Linux system.
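
To make the "filenames are just bytes" point concrete, here is a minimal Python sketch. The filename is an arbitrary example; the point is that one and the same byte sequence reads differently depending on which character set the reader assumes.

```python
# A filename as Cygwin 1.7 would write it by default: UTF-8 bytes.
name = "caf\u00e9"                  # 'café'
raw = name.encode("utf-8")          # the bytes actually stored on disk

# The filesystem stores only `raw`. What you see depends on the
# charset the viewing application assumes.
as_utf8 = raw.decode("utf-8")          # reader assumes UTF-8
as_latin1 = raw.decode("iso-8859-1")   # reader assumes ISO-8859-1

print(repr(raw))
print(as_utf8)
print(as_latin1)
```

The ISO-8859-1 reading turns the two UTF-8 bytes of the accented letter into two unrelated characters, which matches the "different nonstandard characters" symptom.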

>> I'm just trying to get back the behavior from before the upgrade. Thanks
>> for any suggestions.
>
> Assuming you upgraded from 1.5.x to 1.7.1, Cygwin has new default
> locale/charset settings that affect filename handling. Have a look at
> the Cygwin User Guide, specifically the page on Internationalization, here:
>
> http://cygwin.com/cygwin-ug-net/setup-locale.html
>
> I'm not sure what the default locale/charset was for 1.5.x, but for
> 1.7.1, it is "C.UTF-8".

1.5's default charset was the Windows default "ANSI" codepage (as
returned by the GetACP() function). On English systems, that's
codepage 1252, which is mostly identical to ISO-8859-1, except that it
has additional printable characters in the 0x80..0x9F range, where
ISO-8859-1 has only control characters.
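
The difference between the two charsets can be checked with a short Python sketch that uses Python's own "cp1252" and "iso-8859-1" codec tables. Note that Python leaves a few codepage 1252 bytes undefined; those are recorded as None here.

```python
# Collect every byte value that decodes differently in the two charsets.
differences = {}
for byte in range(0x100):
    raw = bytes([byte])
    latin1 = raw.decode("iso-8859-1")
    try:
        cp1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        cp1252 = None  # byte has no mapping in Python's cp1252 table
    if cp1252 != latin1:
        differences[byte] = cp1252

for byte, char in sorted(differences.items()):
    print(hex(byte), repr(char))
```

Every differing byte falls in 0x80..0x9F; for example 0x80 is the euro sign in codepage 1252. Filenames that use only characters from 0xA0 upwards (the usual Western European accented letters) are byte-for-byte the same in both.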

> You may be able to get the old behaviour back by
> setting LANG (or LC_ALL or LC_CTYPE) in Cygwin to match the
> locale/charset of your Linux system.

Yes, that's one way. Specifying e.g. 'LC_CTYPE=en_US rsync ...' (i.e.
a language without an explicit character set) will give you the ANSI
codepage.

But I think the --iconv option is the better way. Assuming you want to
stick with ISO-8859-1 on the Linux side, '--iconv utf8,iso88591'
(local charset first, then the remote one) should do the job.
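
What that conversion amounts to for a single filename can be sketched in Python. This models the effect only, it is not rsync's code, and the helper name transcode_name is invented for this example.

```python
def transcode_name(raw_name: bytes,
                   src: str = "utf-8",
                   dst: str = "iso-8859-1") -> bytes:
    """Reinterpret a filename's bytes from the sender's charset and
    re-encode them in the receiver's charset."""
    return raw_name.decode(src).encode(dst)


# UTF-8 'café' on the Cygwin side becomes ISO-8859-1 'café' on Linux.
local_name = "caf\u00e9".encode("utf-8")
remote_name = transcode_name(local_name)
print(repr(local_name), "->", repr(remote_name))
```

In this sketch a character with no ISO-8859-1 equivalent (the euro sign, say) raises UnicodeEncodeError, so names outside the target charset need separate handling.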

Andy

