From: Andy Koppe
To: cygwin AT cygwin DOT com
Subject: Re: rsync no longer preserves extended ASCII characters after 1.7 upgrade
Date: Mon, 28 Dec 2009 07:12:25 +0000
Message-ID: <416096c60912272312p59392560j297422825c5720a3@mail.gmail.com>
In-Reply-To: <4B383DCD.80907@monai.ca>

> On 2009/12/27 7:56 PM, Adam Rosi-Kessel wrote:
>> But when I
>> view them from the linux box, they have scrambled accents -- either just
>> ?'s if I use ls (must be a terminal issue)

You probably haven't got a charset configured on the Linux box. In
that case, ASCII is assumed, and 'ls' prints anything outside that
charset as a '?'. You can force 'ls' to print all characters anyway
using the '--show-control-chars' option.

>> or different nonstandard
>> characters that aren't the right extended characters if I redirect the
>> output to a file and then view that.

That means you've got a character set mismatch between Cygwin (UTF-8)
and Linux (presumably ISO-8859-1). Please note that the ext3
filesystem and Unix filesystems in general have no concept of
character sets: filenames are just bytes. The interpretation of those
bytes is entirely up to applications, and you tell them what character
set to assume using the LANG or LC_CTYPE variables. So one way to fix
your mismatch is to specify e.g.
LANG=en_US.UTF-8 on the Linux system.

>> I'm just trying to get back the behavior from before the upgrade. Thanks
>> for any suggestions.
>
> Assuming you upgraded from 1.5.x to 1.7.1, Cygwin has new default
> locale/charset settings that affect filename handling. Have a look at
> the Cygwin User Guide, specifically the page on Internationalization, here:
>
> http://cygwin.com/cygwin-ug-net/setup-locale.html
>
> I'm not sure what the default locale/charset was for 1.5.x, but for
> 1.7.1, it is "C.UTF-8".

1.5's default charset was the Windows default "ANSI" codepage (as
returned by the GetACP() function). On English systems, that's
codepage 1252, which is mostly identical with ISO-8859-1, except for
additional printable characters in the 0x80..0x9F range.

> You may be able to get the old behaviour back by
> setting LANG (or LC_ALL or LC_CTYPE) in Cygwin to match the
> locale/charset of your Linux system.

Yes, that's one way. Specifying e.g. 'LC_CTYPE=en_US rsync ...' (i.e.
a language without an explicit character set) will give you the ANSI
codepage. But I think the --iconv option is the better way. Assuming
you want to stick with ISO-8859-1 on the Linux side, '--iconv
utf8,iso88591' should do the job.

Andy
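
P.S. To illustrate the "filenames are just bytes" point, here is a
minimal Python sketch. The filename "café.txt" and the variable names
are made up for the example, and it only mimics the decoding step: it
is not what 'ls' or rsync actually run internally.

```python
# A hypothetical accented filename, as a UTF-8 writer (e.g. Cygwin 1.7)
# would store it on disk.
name = "caf\u00e9.txt"
utf8_bytes = name.encode("utf-8")

# Reader assumes ISO-8859-1: the two UTF-8 bytes of the accented letter
# show up as two unrelated characters.
misread_latin1 = utf8_bytes.decode("iso-8859-1")

# Reader assumes ASCII: every byte outside ASCII is undecodable. Python
# substitutes U+FFFD here; 'ls' prints '?' instead.
misread_ascii = utf8_bytes.decode("ascii", errors="replace")

# Converting the name from UTF-8 to ISO-8859-1 yields the byte sequence
# an ISO-8859-1 system expects (one byte for the accented letter).
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1")

# Codepage 1252 and ISO-8859-1 disagree in the 0x80..0x9F range:
# 0x80 is the euro sign in 1252 but a control character in ISO-8859-1.
byte_80_cp1252 = b"\x80".decode("cp1252")
byte_80_latin1 = b"\x80".decode("iso-8859-1")

print(utf8_bytes, misread_latin1, misread_ascii, latin1_bytes)
print(repr(byte_80_cp1252), repr(byte_80_latin1))
```

The conversion in the fourth step is roughly the kind of
filename translation that a charset conversion option such as --iconv
has to perform between the two sides.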