Mail Archives: cygwin/2009/03/30/07:36:36

Message-ID: <49D0BF21.9000304@gmail.com>
Date: Mon, 30 Mar 2009 13:46:25 +0100
From: Dave Korn <dave DOT korn DOT cygwin AT googlemail DOT com>
User-Agent: Thunderbird 2.0.0.17 (Windows/20080914)
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: sed converts 8-bit input text to 16-bit (Unicode-16?) characters - how to suppress that?
References: <B7436E984B004D5EB2413E4174482335 AT SEMENTINA> <20090330121043 DOT GT12738 AT calimero DOT vinschen DOT de>
In-Reply-To: <20090330121043.GT12738@calimero.vinschen.de>

Corinna Vinschen wrote:
> On Mar 30 13:48, Michael Moser wrote:
>> I need to mangle a file containing "8-bit ASCII" characters (i.e. the
>> file contains also characters in the upper 8-bit range, namely a few
>> umlauts as well as some french accented characters). 
>>
>> Strangely enough, the sed version that came as part of Cygwin emits the
>> result of the mangling using 16-bit characters (I believe those are
>> Unicode-16 characters, but I'm not sure. The hex editor shows every second
>> byte as 00, except for the first two bytes, which read FF FE).
> 
> This is very likely not Cygwin's sed.  Do you have another sed in $PATH
> by any chance?  I tried with input files containing german umlauts and
> sed does not convert to wide char and it does not produce a BOM marker
> at the start of the file.

  Another possibility is that WordPad or Notepad tried to be clever and
unexpectedly saved the original source file as UTF-16.  Did you check the
original source file in a hex editor too, Michael?
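
  If a hex editor isn't handy, the same check can be sketched in a few lines
of Python.  This is only an illustration, not something from the thread: the
function names detect_bom and check_file are made up for the example.  It
reads the first bytes of a file and reports whether they match a known
byte-order mark; FF FE at the start is the UTF-16 little-endian BOM Michael
described.

```python
import codecs


def detect_bom(data: bytes) -> str:
    """Return a label for the byte-order mark at the start of data, or 'none'."""
    # UTF-32 LE is tested before UTF-16 LE because its BOM (FF FE 00 00)
    # also begins with FF FE.
    boms = [
        (codecs.BOM_UTF32_LE, "UTF-32 LE"),
        (codecs.BOM_UTF32_BE, "UTF-32 BE"),
        (codecs.BOM_UTF8, "UTF-8"),
        (codecs.BOM_UTF16_LE, "UTF-16 LE"),
        (codecs.BOM_UTF16_BE, "UTF-16 BE"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return "none"


def check_file(path: str) -> str:
    """Read the first four bytes of the file at path and classify its BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))
```

  Running check_file on both the input file and sed's output should show
whether the BOM was already there before sed touched the file.  A result of
"none" does not prove the file is 8-bit text, since UTF-16 can be written
without a BOM, but it does rule out the FF FE case described above.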

    cheers,
      DaveK

