Mail Archives: cygwin/2008/06/11/13:11:27

Date: Wed, 11 Jun 2008 10:10:51 -0700
From: Gary Johnson <garyjohn AT spk DOT agilent DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Extra spaces in text files in cygwin
Message-ID: <20080611171051.GA19787@suncomp1.spk.agilent.com>
Mail-Followup-To: cygwin AT cygwin DOT com
References: <17764646 DOT post AT talk DOT nabble DOT com> <484EFB14 DOT 65C9E56F AT dessent DOT net> <17766865 DOT post AT talk DOT nabble DOT com> <20080610233030 DOT GB18434 AT suncomp1 DOT spk DOT agilent DOT com> <17767635 DOT post AT talk DOT nabble DOT com> <20080611011758 DOT GD18434 AT suncomp1 DOT spk DOT agilent DOT com> <17780262 DOT post AT talk DOT nabble DOT com>
MIME-Version: 1.0
In-Reply-To: <17780262.post@talk.nabble.com>
X-Operating-System: SunOS suncomp1 5.8 sparc
User-Agent: Mutt/1.5.17 (2007-11-01)

On 2008-06-11, gmarsha11 wrote:
> Gary Johnson wrote:
> > 
> > On 2008-06-10, gmarsha11 wrote:
> >> 
> >> Does this mean it's necessary to change the encoding for any files I
> >> might need to cat, grep, awk, etc.?
> > 
> > I'm no expert on any of this, but as far as I know, all traditional 
> > Unix tools that deal with strings consider a string to be a sequence 
> > of 8-bit characters.  So the simple answer is yes.  The more 
> > complete answer is that it depends on what you're using those files 
> > for and what other programs need to read and/or write those files.
> > 
> 
> The files are being created by HP Data Protector (backup management
> software).  After I changed the file, I realized that the next time DP
> modifies it, it will change the encoding.  DP can read the file when it is
> ANSI encoded, but will always write in Unicode -- unless I can find out how
> to change the encoding it uses.

Bummer.  If you don't need to use grep, etc., with these files very 
often, you might just prefix those Unix commands with iconv, as 
someone else suggested, e.g.,

   iconv -f csunicode abc.txt | grep abc
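
If you end up typing that pipeline a lot, it could be wrapped in a small 
shell function.  This is only a sketch: the name u16grep is made up, and 
it assumes the files are UTF-16 with a leading byte-order mark (which is 
what Notepad writes when you pick "Unicode"; I don't know whether HP Data 
Protector does the same) and that they use CR-LF line endings.

```shell
# Hypothetical helper: convert a UTF-16 file to UTF-8 on the fly, then grep it.
# Usage: u16grep PATTERN FILE
# Assumes FILE starts with a byte-order mark so iconv can tell the byte order.
u16grep() {
    pattern=$1
    file=$2
    # -t UTF-8 makes the output encoding explicit instead of relying on
    # the current locale; tr strips the carriage returns from CR-LF line
    # endings so they don't end up in the matched lines.
    iconv -f UTF-16 -t UTF-8 "$file" | tr -d '\r' | grep -- "$pattern"
}
```

Then "u16grep abc abc.txt" should behave like the pipeline above.  The 
original file is never modified, so the backup software still sees the 
encoding it wrote.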

Note that files that I save in Unicode from Notepad do not have an 
EOL sequence after the last line.  If HP Data Protector does the 
same, that might cause a problem with some tools.
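
One place where the missing EOL shows up is a shell "while read" loop, 
which by default silently drops an unterminated last line, and "wc -l", 
which counts newline characters rather than lines.  A minimal sketch of 
the usual workaround follows; the function name count_lines is just an 
illustration, not anything standard.

```shell
# Hypothetical helper: count the lines on stdin, including a final line
# that lacks a terminating newline.
count_lines() {
    n=0
    # read returns non-zero at end of input even when it has filled $line
    # with a partial last line, so also check whether $line is non-empty.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
    done
    echo "$n"
}
```

For example, "printf 'a\nb' | count_lines" counts two lines, whereas 
"printf 'a\nb' | wc -l" reports one.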

HTH,
Gary

