Date: Wed, 11 Jun 2008 10:10:51 -0700
From: Gary Johnson
To: cygwin AT cygwin DOT com
Subject: Re: Extra spaces in text files in cygwin

On 2008-06-11, gmarsha11 wrote:
> Gary Johnson wrote:
> >
> > On 2008-06-10, gmarsha11 wrote:
> >>
> >> Does this mean it's necessary to change the encoding for any files I
> >> might
> >> need to cat, grep awk, etc.?
> >
> > I'm no expert on any of this, but as far as I know, all traditional
> > Unix tools that deal with strings consider a string to be a sequence
> > of 8-bit characters.  So the simple answer is yes.  The more
> > complete answer is that it depends on what you're using those files
> > for and what other programs need to read and/or write those files.
> >
>
> The files are being created by HP Data Protector (backup management
> software). After I changed the file, I realized that the next time DP
> modifies it, it will change the encoding.
> DP can read the file when it is ANSI encoded, but will always write in
> Unicode -- unless I can find out how to change the encoding it uses.
> Bummer.

If you don't need to use grep, etc., with these files very often,
you might just prefix those Unix commands with iconv, as someone
else suggested, e.g.,

    iconv -f csunicode abc.txt | grep abc

Note that files that I save in Unicode from Notepad do not have an
EOL sequence after the last line.  If HP Data Protector does the
same, that might cause a problem with some tools.

HTH,
Gary
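
If the iconv prefix turns out to be needed often, it could be wrapped in a
shell function. What follows is only a sketch under some assumptions: the
name u16grep and the sample file are made up for illustration, the
"Unicode" files are assumed to be UTF-16 with a byte-order mark (so the
sketch uses -f UTF-16 instead of csunicode), and the lines are assumed to
end in CRLF. None of it has been checked against real HP Data Protector
output.

```shell
#!/bin/sh
# u16grep PATTERN FILE: hypothetical helper, not part of Cygwin.
# Decodes FILE from UTF-16 (the byte-order mark selects the byte order),
# drops the carriage returns of CRLF line endings, then greps the result.
u16grep() {
    iconv -f UTF-16 -t UTF-8 "$2" | tr -d '\r' | grep -e "$1"
}

# Build a small sample file shaped like a Notepad "Unicode" save
# (an assumption): BOM FF FE, UTF-16LE text, CRLF line endings, and no
# EOL sequence after the last line.
sample=${TMPDIR:-/tmp}/u16grep-sample.$$
{
    printf '\377\376'
    printf 'abc one\r\nxyz two\r\nabc three' | iconv -f US-ASCII -t UTF-16LE
} > "$sample"

# Search the sample for lines containing "abc", then clean up.
u16grep abc "$sample"
rm -f "$sample"
```

GNU grep copes with the missing EOL on the last line, but tools that
count line terminators, such as wc -l, would report one line fewer for
such a file.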