Mail Archives: cygwin/2008/06/10/21:18:27

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Tue, 10 Jun 2008 18:17:58 -0700
From: Gary Johnson <garyjohn AT spk DOT agilent DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: Extra spaces in text files in cygwin
Message-ID: <20080611011758.GD18434@suncomp1.spk.agilent.com>
Mail-Followup-To: cygwin AT cygwin DOT com
References: <17764646 DOT post AT talk DOT nabble DOT com> <484EFB14 DOT 65C9E56F AT dessent DOT net> <17766865 DOT post AT talk DOT nabble DOT com> <20080610233030 DOT GB18434 AT suncomp1 DOT spk DOT agilent DOT com> <17767635 DOT post AT talk DOT nabble DOT com>
MIME-Version: 1.0
In-Reply-To: <17767635.post@talk.nabble.com>
X-Operating-System: SunOS suncomp1 5.8 sparc
User-Agent: Mutt/1.5.17 (2007-11-01)
X-IsSubscribed: yes
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id m5B1IPEE028645

On 2008-06-10, gmarsha11 wrote:
> Ok, I have saved the file with Windows Notepad as ANSI, Unicode, Unicode big
> endian, and UTF-8.
> 
> Both Unicode options give me the output with the extra spaces.  ANSI and
> UTF-8 allow me to see the files as I would expect to see them.
> 
> Does this mean it's necessary to change the encoding for any files I might
> need to cat, grep, awk, etc.?

I'm no expert on any of this, but as far as I know, traditional
Unix tools that deal with strings treat a string as a sequence of
8-bit characters.  Notepad's two "Unicode" encodings are 16-bit
(UTF-16), so every plain ASCII character is stored with an extra
zero byte that those tools don't expect.  So the simple answer is
yes.  The more complete answer is that it depends on what you're
using those files for and what other programs need to read and/or
write them.
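
To make the 8-bit point concrete, here is a minimal Python sketch.
It assumes Notepad's "Unicode" option writes a BOM followed by
UTF-16 little-endian text; the function names are hypothetical and
only for illustration.

```python
import codecs


def notepad_unicode_bytes(text: str) -> bytes:
    """Bytes that Notepad's "Unicode" option is assumed to write: BOM + UTF-16LE."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def to_utf8(raw: bytes) -> bytes:
    """Re-encode BOM-prefixed UTF-16 data as UTF-8 (the "utf-16" codec consumes the BOM)."""
    return raw.decode("utf-16").encode("utf-8")


raw = notepad_unicode_bytes("abc")
print(raw)           # b'\xff\xfea\x00b\x00c\x00'
print(to_utf8(raw))  # b'abc'
```

Each ASCII letter is followed by a zero byte in the UTF-16 form.  A
byte-oriented tool passes those zero bytes through, and how they
look on screen is up to the terminal.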

FWIW, I used Notepad on my Windows XP system to create a file 
containing your string, "This is abc file".  When I went to save it, 
the Encoding was already set to ANSI.  In other words, you shouldn't 
have to do anything special to save your files in a format already 
compatible with grep, etc.

That being said, you really shouldn't use Notepad to edit any files 
you expect to use with Cygwin, because Cygwin tools expect lines to 
end with LF, not a CR-LF pair.  Many tools will consider that CR to 
be part of the line.  In particular, bash will give odd results if 
you ask it to execute a shell script written with Notepad.
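
As a rough illustration of the line-ending issue, the Python sketch
below shows the stray CR a byte-oriented tool would see at the end
of each line, and one way to strip it.  The helper name strip_cr and
the sample script are made up for this example.

```python
def strip_cr(data: bytes) -> bytes:
    """Convert CR-LF line endings to bare LF; lone LFs are left alone."""
    return data.replace(b"\r\n", b"\n")


# A two-line shell script as Notepad would save it (CR-LF endings).
script = b"#!/bin/bash\r\necho hello\r\n"

# Splitting on LF, as Unix tools do, leaves the CR attached to each line.
first_line = script.split(b"\n")[0]
print(first_line)        # b'#!/bin/bash\r'
print(strip_cr(script))  # b'#!/bin/bash\necho hello\n'
```

The trailing \r on the first line is the kind of thing that makes a
script misbehave: it becomes part of whatever is on that line.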

I got different results than you did when I cat'd abc.txt.  When I 
saved it as Unicode, the output of cat was:

   ÿþThis is abc file

When I saved it as Unicode Big Endian, the output of cat was:

   þÿThis is abc file

The only difference between the two was the ordering of the bytes in 
the BOM (Byte Order Mark) at the beginning of each file.  In both 
cases, there were no extra spaces.  I was running bash in an rxvt 
window, if that matters.
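
The two leading characters in that output can be reproduced with a
short Python sketch.  It assumes the terminal shows each byte as a
Latin-1 character; sniff_utf16_bom is a hypothetical helper, and it
deliberately ignores other BOMs such as UTF-8 or UTF-32.

```python
import codecs


def sniff_utf16_bom(data: bytes):
    """Return the UTF-16 byte order implied by a leading BOM, or None."""
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    return None


# How each BOM looks when its two bytes are displayed as Latin-1 text.
le_shown = codecs.BOM_UTF16_LE.decode("latin-1")
be_shown = codecs.BOM_UTF16_BE.decode("latin-1")
print(le_shown, be_shown)

print(sniff_utf16_bom(codecs.BOM_UTF16_BE + "abc".encode("utf-16-be")))  # utf-16-be
```

Byte FF is ÿ and byte FE is þ in Latin-1, so the little-endian BOM
shows up as ÿþ and the big-endian one as þÿ.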

Regards,
Gary



