Date: Tue, 10 Jun 2008 18:17:58 -0700
From: Gary Johnson
To: cygwin AT cygwin DOT com
Subject: Re: Extra spaces in text files in cygwin

On 2008-06-10, gmarsha11 wrote:
> Ok, have saved the file with Windows notepad as ANSI, Unicode, Unicode big
> endian, and UTF-8.
>
> Both Unicode options give me the output with the extra spaces. ANSI and
> UTF-8 allow me to see the files as I would expect to see them.
>
> Does this mean it's necessary to change the encoding for any files I might
> need to cat, grep awk, etc.?

I'm no expert on any of this, but as far as I know, all traditional
Unix tools that deal with strings consider a string to be a sequence
of 8-bit characters. So the simple answer is yes. The more complete
answer is that it depends on what you're using those files for and
what other programs need to read and/or write those files.
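To make the 8-bit point concrete, here is a minimal Python sketch (not
something from the thread). It assumes that Notepad's "ANSI" can be
approximated by Python's "cp1252" codec and its "Unicode" by
"utf-16-le"; both codec choices are assumptions for illustration. It
shows that the 16-bit form puts a zero byte after every ASCII
character, which some terminals may display as a blank, and that a
byte-oriented search for the 8-bit string no longer matches.

```python
# Assumption: Notepad's "ANSI" is approximated by cp1252 and its
# "Unicode" by UTF-16 little-endian (BOM left out here for clarity).
text = "This is abc file"

ansi_bytes = text.encode("cp1252")
utf16_bytes = text.encode("utf-16-le")

print(len(ansi_bytes))   # one byte per character
print(len(utf16_bytes))  # two bytes per character

# For plain ASCII text, every second byte of the UTF-16 form is NUL.
padding = utf16_bytes[1::2]
print(padding == b"\x00" * len(text))

# A byte-oriented search for the 8-bit string fails on the 16-bit data,
# because the letters are no longer adjacent bytes.
print(b"abc" in ansi_bytes)
print(b"abc" in utf16_bytes)
```

How those zero bytes are displayed probably depends on the terminal,
which may be why different people see different output from cat.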
FWIW, I used Notepad on my Windows XP system to create a file
containing your string, "This is abc file". When I went to save it,
the Encoding was already set to ANSI. In other words, you shouldn't
have to do anything special to save your files in a format already
compatible with grep, etc.

That being said, you really shouldn't use Notepad to edit any files
you expect to use with Cygwin, because Cygwin tools expect lines to
end with LF, not a CR-LF pair. Many tools will consider that CR to be
part of the line. In particular, bash will give odd results if you
ask it to execute a shell script written with Notepad.

I got different results than you did when I cat'd abc.txt. When I
saved it as Unicode, the output of cat was:

    ÿþThis is abc file

When I saved it as Unicode Big Endian, the output of cat was:

    þÿThis is abc file

The only difference between the two was the ordering of the bytes in
the BOM (Byte Order Mark) at the beginning of each file. In both
cases, there were no extra spaces. I was running bash in an rxvt
window, if that matters.

Regards,
Gary
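As an illustration of the BOM and CR-LF points in this message, here
is a rough Python sketch for checking a file's bytes before handing
them to grep or bash. The helper name `describe` and the sample byte
strings are made up for the example; only the BOM constants from
Python's standard `codecs` module are real. Treat it as a sketch under
those assumptions, not a complete encoding detector.

```python
import codecs

def describe(data: bytes) -> dict:
    """Report the BOM (if any) and whether the data contains CR-LF.

    Hypothetical helper for illustration only. The CR-LF check is
    only meaningful for 8-bit encodings such as ANSI or UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        bom = "UTF-8"
    elif data.startswith(codecs.BOM_UTF16_LE):
        bom = "UTF-16 little-endian"  # bytes FF FE, shown as ÿþ in Latin-1
    elif data.startswith(codecs.BOM_UTF16_BE):
        bom = "UTF-16 big-endian"     # bytes FE FF, shown as þÿ in Latin-1
    else:
        bom = None
    return {"bom": bom, "crlf": b"\r\n" in data}

# Sample data standing in for files saved from Notepad (assumed contents).
le = codecs.BOM_UTF16_LE + "This is abc file".encode("utf-16-le")
be = codecs.BOM_UTF16_BE + "This is abc file".encode("utf-16-be")
script = b"#!/bin/bash\r\necho hello\r\n"

print(describe(le)["bom"])
print(describe(be)["bom"])
print(describe(script))

# Converting CR-LF to LF, roughly what a dos2unix-style tool does.
unix_script = script.replace(b"\r\n", b"\n")
print(unix_script)
```

The two BOMs differ only in byte order, which matches the ÿþ and þÿ
prefixes shown in the cat output above.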