To: cygwin AT cygwin DOT com
From: René Berber
Subject: Re: Extra spaces in text files in cygwin
Date: Tue, 10 Jun 2008 20:52:55 -0500

gmarsha11 wrote:

> I'm not sure about the file's encoding. How do I tell?

If you have "file" installed, it's easy:

  $ file Document.txt
  Document.txt: Unicode text, UTF-16, little-endian

> When I create a new file with vi, I can read the file with no problem. The
> output is normal.

Look at the bottom line; vi tells you what kind of "text" it is... sort of:

  "Document.txt" [converted][dos] 1L, 20C

The "converted" means it wasn't regular text; the "dos" means it has
CR-LF line endings.

If you'd like to look at what it really is, try:

  $ od -tx2z Document.txt
  0000000 feff 0054 0068 0069 0073 0020 0069 0073  >..T.h.i.s. .i.s.<
  0000020 0020 0061 0062 0063 0020 0066 0069 006c  > .a.b.c. .f.i.l.<
  0000040 0065 000d 000a                           >e.....<
  0000046

So your spaces are really null bytes (some fonts show them as little
smileys). As for vi's "dos" flag, the dump ends with 000d 000a, so the
CR-LF is there, encoded as 16-bit units like everything else.

> These particular text files that I am working with were created by HP Data
> Protector. I can easily parse and manipulate these files on HPUX servers,
> but the Windows servers lack that functionality.
> I thought Cygwin would help with this.
>
> How do I tell what the file's encoding is?

As pointed out by Gary Johnson, `cat Document.txt` doesn't produce
spaced-out text; it just shows "ÿþThis is abc file" (this is using mrxvt and
the Bitstream Vera Sans Mono font).

Better to use the file command to see what it is. And no, there is no
conversion software that I know of; Cygwin 1.5.x just doesn't support
wide characters.

-- 
René Berber
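
For what it's worth, the check that the od dump makes by eye (look for the
ff fe byte-order mark at the start, then read the rest as 16-bit units) can
be sketched in a few lines of Python. This is only an illustration, not
something from the thread: it assumes a Python 3 interpreter is available,
the helper names sniff_encoding and to_plain_text are made up for the
example, and falling back to latin-1 when no mark is found is an assumption.

```python
import codecs

# Byte-order marks and the codec each one implies. UTF-32 marks are checked
# before UTF-16 because BOM_UTF32_LE begins with the same two bytes as
# BOM_UTF16_LE (ff fe).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes, default: str = "latin-1") -> str:
    """Guess a codec name from a leading byte-order mark (hypothetical helper).

    The latin-1 fallback is an assumption made for this sketch.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return default


def to_plain_text(data: bytes) -> str:
    """Decode the bytes (dropping the mark) and turn CR-LF into LF."""
    text = data.decode(sniff_encoding(data))
    return text.replace("\r\n", "\n")


# The same 38 bytes the od dump shows: mark, "This is abc file", CR, LF.
sample = codecs.BOM_UTF16_LE + "This is abc file\r\n".encode("utf-16-le")
print(sniff_encoding(sample))
print(repr(to_plain_text(sample)))
```

To handle a real file, read it in binary mode (open(path, "rb").read()),
pass the bytes to to_plain_text, and write the result back out in whatever
8-bit encoding the other tools expect.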