Mail Archives: cygwin/2008/06/10/21:53:37

To: cygwin AT cygwin DOT com
From: René Berber <r DOT berber AT computer DOT org>
Subject: Re: Extra spaces in text files in cygwin
Date: Tue, 10 Jun 2008 20:52:55 -0500
Lines: 53
Message-ID: <g2nb5n$bd2$1@ger.gmane.org>
References: <17764646 DOT post AT talk DOT nabble DOT com> <484EFB14 DOT 65C9E56F AT dessent DOT net> <17766865 DOT post AT talk DOT nabble DOT com>
Mime-Version: 1.0
User-Agent: Thunderbird 2.0.0.14 (Windows/20080421)
In-Reply-To: <17766865.post@talk.nabble.com>

gmarsha11 wrote:

> I'm not sure about the file's encoding.  How do I tell?

If you have "file" installed, it's easy:

$ file Document.txt
Document.txt: Unicode text, UTF-16, little-endian
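
If "file" isn't installed, the byte-order mark (BOM) at the start of the
file gives roughly the same answer.  The following is only a minimal
sketch in Python, not how "file" itself works: it checks nothing but the
BOM, and the names sniff_bom and sniff_file are made up for this example.

```python
import codecs

# BOM signatures. The UTF-32 entries come first because the UTF-32-LE BOM
# (ff fe 00 00) begins with the same two bytes as the UTF-16-LE BOM (ff fe).
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF8, "UTF-8 (with BOM)"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
]


def sniff_bom(data: bytes) -> str:
    """Describe the encoding suggested by the leading BOM, if there is one."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return "no BOM found"


def sniff_file(path: str) -> str:
    """Hypothetical helper: read the first four bytes of a file and sniff them."""
    with open(path, "rb") as fh:
        return sniff_bom(fh.read(4))


# Same content as the example file, built in memory for demonstration.
sample = codecs.BOM_UTF16_LE + "This is abc file\r\n".encode("utf-16-le")
print(sniff_bom(sample))  # UTF-16, little-endian
```

A file without a BOM simply reports "no BOM found"; that says nothing
about whether it is ASCII, Latin-1, or BOM-less UTF-16.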

> When I create a new file with vi, I can read the file with no problem.  The
> output is normal.

Look at the bottom line, vi tells you what kind of "text" it is... sort of:

"Document.txt" [converted][dos] 1L, 20C

The "converted" means it wasn't regular text; the "dos" means it has
CR-LF line endings.

If you'd like to see what it really is, try:

$ od -tx2z Document.txt
0000000 feff 0054 0068 0069 0073 0020 0069 0073  >..T.h.i.s. .i.s.<
0000020 0020 0061 0062 0063 0020 0066 0069 006c  > .a.b.c. .f.i.l.<
0000040 0065 000d 000a                           >e.....<
0000046

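So your spaces are really null bytes (some fonts show them as little
smileys): in UTF-16 each plain ASCII character is followed by a 00 byte.
And vi was right about "dos": the dump ends in 000d 000a, which is CR-LF.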
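
To double-check the dump, the words od printed can be turned back into
bytes and decoded.  This is just a sketch in Python; it assumes od ran
on a little-endian machine (which matches what "file" reported), so each
16-bit word is packed back low byte first.

```python
import struct

# The 16-bit words exactly as od printed them, without the offsets.
words = """
feff 0054 0068 0069 0073 0020 0069 0073
0020 0061 0062 0063 0020 0066 0069 006c
0065 000d 000a
""".split()

# od showed little-endian words, so "<H" restores the original byte order.
raw = b"".join(struct.pack("<H", int(w, 16)) for w in words)

print(len(raw))      # 38, which is octal 46, the last offset od printed
print(raw[:6])       # b'\xff\xfeT\x00h\x00'
print(raw.count(0))  # 18 null bytes, one per character after the BOM

# The "utf-16" codec reads the BOM to choose the byte order, then drops it.
text = raw.decode("utf-16")
print(repr(text))    # 'This is abc file\r\n'
```

The repr() on the last line makes the CR and LF visible as \r\n instead
of letting the terminal act on them.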

> These particular text files that I am working with were created by HP Data
> Protector.  I can easily parse and manipulate these files on HPUX servers,
> but the Windows servers lack that functionality.  I thought Cygwin would
> help with this.
>
> How do I tell what the file's encoding is?

As pointed out by Gary Johnson, `cat Document.txt` doesn't result in
spaced text; it just shows "ÿþThis is abc file" (this is using mrxvt and
the Bitstream Vera Sans Mono font).

Better to use the file command to see what it is.  And no, there is no
conversion software that I know of; Cygwin 1.5.x just doesn't support
wide characters.
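
That said, if a scripting language with codec support happens to be
installed, the conversion itself is short.  The following is a minimal
sketch assuming Python 3 is available; the function name utf16_to_utf8
and the file names are made up for the example, and it reads the whole
file into memory, so it is meant for modest file sizes.

```python
import codecs
import os
import tempfile


def utf16_to_utf8(src: str, dst: str) -> None:
    """Rewrite a UTF-16 file (with BOM) as UTF-8 with LF line endings."""
    # "utf-16" picks the byte order from the BOM and strips it;
    # newline="" turns off newline translation so CR-LF is handled explicitly.
    with open(src, "r", encoding="utf-16", newline="") as fin:
        text = fin.read()
    text = text.replace("\r\n", "\n")
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)


# Demonstration on a throwaway copy of the example content.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Document.txt")
    dst = os.path.join(tmp, "Document.utf8.txt")
    with open(src, "wb") as fh:
        fh.write(codecs.BOM_UTF16_LE + "This is abc file\r\n".encode("utf-16-le"))
    utf16_to_utf8(src, dst)
    with open(dst, "rb") as fh:
        converted = fh.read()

print(converted)  # b'This is abc file\n'
```

The result is plain 8-bit text that tools like grep, sed, and awk can
work on directly.
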
-- 
René Berber

