Mail Archives: cygwin/2008/06/10/22:05:13

Message-ID: <484F32BD.E4BBC57A@dessent.net>
Date: Tue, 10 Jun 2008 19:04:45 -0700
From: Brian Dessent <brian AT dessent DOT net>
X-Mailer: Mozilla 4.79 [en] (Windows NT 5.0; U)
MIME-Version: 1.0
To: =?iso-8859-1?Q?Ren=E9?= Berber <r DOT berber AT computer DOT org>
CC: cygwin AT cygwin DOT com
Subject: Re: Extra spaces in text files in cygwin
References: <17764646 DOT post AT talk DOT nabble DOT com> <484EFB14 DOT 65C9E56F AT dessent DOT net> <17766865 DOT post AT talk DOT nabble DOT com> <g2nb5n$bd2$1 AT ger DOT gmane DOT org>

René Berber wrote:

> If you like to look at what it really is, try:
> 
> $ od -tx2z Document.txt
> 0000000 feff 0054 0068 0069 0073 0020 0069 0073  >..T.h.i.s. .i.s.<
> 0000020 0020 0061 0062 0063 0020 0066 0069 006c  > .a.b.c. .f.i.l.<
> 0000040 0065 000d 000a                           >e.....<
> 0000046
> 
> So your spaces are really null bytes (some fonts put little smileys), vi
> was wrong no CR in there.

Sure there is: 000d 000a is \r\n in UTF-16.
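
To see the CR and LF at the byte level, here is a minimal sketch.  It
assumes the file is the 38-byte UTF-16LE text from the od dump above; the
printf lines are a reconstruction for illustration only, not part of the
original report.

```shell
# Rebuild the sample file: BOM (ff fe) followed by UTF-16LE text.
printf '\377\376' > Document.txt
printf 'T\000h\000i\000s\000 \000i\000s\000 \000a\000b\000c\000 \000f\000i\000l\000e\000\r\000\n\000' >> Document.txt

# Dump byte by byte instead of in 16-bit words.
od -An -tx1 Document.txt

# The last four bytes are the UTF-16LE encoding of \r\n.
tail_bytes=$(tail -c 4 Document.txt | od -An -tx1 | tr -d ' \n')
echo "$tail_bytes"    # prints 0d000a00
```

Read as 16-bit little-endian words, 0d 00 is 000d (CR) and 0a 00 is
000a (LF), which matches the od -tx2z output quoted above.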

> As pointed out by Gary Johnson, `cat Document.txt` doesn't result in
> spaced text, it just shows "ÿþThis is abc file" (this is using mrxvt and
> Bitstream Vera Sans mono font).

Those NUL bytes are still being printed; it's just that your particular
combination of terminal and font doesn't show anything for them.  They're
still there in the output stream.
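
One way to check this without relying on what the terminal draws is to
count the NUL bytes in cat's output.  This is only a sketch; the printf
lines rebuild the sample file from the od dump above and are an
assumption for illustration.

```shell
# Rebuild the sample file: BOM (ff fe) followed by UTF-16LE text.
printf '\377\376' > Document.txt
printf 'T\000h\000i\000s\000 \000i\000s\000 \000a\000b\000c\000 \000f\000i\000l\000e\000\r\000\n\000' >> Document.txt

# Keep only the NUL bytes that cat writes to stdout, then count them.
nul_count=$(cat Document.txt | tr -cd '\000' | wc -c | tr -d ' ')
echo "$nul_count"    # prints 18
```

That is one NUL for each of the 16 characters plus one each for the CR
and the LF; the two BOM bytes are ff and fe, so they add none.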

> Better use the file command to see what it is.  And no, there are no
> converting software that I know of, Cygwin 1.5.x just doesn't support
> wide characters.

Sure there is: iconv.

And this is not a matter of Cygwin supporting or not supporting
something -- that would be true if we were talking about wide characters
in the filenames.  But we're talking about the file's contents, and what
an app does with the bytes is up to it, not Cygwin.  For example, vi is
a Cygwin app and can read the UTF-16 file just fine, displaying the
characters as ASCII.  So in this case it depends on the app, not the
libc.  And as already stated, the Unix tradition is to use UTF-8 since
it fits into the "a string is a null-terminated series of bytes"
definition that is borrowed from C.  But anyway you can freely transform
anything to anything with iconv.
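
A minimal sketch of such a conversion follows.  The output filenames are
made up for illustration, and the printf lines only rebuild the sample
file from the od dump earlier in the thread.

```shell
# Rebuild the sample file: BOM (ff fe) followed by UTF-16LE text.
printf '\377\376' > Document.txt
printf 'T\000h\000i\000s\000 \000i\000s\000 \000a\000b\000c\000 \000f\000i\000l\000e\000\r\000\n\000' >> Document.txt

# Convert UTF-16 to UTF-8; the byte order is taken from the BOM.
iconv -f UTF-16 -t UTF-8 Document.txt > Document.utf8.txt

# Optionally drop the carriage returns to get Unix line endings.
tr -d '\r' < Document.utf8.txt > Document.unix.txt

cat Document.unix.txt    # prints: This is abc file
```

Running iconv -l lists the encoding names a given iconv build accepts,
in case UTF-16 is spelled differently there.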

Brian


