MIME-Version: 1.0
In-Reply-To: <fbacfb31272a0.4d20da45@shaw.ca>
References: <fbacfb31272a0.4d20da45@shaw.ca>
Date: Mon, 3 Jan 2011 22:36:57 +0000
Message-ID: <AANLkTikYOqgLv15FT4erHoPia3H-Jn8GPzQ6fbHeEFSS@mail.gmail.com>
Subject: Re: File output Question
From: Andy Koppe <andy.koppe@gmail.com>
To: cygwin@cygwin.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 3 January 2011 03:04, ERIC HO wrote:
> Looking at this closer, I've found the file contains hex '00' between
> characters. In vim, it displays ^@ for hex '00'. So it looks unreadable.
> But when I use cat or tail commands, it strips off hex '00' and looks
> readable. For example, in vim, it displays:
>
> P^@r^@o^@d^@u^@c^@t^@:
> With cat and tail, the output displays Product:

Apparently the file is encoded in little-endian UTF-16, which is
Windows' favoured Unicode encoding. That effectively inserts a NUL
byte after each ASCII byte. The ^@ is caret notation for NUL.
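
To illustrate, here is a minimal Python sketch of what that encoding
does to the string from your example. The helper name caret_notation
is made up for this illustration, and it only handles NUL and plain
ASCII bytes.

```python
# Encode an ASCII string as little-endian UTF-16 without a BOM.
text = "Product:"
encoded = text.encode("utf-16-le")

def caret_notation(data: bytes) -> str:
    """Render NUL bytes as ^@ (hypothetical helper, ASCII input only)."""
    return "".join("^@" if b == 0 else chr(b) for b in data)

# Each ASCII character becomes two bytes: the character, then NUL.
print(encoded)
print(caret_notation(encoded))
```

Every character takes two bytes here, so the encoded data is twice as
long as the original text.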

When sending the file directly to the terminal with cat or tail, the
terminal will simply ignore the NUL bytes, whereas vim makes an
attempt to show you the whole content of the file. UTF-16 files ought
to have the so-called Byte Order Mark (BOM) for indicating
little-endian or big-endian at the start, and vim automatically
recognises a file as UTF-16 when that's there. Your file doesn't
appear to have one though, so vim falls back to an ASCII-compatible
encoding and shows each NUL byte as ^@.
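
As a rough sketch of the detection idea, the following Python checks
for a UTF-16 BOM and, failing that, guesses from the position of the
NUL bytes. The function name guess_utf16 and the NUL heuristic are my
own assumptions for illustration; this is not how vim implements it,
and the heuristic only works for text that is entirely ASCII.

```python
import codecs

def guess_utf16(data: bytes):
    """Return a guessed codec name for UTF-16 data, or None."""
    # A BOM is the reliable signal: FF FE (little-endian), FE FF (big-endian).
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: assume ASCII-only text and look at where the NULs sit.
    if len(data) >= 2 and len(data) % 2 == 0:
        if all(b == 0 for b in data[1::2]):
            return "utf-16-le"   # NUL after each character
        if all(b == 0 for b in data[0::2]):
            return "utf-16-be"   # NUL before each character
    return None

# BOM-less sample like the file in question.
sample = "Product:".encode("utf-16-le")
codec = guess_utf16(sample)
print(codec, sample.decode(codec))
```

In vim itself, you should be able to force the encoding when opening
such a BOM-less file with ":e ++enc=utf-16le".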

Andy


