Mail Archives: cygwin/2008/06/10/22:05:13
René Berber wrote:
> If you like to look at what it really is, try:
>
> $ od -tx2z Document.txt
> 0000000 feff 0054 0068 0069 0073 0020 0069 0073 >..T.h.i.s. .i.s.<
> 0000020 0020 0061 0062 0063 0020 0066 0069 006c > .a.b.c. .f.i.l.<
> 0000040 0065 000d 000a >e.....<
> 0000046
>
> So your spaces are really null bytes (some fonts show them as little smileys); vi
> was wrong, there's no CR in there.
Sure there is: 000d 000a is \r\n in UTF-16.
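The reason the dump shows `000d 000a` is that `od -tx2` prints each 16-bit unit in the host's byte order. On a little-endian x86 machine the on-disk bytes `0d 00` therefore come out as the word `000d`, and the BOM bytes `ff fe` as `feff`. Here's a minimal sketch, assuming a POSIX shell with `printf`, `od` and `mktemp`, that rebuilds Document.txt byte for byte from the dump quoted above and then looks at single bytes instead of words:

```shell
# Work in a scratch directory so nothing real gets overwritten.
cd "$(mktemp -d)" || exit 1

# Recreate Document.txt from the od dump: UTF-16LE BOM (ff fe), then
# "This is abc file\r\n" with a 00 high byte after every character.
printf '\377\376T\000h\000i\000s\000 \000i\000s\000 \000' > Document.txt
printf 'a\000b\000c\000 \000f\000i\000l\000e\000\r\000\n\000' >> Document.txt

# One byte per column, so no host byte-order swapping as with -tx2.
od -An -tx1 Document.txt

# The last four bytes are CR and LF as UTF-16LE code units: 0d 00 0a 00.
tail -c 4 Document.txt | od -An -tx1
```

The file comes out at 38 bytes, which matches the final offset 0000046 (octal) in the quoted dump.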
> As pointed out by Gary Johnson, `cat Document.txt` doesn't produce
> spaced text; it just shows "ÿþThis is abc file" (this is using mrxvt and
> the Bitstream Vera Sans Mono font).
Those NUL bytes are still being printed; it's just that your
particular combination of terminal and font doesn't show anything for
them. They're still there in the output stream.
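If you want to see the NULs that the terminal is hiding, `cat -v` makes them visible, and `tr` can count them. A sketch, assuming standard `cat`, `tr` and `wc`, using the same file rebuilt from the od dump above (the scratch directory is just for the example):

```shell
cd "$(mktemp -d)" || exit 1

# Same bytes as in the od dump: BOM, then UTF-16LE "This is abc file\r\n".
printf '\377\376T\000h\000i\000s\000 \000i\000s\000 \000' > Document.txt
printf 'a\000b\000c\000 \000f\000i\000l\000e\000\r\000\n\000' >> Document.txt

# cat -v shows non-printing bytes: each NUL appears as ^@, the CR as ^M,
# and the BOM bytes ff fe as M-^? and M-~.
cat -v Document.txt

# Count the NUL bytes in what cat writes out: one per UTF-16 code unit,
# 18 for this file (16 letters and spaces plus CR and LF).
cat Document.txt | tr -cd '\000' | wc -c
```

The `cat` in the last pipeline is deliberate: it counts the bytes in cat's output stream, which is exactly what reaches the terminal.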
> Better to use the file command to see what it is. And no, there is no
> conversion software that I know of; Cygwin 1.5.x just doesn't support
> wide characters.
Sure there is: iconv.
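For the file in this thread, turning it into an ordinary Unix text file might look like the sketch below. It assumes an `iconv` that accepts the `UTF-16` encoding name and uses the BOM to pick the byte order, dropping the BOM from the output (glibc's iconv does this); the output name Document-utf8.txt is just an example.

```shell
cd "$(mktemp -d)" || exit 1

# The UTF-16LE file from the od dump (BOM + "This is abc file\r\n").
printf '\377\376T\000h\000i\000s\000 \000i\000s\000 \000' > Document.txt
printf 'a\000b\000c\000 \000f\000i\000l\000e\000\r\000\n\000' >> Document.txt

# Decode UTF-16 (the BOM tells iconv the byte order and is not copied),
# encode as UTF-8, then delete the DOS carriage returns.
iconv -f UTF-16 -t UTF-8 Document.txt | tr -d '\r' > Document-utf8.txt

# Result: a plain 17-byte Unix text file with no NULs, BOM or CR.
od -c Document-utf8.txt
```

If dos2unix is installed it can take the place of the `tr -d '\r'` step.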
And this is not a matter of Cygwin supporting or not supporting
something -- that would be true if we were talking about wide characters
in the filenames. But we're talking about the file's contents, and what
an app does with the bytes is up to it, not Cygwin. For example, vi is
a Cygwin app and can read the UTF-16 file just fine, displaying the
characters as ASCII. So in this case it depends on the app, not the
libc. And as already stated, the Unix tradition is to use UTF-8 since
it fits into the "a string is a null-terminated series of bytes"
definition that is borrowed from C. But anyway you can freely transform
anything to anything with iconv.
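The null-terminated-string point is easy to check: ASCII text encoded as UTF-8 contains no zero bytes at all, while the same text in UTF-16LE has one after every character, so anything that stops at the first NUL (strlen, for instance) would stop after the first character. A small sketch, assuming an iconv that accepts the `UTF-16LE` encoding name; the file names are made up for the example:

```shell
cd "$(mktemp -d)" || exit 1

# Plain ASCII text, which is also valid UTF-8.
printf 'This is abc file\n' > note-utf8.txt

# Transform it the other way: UTF-8 -> UTF-16LE (no BOM is written for UTF-16LE).
iconv -f UTF-8 -t UTF-16LE note-utf8.txt > note-utf16le.txt

# Count the zero bytes in each encoding.
printf 'UTF-8 NULs:    '; tr -cd '\000' < note-utf8.txt | wc -c
printf 'UTF-16LE NULs: '; tr -cd '\000' < note-utf16le.txt | wc -c
```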
Brian