Mail Archives: cygwin/2008/06/10/21:53:37
gmarsha11 wrote:
> I'm not sure about the file's encoding. How do I tell?
If you have "file" installed, it's easy:
$ file Document.txt
Document.txt: Unicode text, UTF-16, little-endian
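If "file" isn't installed, a rough check is to look at the first few bytes for a byte-order mark (BOM). This is a minimal sketch, assuming a POSIX shell with the GNU head, od and tr that Cygwin ships. bom_encoding is a hypothetical helper name. A file without a BOM can still be UTF-16 or anything else, so this is much weaker than "file":

```sh
# bom_encoding: hypothetical helper that guesses an encoding from a
# file's byte-order mark (BOM). It only inspects the first 3 bytes.
bom_encoding() {
    # Dump the first 3 bytes as hex with spaces/newline removed, e.g. "fffe54".
    bom=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$bom" in
        efbbbf) echo "UTF-8 with BOM" ;;
        # Note: a UTF-32LE BOM (ff fe 00 00) would also match this pattern.
        fffe*)  echo "UTF-16, little-endian" ;;
        feff*)  echo "UTF-16, big-endian" ;;
        *)      echo "no BOM (encoding unknown)" ;;
    esac
}
```

For the file dumped with od further down, which starts with the bytes ff fe, "bom_encoding Document.txt" should print "UTF-16, little-endian".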
> When I create a new file with vi, I can read the file with no problem. The
> output is normal.
Look at the bottom line; vi tells you what kind of "text" it is... sort of:
"Document.txt" [converted][dos] 1L, 20C
The "converted" means it wasn't regular text, and the "dos" means it has
CR-LF line endings.
If you'd like to see what it really is, try:
$ od -tx2z Document.txt
0000000 feff 0054 0068 0069 0073 0020 0069 0073 >..T.h.i.s. .i.s.<
0000020 0020 0061 0062 0063 0020 0066 0069 006c > .a.b.c. .f.i.l.<
0000040 0065 000d 000a >e.....<
0000046
So your "spaces" are really null bytes (some fonts show them as little
smileys). And vi was right about the line ending: the 000d 000a at the end
is CR-LF.
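Since Cygwin 1.5.x can't handle wide characters, one crude workaround applies only when every character in the file is plain ASCII, as in this dump, where every high byte is 00. In that case you can drop the 2-byte BOM and then delete the null bytes and CRs. This is a minimal sketch under that assumption. utf16le_to_ascii is a hypothetical helper, and it will mangle any character outside the ASCII range:

```sh
# utf16le_to_ascii: hypothetical helper for UTF-16LE files that start
# with a BOM and contain only ASCII characters. Anything else is mangled.
utf16le_to_ascii() {
    # tail -c +3 starts output at byte 3, skipping the FF FE BOM;
    # tr then deletes every NUL byte (the 00 high bytes) and every CR.
    tail -c +3 "$1" | tr -d '\000\r'
}
```

Usage: "utf16le_to_ascii Document.txt > Document-ascii.txt" gives a file the usual text tools can parse.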
> These particular text files that I am working with were created by HP Data
> Protector. I can easily parse and manipulate these files on HPUX servers,
> but the Windows servers lack that functionality. I thought Cygwin would
> help with this.
>
> How do I tell what the file's encoding is?
As pointed out by Gary Johnson, `cat Document.txt` doesn't produce spaced
text; it just shows "ÿþThis is abc file", where ÿþ is the FF FE byte-order
mark (this is using mrxvt and the Bitstream Vera Sans Mono font).
It's better to use the file command to see what it is. And no, there is no
conversion software that I know of; Cygwin 1.5.x just doesn't support wide
characters.
--
René Berber