Mail Archives: cygwin/2008/06/10/21:18:27
On 2008-06-10, gmarsha11 wrote:
> Ok, have saved the file with Windows notepad as ANSI, Unicode, Unicode big
> endian, and UTF-8.
>
> Both Unicode options give me the output with the extra spaces. ANSI and
> UTF-8 allow me to see the files as I would expect to see them.
>
> Does this mean it's necessary to change the encoding for any files I might
> need to cat, grep, awk, etc.?
I'm no expert on any of this, but as far as I know, all traditional
Unix tools that deal with strings consider a string to be a sequence
of 8-bit characters. So the simple answer is yes. The more
complete answer is that it depends on what you're using those files
for and what other programs need to read and/or write those files.
FWIW, I used Notepad on my Windows XP system to create a file
containing your string, "This is abc file". When I went to save it,
the Encoding was already set to ANSI. In other words, you shouldn't
have to do anything special to save your files in a format already
compatible with grep, etc.
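
For files that have already been saved with one of Notepad's Unicode options, here is a minimal Python sketch of re-encoding them as UTF-8. It assumes the file starts with a BOM (which Notepad writes for both Unicode options); the function name utf16_to_utf8 and the sample bytes are made up for illustration, not anything from Cygwin itself.

```python
import codecs


def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes that start with a BOM as UTF-8 without a BOM."""
    # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
    text = data.decode("utf-16")
    return text.encode("utf-8")


# Simulated file contents: BOM followed by little-endian UTF-16 text.
notepad_unicode = codecs.BOM_UTF16_LE + "This is abc file".encode("utf-16-le")
converted = utf16_to_utf8(notepad_unicode)
print(converted)
```

The same function should handle the big-endian variant as well, since the byte order is taken from the BOM rather than hard-coded.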
That being said, you really shouldn't use Notepad to edit any files
you expect to use with Cygwin, because Cygwin tools expect lines to
end with LF, not a CR-LF pair. Many tools will consider that CR to
be part of the line. In particular, bash will give odd results if
you ask it to execute a shell script written with Notepad.
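
If a script has already been written with CR-LF line endings, the CRs can be stripped before handing it to bash. A minimal Python sketch, assuming the whole file fits in memory; crlf_to_lf and the sample script are hypothetical names and data used only for illustration:

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace every CR-LF pair with a bare LF, leaving other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


# A two-line shell script as Notepad would save it (CR-LF line endings).
script = b"#!/bin/bash\r\necho hello\r\n"
fixed = crlf_to_lf(script)
print(fixed)
```

Working on bytes rather than text avoids any guess about the file's character encoding. A tool such as dos2unix, where installed, does the same job from the command line.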
I got different results than you did when I cat'd abc.txt. When I
saved it as Unicode, the output of cat was:
ÿþThis is abc file
When I saved it as Unicode Big Endian, the output of cat was:
þÿThis is abc file
The only visible difference between the two was the ordering of the bytes in
the BOM (Byte Order Mark) at the beginning of each file. In both
cases, there were no extra spaces. I was running bash in an rxvt
window, if that matters.
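
To see where those leading characters come from, here is a small Python sketch that builds the bytes for both Unicode variants and looks at them as single 8-bit characters. It assumes the file is a BOM followed by UTF-16 text and that the terminal shows bytes as Latin-1; how a terminal renders the NUL bytes is outside what the sketch shows, and it may be what accounts for the extra spaces some setups display.

```python
import codecs

text = "This is abc file"

# BOM + text, in each byte order.
little = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + text.encode("utf-16-be")

# The first bytes: the BOM, then each ASCII letter paired with a NUL byte.
print(little[:6])
print(big[:6])

# Read as 8-bit Latin-1 characters, the BOM becomes two visible letters.
bom_le_as_latin1 = little[:2].decode("latin-1")
bom_be_as_latin1 = big[:2].decode("latin-1")

# Every ASCII character carries one NUL byte in UTF-16.
nul_count = little.count(b"\x00")
```

Each ASCII letter is stored with an accompanying NUL byte, so an 8-bit tool sees twice as many "characters" as the text contains, plus the two BOM bytes.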
Regards,
Gary