Mail Archives: cygwin/2009/09/10/15:31:29
After a few problems with monotone's unit tests on Cygwin-1.7, I began
searching and experimenting a bit with new 1.7 support for wide chars.
I also read the full thread about its last change:
http://www.cygwin.com/ml/cygwin/2009-05/msg00344.html
which really makes some sense to me (when I create a file from the
console I want "ls" to show back that file to me with same encoding).
Problem is, that unit test assumes filenames are "raw data" and tries to
create three types of filenames: ISO-8859-1, EUC-JP and UTF-8.
Except on OSX where it only tries UTF-8 as that's the disk format.
Now we have an UTF-16 disk format, except the library is using
LANG-value-from-process-start to initialize some LANG-to-UTF16
conversion as far as I understoof so there's not really one "correct"
format: it depends on the LANG env value when the test unit is launched.
OK, that's a side issue since I can probably modify the tests to always
be launched with LANG=C instead of using the current value so that at
least it is consitent. And then maybe remove the creation of ISO-8859-1
and EUC-JP tests just like on OSX. Which could be correct... but a bit
less so than on OSX itself, when that is really "the format" and not the
"the DEFAULT format which could be overridden with a correct setlocale".
But the real problem with that test is not really what shows and how,
the biggest problem is that it seems that filenames created with a
"wrong" filename are quite limited in usage and can't seemingly be deleted.
% export LANG=en_EN.UTF-8
% cat t.c
#include <stdio.h>
int main() {
fopen("a-\xF6\xE4\xFC\xDF", "w"); //ISO-8859-1
fopen("b-\xC3\xB6\xC3\xA4\xC3\xBc\xC3\x9F", "w"); //UTF-8
return 0;
}
% gcc -o t t.c
% mkdir test ; cd test ; ../t ; cd ..
% ls -l test
ls: cannot access test/a-▒▒▒: No such file or directory
total 0
-????????? ? ? ? ? ? a-▒▒▒
-rw-r--r-- 1 lapo None 0 2009-09-10 21:19 b-öäüß
% find test
test
test/a-???
test/b-öäüß
% find test -delete
find: cannot delete `test/a-\366\344\374': No such file or directory
find: cannot delete `test': Directory not empty
% find test
test
test/a-???
Now... I don't know how exactly `find` works but it seems strange to me
it isn't capable of deleting something it is capable of listing.
Also seems strange `ls` is not capable of stat-ing something it's
capable of listing.
Yep, I do know that filename is "broken" in the first place, but since
in the Unix world such stuff can happen as filenames are really raw
data, I think probably an error on file creation would be better than
creating a file that can't be consequently stat-ed or even unlinked.
% cat u.c
#include <stdio.h>
int main() {
remove("a-\xF6\xE4\xFC\xDF");
remove("b-\xC3\xB6\xC3\xA4\xC3\xBc\xC3\x9F");
return 0;
}
% gcc -o u u.c
OK, a program using a similarly-broken filename can delete it, but the
fact it can't be deleted with "normal" tools is a bit of an inconvenience...
--
Lapo Luchini - http://lapo.it/
“Premature optimisation is the root of all evil in programming.” (C. A.
R. Hoare)
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
- Raw text -