From: Denis Excoffier
To: cygwin AT cygwin DOT com
Subject: 1.7: mv: Device or resource busy
Date: Wed, 28 Oct 2009 00:31:32 +0100

Hello,

I would like to report some strange behavior. It seems intermittent by nature, but I have managed to make it consistently reproducible (at least on my PC). So please read on.

Under some circumstances (detailed below), the command:

    mv file1 file2

produces

    mv: cannot move `file1' to `file2': Device or resource busy

with a return code of 1 (and, logically, nothing is done).

I can tell in which circumstances the problem does *not* seem to happen:

a) when `file1' has mode 644, the problem never happens (I believe this is 100% true)
b) when `file1' does not begin with `MZ' (the signature of an executable), the problem never happens (I believe this is 90% true)
c) when I use my second Cygwin box (on top of XP SP2), the problem never happens (I believe this is 99% true)
d) when `mv file1 file2' has failed (therefore `file1' is still there), a second try will almost always succeed
e) when `mv file1 file2' has succeeded (perhaps after failing the first time), if the filename `file1' is reused (with the same or a different content, but still beginning with `MZ'), the next `mv file1 file2' will never fail.
   However, if the `dirname' of `file1' (its containing directory) is emptied, then removed, then mkdir'ed again, the problem will surely happen again (I believe this is 98% true)
f) when `file1' is big (e.g. a copy of /usr/bin/emacs-X11.exe, 15Mb), the problem happens almost always; when `file1' is small (e.g. a copy of /usr/bin/ldh.exe, 1536 bytes), the problem almost never happens. I never observed the problem with a `file1' smaller than 1536 bytes.
g) I checked, but did not notice any influence of:
   - the suffix of `file1' or `file2' (I tried .exe, .xxx, .yyy and no suffix at all, with more or less the same results; I realize now, writing this message, that I did not test any .dll files, I will do that tomorrow)
   - whether `file2' already exists or not
   - whether the `mv' command is called from within /usr/bin/tcsh or from /bin/sh
   - whether `file1' and `file2' are in the same directory (however, I never tested with `file2' outside the filesystem)

Another important thing I noticed is that the access time of `file1' is always updated when its mode is 755. Why should this be the case? Does it conform to the standards (POSIX?)?

To investigate, I tried all of the following, with no noticeable improvement:

- I removed McAfee VirusScan (8.5i P6): no change
- several reboots: no change
- I switched back to Cygwin 1.7.0-61: same results
- I had a look at the cygwin-1.7.0-62 sources (path.cc, path.h, set_exec(), fhandler_disk_file.cc):
  - I first suspected NtOpenFile() (line 651 in fhandler_disk_file.cc): if a signedness inconsistency exists in the NtOpenFile signature, NtClose could remain uncalled
  - I tried to use the ntdll.dll from my second Cygwin box (but I did not manage to make it work inside my first Cygwin box)
  - finally, I had a deeper look into the code and found that if _check_for_executable is set to false, the file contents are not inspected, so I poked a 0 byte into cygwin1.dll at the right place:

      % cmp cygwin1.dll.original cygwin1.dll.poked
      1323689 1 0
      %

    This last action made absolutely no difference.
This disappointed me a lot, because I was then absolutely sure that McAfee was the culprit. But since then I have removed McAfee, and the problem is still there... (By the way, how can we print the cwdstuff structure?)

The only improvement I got (not a very interesting one) was by using either:

- my second Cygwin box (no error ever seen there, with many tests performed)
- Cygwin 1.5.25-15, which I reinstalled on my first Cygwin box (no errors showed up, though with fewer tests performed)

How to reproduce, if you want to:

Use this piece of shell and modify as needed:

------------------------------------------
#!/bin/sh
variant="x`date +%M%S`"
echo $variant
# select one of these (the last uncommented assignment wins)
origfile="/usr/bin/xpdf.exe" # 1308kb
origfile="/usr/bin/banner.exe" # 8kb
origfile="/usr/bin/ldh.exe" # 1536b
origfile="/usr/bin/diff.exe" # 105kb
origfile="/usr/bin/emacs-X11.exe" # 15Mb
file1="${variant}xxx"
file2="${variant}yyy"
if true; then
  rm -f ${file1} # to be sure
  cp ${origfile} ${file1} # work on a copy: don't want to kill your Cygwin binaries
  #chmod 644 ${file1} # when uncommented, mv will *never* fail
  #date;sleep 3;date # uncomment to check the atime update (see `ls')
  ls -ilu --full-time ${file1}
  mv ${file1} ${file2}
  echo "rc=$?"
  #date;sleep 3;date # uncomment to check the atime update (see `ls')
  ls -ilu --full-time ${file1} ${file2} 2> /dev/null # one is there, one is missing; rc above says which
  rm -f ${file2} # remove file2; the file1's left behind (i.e. *xxx, see above) expose the failures
fi
exit
------------------------------------------

Figures:

I just tested with xpdf.exe (1.3Mb), and it failed 19 times out of 20.
I just tested with banner.exe (8192b), and it failed 3 times out of 20.
I just tested with diff.exe (105kb), and it failed 20 times out of 20.

My environment is: I have two Cygwin boxes; the first is a Fujitsu laptop with XP SP3, the second is an HP tower with XP SP2.
Both have an NTFS disk (150Gb). All the tests were performed on this NTFS disk, under Cygwin 1.7.0-62 (unless otherwise mentioned), with all the packages installed. I also never noticed any change in the inodes (see `ls -i' above).

My interpretation of the symptoms is as follows (i.e. if I had to reproduce this behavior inside a program of my own, I would do the following):

Let's suppose we only have executable (755) files beginning with MZ, on my first Cygwin box. In my opinion, each slot in a directory would have some room for a boolean variable, initially set to 0, meaning "this entry is not executable or I don't know". This boolean would be used only by a `mv' command that takes this particular directory slot as its first parameter. When the `mv' command is launched, there are two cases:

- if the boolean is set to 1 (meaning "this entry has been established to be executable"), the command is performed normally: the file is moved, and the directory entry keeps its boolean set to 1, with no file inside (since the `mv' succeeded)
- if the boolean is set to 0, two processes run concurrently:
  - the normal process of mv, which (if successful) finally has to rename() file1
  - an unknown process which reads the content of file1, updates the access time of file1, turns the boolean into 1, and takes more time if the content of file1 (or the size indicated in the directory slot, who knows?) is large and less time if it is small. Also, this unknown process starts after having received the 'y' in the case of `mv -i'.

  If the winner of this race is the normal process of mv, we get: Device or resource busy. If the winner is the unknown process, the mv is performed normally. In any case, the boolean ends up set to 1.
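The hypothesis above can be restated as a small toy model in shell. It is only a sketch of the guess, not of anything Cygwin or Windows is known to do: `slot_flag', `model_mv', `scan_ms', `rename_ms' and `model_result' are made-up names, the timings are arbitrary numbers, and no real file is touched.

```sh
#!/bin/sh
# Toy model of the hypothesised race; all names and numbers are hypothetical.
slot_flag=0   # 0: "not executable, or I don't know"; 1: "established to be executable"

# model_mv SCAN_MS RENAME_MS
#   SCAN_MS:   assumed duration of the unknown process (grows with the size of file1)
#   RENAME_MS: assumed time after which mv reaches rename()
# Sets model_result to "busy" or "moved".
model_mv() {
    scan_ms=$1
    rename_ms=$2
    if [ "$slot_flag" -eq 0 ] && [ "$rename_ms" -lt "$scan_ms" ]; then
        model_result="busy"    # mv won the race against the unknown process
    else
        model_result="moved"   # flag already 1, or the unknown process won
    fi
    slot_flag=1                # in any case, the boolean ends up set to 1
}

model_mv 500 10; echo "big file, first try:   $model_result"
model_mv 500 10; echo "big file, second try:  $model_result"
slot_flag=0                    # models the directory being removed and mkdir'ed again
model_mv 5 10;   echo "small file, first try: $model_result"
```

With these arbitrary timings the model reports busy, then moved, then moved, which is the pattern of observations d) and f): a retry succeeds, and a small file rarely fails. It says nothing about why the access time is also updated once the flag is 1.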
The above mechanism does not seem to be exactly 100% correct, since we can observe that the access time is also updated when the boolean has previously been set to 1: the unknown process would probably need to be launched at each instance of (this kind of) mv, but must return very quickly if the boolean is already set to 1.

Help! How can I solve this? How can I make my first box behave like the second one (i.e. never fail)? At the very least, did you manage to reproduce this?

Thank you for your time.

Denis Excoffier.

P.S. For your information, and to be as comprehensive as possible: at least two classical packages (`tcl8.6b1' and `openssl-1.0.0-beta3') have their `make install' fail with exactly this error (to be exact, the `make install' of the tcl package does not fail, since the error is not caught, but the final copy is not performed).