From: Ryan Johnson
Date: Mon, 04 Jul 2011 11:05:15 -0400
To: cygwin AT cygwin DOT com
Subject: Re: untarring symlinks with ../ fails randomly, silghtly OT
Message-ID: <4E11D6AB.90905@cs.utoronto.ca>
In-Reply-To: <4E11B063.7000808@cs.utoronto.ca>

On 04/07/2011 8:21 AM, Ryan Johnson wrote:
> On 04/07/2011 7:33 AM, Corinna Vinschen wrote:
>> On Jul 4 06:56, Ryan Johnson wrote:
>>> On 04/07/2011 6:46 AM, Corinna Vinschen wrote:
>>>> On Jul 4 11:15, Wolf Geldmacher wrote:
>>>>> As an aside:
>>>>> I also used to have some trouble with "rm -rf" of a directory
>>>>> hierarchy failing more or less reproducibly (like: 80% of the
>>>>> time) because files were presumably still "in use". Repeating
>>>>> the command several times would succeed, though.
>>>>>
>>>>> Downgrading from cygwin1.dll/1.7.9.1 to cygwin1.dll/1.7.8.1
>>>>> seems to have solved that issue as well - still have to see
>>>>> the first "retry to delete".
>>>>>
>>>>> This may or may not be related to the original report, as it also
>>>>> reeks of a race condition during file/directory operations.
>>>> I can neither reproduce the tar problem, nor can I reproduce the rm
>>>> problem. I tried this under 2008R2 which is basically the same as
>>>> your W7-64 bit. I used local and remote drives to test the issue
>>>> but to no avail.
>>>>
>>>> Are you sure this isn't a BLODA problem which is triggered by the
>>>> changes in 1.7.9?
>>>>
>>>> I just took a look through the changes between 1.7.8 and 1.7.9, and
>>>> the list of changes which affect filesystem access is pretty small:
>>>>
>>>> [snip]
>>>>
>>>> So, is it possible that the request for WRITE_DAC access in the
>>>> call to NtCreateFile triggers some hiccup of your virus checker?
>>>> It could easily explain both effects.
>>> I have also seen the rm -rf problem occasionally on my w7-64
>>> machine, and I don't think anything from BLODA is installed.
>> Also with 1.7.8? Given the minor number of FS-related changes, it's
>> so very unlikely that they would cause a difference between 1.7.8 and
>> 1.7.9.
>>
>>> However, I haven't noticed the issue since disabling the search
>>> indexer on my machine. I did this on the hunch that I often delete
>>> large directory trees which aren't very old (e.g. after
>>> untar/configure/make of some source package), and that it wouldn't
>>> be a big surprise if indexing and cygwin's rm don't mix for whatever
>>> reason.
>> Hard to imagine that setting the WRITE_DAC flag would interfere with the
>> search indexer. On second thought, the flag is only set if a file does
>> not exist yet and NtCreateFile gets called to create the file. That
>> makes it especially unlikely that this would affect unlinking.
>>
>> However, given that you can reproduce the issue, could you test the
>> scenario again? If the issue occurs, can you disable the following code
>> in fhandler.cc and see if it changes anything?
>>
>>   else if (!exists () && has_acls ())
>>     /* If we are about to create the file and the filesystem supports
>>        ACLs, we will overwrite the DACL after the call to NtCreateFile.
>>        This requires a handle with additional WRITE_DAC access,
>>        otherwise set_file_sd has to open the file again. */
>>     access |= WRITE_DAC;
>>
> Sorry, I have no idea which version of the dll I had at the time. It
> was at least a month ago, maybe more.
>
> However, I was wrong about not seeing the problem since. Choosing a
> random source dir to blow away:
>> $ rm -rf Python-2.6.6
>> rm: cannot remove `Python-2.6.6/Lib/lib2to3/tests': Directory not empty
>> $ rm -rf Python-2.6.6
>> $
>
> This seems to happen more than half the time (different non-empty dir
> every time). Naturally, running under strace makes the problem go away
> (it doesn't help that strace kills stderr, where any error messages
> might have gone).
>
> Running the following command 10x:
>
> $ tar -xaf Python-2.6.6.tar.bz2 && sleep 3 && (rm -rf Python-2.6.6 ||
> (echo 'Retrying...' && rm -rf Python-2.6.6))
>
> I get six times with no error, two times with one error, one time each
> with two and three errors.
>
> I'm currently updating and rebuilding my cygwin sources to try out
> your patch...
Updated, built, and reproduced, with and without the patch. If anything
it's more common in my dev build -- it happened on the first try both times.

Any idea of how to debug this? We need some instantaneous version of
lsof or something...

Ryan
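
A rough sketch of how the retry experiment quoted above could be scripted, so that the number of failed "rm -rf" attempts per run is counted rather than eyeballed. The function name rm_retry and the default cap of 5 attempts are assumptions made for illustration; nothing here comes from Cygwin itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cygwin or coreutils): remove a tree,
# retrying when rm fails, and print how many attempts failed.
rm_retry() {
    dir=$1
    max=${2:-5}          # assumed cap on attempts; adjust as needed
    failures=0
    while [ "$failures" -lt "$max" ]; do
        # Count the attempt as a success only if rm exits cleanly
        # and the directory is really gone afterwards.
        if rm -rf "$dir" 2>/dev/null && [ ! -e "$dir" ]; then
            echo "$failures"
            return 0
        fi
        failures=$((failures + 1))
    done
    # Gave up: every attempt failed.
    echo "$failures"
    return 1
}
```

It could be driven the same way as the one-liner above, e.g. by running "tar -xaf Python-2.6.6.tar.bz2 && sleep 3 && rm_retry Python-2.6.6" ten times and tallying the printed counts. Note that this only measures how often the failure occurs; it does not show which process holds the files open.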