X-Recipient: archive-cygwin AT delorie DOT com X-SWARE-Spam-Status: No, hits=-2.2 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD X-Spam-Check-By: sourceware.org From: Jim Meyering To: Eric Blake Cc: 9813-done AT debbugs DOT gnu DOT org, bug-gnulib Cc: cygwin AT cygwin DOT com Subject: Re: rm -rf calls rmdir() prior to close(), which can fail In-Reply-To: <4EA05C9C.1070809@redhat.com> (Eric Blake's message of "Thu, 20 Oct 2011 11:38:36 -0600") References: <4EA05C9C DOT 1070809 AT redhat DOT com> Date: Sun, 23 Oct 2011 22:53:34 +0200 Message-ID: <87ipnfe8fl.fsf@rho.meyering.net> Lines: 190 MIME-Version: 1.0 Content-Type: text/plain Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com Delivered-To: mailing list cygwin AT cygwin DOT com Eric Blake wrote: > POSIX is clear that attempts to rmdir() a directory that still has > open descriptors may fail. Of course, on Linux, this (rather > limiting) restriction is not present, so we don't notice it; but on > Cygwin, there are certain file systems where this is a real problem, > such as in this thread: > http://cygwin.com/ml/cygwin/2011-10/msg00365.html > > Looking at an strace on Linux reveals the problem (abbreviated to show > highlights here): > > $ mkdir -p a/b > $ strace rm -f a > ... > openat(AT_FDCWD, "a", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3 > ... > fcntl(3, F_DUPFD, 3) = 4 > ... > close(3) = 0 > ... > openat(4, "b", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3 > ... > fcntl(3, F_DUPFD, 3) = 5 > ... > close(3) = 0 > close(5) = 0 > unlinkat(4, "b", AT_REMOVEDIR) = 0 > unlinkat(AT_FDCWD, "a", AT_REMOVEDIR) = 0 > close(4) = 0 > > Notice that for subdirectories, we opened the directory, then used dup > to have a handle for use in further *at calls, then do > fdopendir/readdir/closedir on the DIR*, then close the duplicate fd, > all before calling unlinkat (aka rmdir) on that subdirectory. But for > the top-level directory, the dup'd fd (4) is still open when we > attempt the unlinkat. Thanks for the analysis, Eric. That was due to a rather subtle but easy/safe-to-fix bug. While the rm from coreutils-8.14 worked as your strace above shows, the fixed one does this: (note how the close(4) now precedes the removal of "a") $ mkdir -p a/b $ strace -e openat,close,unlinkat ./rm -rf a close(3) = 0 close(3) = 0 openat(AT_FDCWD, "a", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3 close(3) = 0 openat(4, "b", O_RDONLY|O_NOCTTY|O_NONBLOCK|O_DIRECTORY|O_NOFOLLOW) = 3 close(3) = 0 close(5) = 0 unlinkat(4, "b", AT_REMOVEDIR) = 0 close(4) = 0 unlinkat(AT_FDCWD, "a", AT_REMOVEDIR) = 0 close(0) = 0 close(1) = 0 close(2) = 0 Here is the patch that I expect to push tomorrow: From a11c49cd72a91c05a272e36ff5d3cd92675cbfb5 Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Sun, 23 Oct 2011 22:42:25 +0200 Subject: [PATCH] fts: close parent dir FD before returning from post-traversal fts_read The problem: the fts-using "rm -rf A/B/" would attempt to unlink A, while a file descriptor open on A remained. This is suboptimal (holding a file descriptor open longer than needed) on Linux, but otherwise not a problem. However, on Cygwin with certain file system types, (see http://cygwin.com/ml/cygwin/2011-10/msg00365.html), that represents a real problem: it causes the removal of A to fail with e.g., "rm: cannot remove `A': Device or resource busy" fts visits each directory twice and keeps a cache (fts_fd_ring) of directory file descriptors. After completing the final, FTS_DP, visit of a directory, RESTORE_INITIAL_CWD intended to clear the FD cache, but then proceeded to add a new FD to it via the subsequent FCHDIR (which calls cwd_advance_fd and i_ring_push). Before, the final file descriptor would be closed only via fts_close's call to fd_ring_clear. Now, it is usually closed earlier, via the final FTS_DP-returning fts_read call. * lib/fts.c (restore_initial_cwd): New function, converted from the macro. Call fd_ring_clear *after* FCHDIR, not before it. Update callers. Reported by Franz Sirl via the above URL, with analysis by Eric Blake in http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/28739 --- ChangeLog | 25 +++++++++++++++++++++++++ lib/fts.c | 23 +++++++++++++++-------- 2 files changed, 40 insertions(+), 8 deletions(-) diff --git a/ChangeLog b/ChangeLog index 93ee45e..3a2d2cc 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,28 @@ +2011-10-23 Jim Meyering + + fts: close parent dir FD before returning from post-traversal fts_read + The problem: the fts-using "rm -rf A/B/" would attempt to unlink A, + while a file descriptor open on A remained. This is suboptimal + (holding a file descriptor open longer than needed) on Linux, but + otherwise not a problem. However, on Cygwin with certain file system + types, (see http://cygwin.com/ml/cygwin/2011-10/msg00365.html), that + represents a real problem: it causes the removal of A to fail with + e.g., "rm: cannot remove `A': Device or resource busy" + + fts visits each directory twice and keeps a cache (fts_fd_ring) of + directory file descriptors. After completing the final, FTS_DP, + visit of a directory, RESTORE_INITIAL_CWD intended to clear the FD + cache, but then proceeded to add a new FD to it via the subsequent + FCHDIR (which calls cwd_advance_fd and i_ring_push). Before, the + final file descriptor would be closed only via fts_close's call to + fd_ring_clear. Now, it is usually closed earlier, via the final + FTS_DP-returning fts_read call. + * lib/fts.c (restore_initial_cwd): New function, converted from + the macro. Call fd_ring_clear *after* FCHDIR, not before it. + Update callers. + Reported by Franz Sirl via the above URL, with analysis by Eric Blake + in http://thread.gmane.org/gmane.comp.lib.gnulib.bugs/28739 + 2011-10-23 Gary V. Vaughan Bruno Haible Jim Meyering diff --git a/lib/fts.c b/lib/fts.c index e3829f3..f61a91e 100644 --- a/lib/fts.c +++ b/lib/fts.c @@ -229,11 +229,6 @@ static int fts_safe_changedir (FTS *, FTSENT *, int, const char *) #define ISSET(opt) (sp->fts_options & (opt)) #define SET(opt) (sp->fts_options |= (opt)) -/* FIXME: make this a function */ -#define RESTORE_INITIAL_CWD(sp) \ - (fd_ring_clear (&((sp)->fts_fd_ring)), \ - FCHDIR ((sp), (ISSET (FTS_CWDFD) ? AT_FDCWD : (sp)->fts_rfd))) - /* FIXME: FTS_NOCHDIR is now misnamed. Call it FTS_USE_FULL_RELATIVE_FILE_NAMES instead. */ #define FCHDIR(sp, fd) \ @@ -349,6 +344,18 @@ cwd_advance_fd (FTS *sp, int fd, bool chdir_down_one) sp->fts_cwd_fd = fd; } +/* Restore the initial, pre-traversal, "working directory". + In FTS_CWDFD mode, we merely call cwd_advance_fd, otherwise, + we may actually change the working directory. + Return 0 upon success. Upon failure, set errno and return nonzero. */ +static int +restore_initial_cwd (FTS *sp) +{ + int fail = FCHDIR (sp, ISSET (FTS_CWDFD) ? AT_FDCWD : sp->fts_rfd); + fd_ring_clear (&(sp->fts_fd_ring)); + return fail; +} + /* Open the directory DIR if possible, and return a file descriptor. Return -1 and set errno on failure. It doesn't matter whether the file descriptor has read or write access. */ @@ -948,7 +955,7 @@ next: tmp = p; * root. */ if (p->fts_level == FTS_ROOTLEVEL) { - if (RESTORE_INITIAL_CWD(sp)) { + if (restore_initial_cwd(sp)) { SET(FTS_STOP); return (NULL); } @@ -1055,7 +1062,7 @@ cd_dot_dot: * one level, via "..". */ if (p->fts_level == FTS_ROOTLEVEL) { - if (RESTORE_INITIAL_CWD(sp)) { + if (restore_initial_cwd(sp)) { p->fts_errno = errno; SET(FTS_STOP); } @@ -1579,7 +1586,7 @@ mem1: saved_errno = errno; */ if (!continue_readdir && descend && (type == BCHILD || !nitems) && (cur->fts_level == FTS_ROOTLEVEL - ? RESTORE_INITIAL_CWD(sp) + ? restore_initial_cwd(sp) : fts_safe_changedir(sp, cur->fts_parent, -1, ".."))) { cur->fts_info = FTS_ERR; SET(FTS_STOP); -- 1.7.7.419.g87009 -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple