X-Recipient: archive-cygwin AT delorie DOT com X-SWARE-Spam-Status: No, hits=2.3 required=5.0 tests=AWL,BAYES_00,KAM_STOCKTIP,RCVD_IN_DNSWL_NONE X-Spam-Check-By: sourceware.org X-USANET-Source: 165.212.120.254 IN aeolus DOT electric-cloud DOT com AT electric-cloud DOT usa DOT net s1hub2.EXCHPROD.USA.NET X-USANET-MsgId: XID390oHJV3B4731Xo2 From: John Carey To: "cygwin AT cygwin DOT com" Date: Tue, 10 Aug 2010 21:53:46 +0000 Subject: Re: 1.7.5: Occasional failure of CreatePipe or signal handing due to thread-unsafe code in cwdstuff::set Message-ID: <3C031C390CBF1E4A8CE1F74DE7ECAF3A140684F0AA@MBX8.EXCHPROD.USA.NET> Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-IsSubscribed: yes Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT cygwin DOT com Mail-Followup-To: cygwin AT cygwin DOT com Delivered-To: mailing list cygwin AT cygwin DOT com On Aug 10 13:44 +0200 Corinna Vinschen wrote: > > On Aug 10 00:19, John Carey wrote: ... > > Presumably the issues mentioned in the source comment may > > also occur, but what I have seen myself is a subsequent call to > > SetCurrentDirectory() closing, not the current directory handle, but > > the handle that was allocated by the other thread. > >=20 > > Specifically, dll_crt0_1() calls sigproc_init(), then cwdstuff::set(), > > and finally wait_for_sigthread(). The function sigproc_init() creates > > the signal handling thread, which creates the signal handling pipe as > > one of its very first actions, then sets the event for which > > wait_for_sigthread() waits. I think this scenario happens: > >=20 > > 1. main thread: cwdstuff::set(): NtClose(). > >=20 > > 2. signal handling thread: CreatePipe() opens a handle to > > '\Device\NamedPipe\' and stores it in a global variable (because > > this is the first call to CreatePipe()). > >=20 > > 3. main thread: cwdstuff::set(): DuplicateHandle(). > >=20 > > In this case, the current directory handle value has changed, which is > > not the intend of the NtClose-Duplicate sequence. >=20 > No, that's not intended. However, the code just *tries* to preserve the > handle value, but it does not rely on it. The NtClose is safe because > the handle is the actual CWD handle at the time of the call and the PEB > is locked. The DuplicateHandle call just uses the phdl storage to store > the new handle value, but it *never* tests if the new handle value is > identical to the old one. So, if all works well, it's the same handle > value as before. If not, *shrug*. >=20 > > CreateFile to fail with ERROR_INVALID_HANDLE as mentioned in the > > source comments, but I have not seen that. I think that the >=20 > I have. You can easily test this as well on Vista and W7. Just drop > the DuplicateHandle call and simply store h in *phdl. Then create a > testcase: >=20 > chdir ("/"); > h =3D CreateFile ("foobar", ...); > if (h =3D=3D INVALID_HANDLE_VALUE) printf ("%lu\n", GetLastError()); Thanks for the test case for the CreateFile() problem; I used it to create the following test, in which Windows 7 CreateFile() fails with ERROR_INVALID_HANDLE even when using a stock Cygwin 1.7.5 DLL: > build AT aeolus-w764c17 ~ > $ uname -a > CYGWIN_NT-6.1-WOW64 aeolus-w764c17 1.7.5(0.225/5/3) 2010-04-12 19:07 i686= Cygwin >=20 >=20 > build AT aeolus-w764c17 ~ > $ cat test.c > #include > #include > #include > #include > #include >=20 > pthread_mutex_t mutex =3D PTHREAD_MUTEX_INITIALIZER; >=20 > void * run (void *arg) > { > int count =3D (int)arg >> 1; > int sync =3D (int)arg & 1; >=20 > while (count--) { > if (sync) pthread_mutex_lock(&mutex); >=20 > HANDLE h =3D CreateFile ( > "foobar", > GENERIC_WRITE, > 0, > NULL, > CREATE_ALWAYS, > FILE_ATTRIBUTE_NORMAL, > INVALID_HANDLE_VALUE); > if (h =3D=3D INVALID_HANDLE_VALUE) { > printf ("%lu\n", GetLastError()); > exit(1); > } > CloseHandle(h); >=20 > if (sync) pthread_mutex_unlock(&mutex); > } >=20 > return 0; > } >=20 > int main (int argc, char** argv) > { > int count; > int sync; > pthread_t tid; > void *result; >=20 > if (argc !=3D 2 && argc !=3D 3) { > fprintf(stderr, "Usage: %s []\n", argv[0]); > return 2; > } >=20 > count =3D atol (argv[1]); > sync =3D argc >=3D 3 ? !!atoi(argv[2]) : 0; >=20 > pthread_create (&tid, 0, &run, (void*)((count << 1) | sync)); > while (count--) { > if (sync) pthread_mutex_lock(&mutex); > chdir ("/"); > if (sync) pthread_mutex_unlock(&mutex); > } >=20 > pthread_join (tid, &result); > return 0; > } >=20 > build AT aeolus-w764c17 ~ > $ gcc -o ~/test ~/test.c >=20 > build AT aeolus-w764c17 ~ > $ ~/test 1000 1 > 123 >=20 > build AT aeolus-w764c17 ~ > $ ~/test 1000 0 > 123 >=20 > build AT aeolus-w764c17 ~ > $ cd / >=20 > build AT aeolus-w764c17 / > $ ~/test 1000 1 >=20 > build AT aeolus-w764c17 / > $ ~/test 1000 0 > 6 >=20 > build AT aeolus-w764c17 / > $ 6 =3D "The handle is invalid.": I think that this is the error previously discussed. Note that passing "1" as the second program argument synchronizes the two threads and prevents the error. 123 =3D "The filename, directory name, or volume label syntax is incorrect.= ": I have not yet investigated why that error occurs, but it does not happen if I run the test in the Cygwin root directory. ... > > I think this other scenario also happens: > >=20 > > 1. signal handling thread: CreatePipe() opens a handle to > > '\Device\NamedPipe\' (because it is the first call to CreatePipe()). > >=20 > > 2. main thread: cwdstuff::set(): NtClose(). > >=20 > > 3. signal handling thread: CreatePipe() opens pipe handles. > >=20 > > 4. main thread: cwdstuff::set(): DuplicateHandle(). > >=20 > > In this case it is Cygwin signal handling that is sabotaged by > > subsequent calls to SetCurrentDirectory(), because they close one of > > the pipe handles used for Cygwin signal handling. > > This is pure speculation. Please explain to me how DuplicateHandle, > which duplicates a *local* handle of a locally opened directory, which > is stored in the PEB's CurrentDirectoryHandle under RtlAcquirePebLock > conditions, should interfere with CreatePipe or, FWIW, SetCurrentDirector= y. > > The worst thing which could happen is that the handle in > PEB->CurrentDirectoryHandle has another value and a subsequent > CreateFile w/ a relative path fails (but not Cygwin's calls to > NtCreateFile). I don't see any chance that one of the pipe handles is > suddenly stored in phdl =3D=3D the address of the PEB's cwd handle. And,= if > DuplicateHandle fails, the local handle is stored in the PEB directly. > Still, all of that is done under RtlAcquirePebLock condition. > > And then again, if CreatePipe() stores a global handle to the pipe > filesystem (I don't know, I never observed the CreatePipe call more > closely), it will probably store the handle in the PEB. If it does, > it will use RtlAcquirePebLock/RtlReleasePebLock as well. The PEB lock in cwdstuff::set() can only prevent the race if other threads always hold the PEB lock while invoking system calls that allocate handles. CreateFile() is pretty big, so I might have missed it, but I did not see CreateFile() holding the PEB lock while actually creating the new file handle. Instead, after actually creating the new file handle, it called _RtlReleaseRelativeName AT 4 on a reference-counted object that holds a copy of the current directory handle (see below for more about such reference-counted current directory objects). So I think one can substitute CreateFile() for CreatePipe() in step 3 above (provided you delete step 1), in order to trigger a change in the current directory handle value. Now, where is Windows stuffing the extra copy of the directory handle? In those reference-counted heap-allocated directory objects. Stepping through the machine code under Windows 7 I see SetCurrentDirectory() updating a global pointer ("ntdll!RtlpCurDirRef") to one such object, under the protection of a critical section ("ntdll!FastPebLock"). Despite mentioning "Peb" in the name, neither global's address is computed using "FS:". The global pointer points to a heap-allocated block that starts with a reference count followed by a copy of the directory handle value, and apparently includes other data as well. SetCurrentDirectory() swaps in a new such block, and decrements the counter on the old block, and if that reaches 0, closes the handle. Anyway, this block appears to be where the old current directory handle value persists past the changes made by cwdstuff::set(). So, I think what is happening in the test code I gave above is: 1. The main thread starts chdir("/") and gets as far as NtClose() on the old directory handle value, hereafter called OldValue. 2. Inside CreateFile(), the second thread actually makes the system call that opens "foobar". It gets OldValue as the new handle value. 3. The main thread calls DuplicateHandle(), gets another handle value for the current directory, and stores it in the PEB but not the reference-counted current directory object pointed to by "ntdll!RtlpCurDirRef". 4. The second thread calls CreateFile() again, but gets confused because it tries to use OldValue to interpret the relative path "foobar". Now, how we can find the address of "ntdll!RtlpCurDirRef", so that we can update it with the new handle, I am not sure. But does the above adequately explain the difficulty? Thanks for your concern, John [Technical note about this email: it is not threaded properly; hopefully th= at will change in future responses. Sorry for any inconvenience.] -- Problem reports: http://cygwin.com/problems.html FAQ: http://cygwin.com/faq/ Documentation: http://cygwin.com/docs.html Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple