Message-ID: <543D6BB9.3030009@cornell.edu>
Date: Tue, 14 Oct 2014 14:30:17 -0400
From: Ken Brown
To: cygwin AT cygwin DOT com
Subject: Re: Crash in g_file_monitor on 32-bit Cygwin
References: <53AB82AB DOT 5000304 AT cornell DOT edu> <53ADA5B5 DOT 10404 AT cornell DOT edu> <53ADAF68 DOT 2020703 AT cygwin DOT com> <53AEA23A DOT 8030306 AT cornell DOT edu> <543D4ED3 DOT 6020605 AT cornell DOT edu>
In-Reply-To: <543D4ED3.6020605@cornell.edu>

On 10/14/2014 12:26 PM, Ken Brown wrote:
> On 6/28/2014 7:08 AM, Ken Brown wrote:
>> On 6/27/2014 1:52 PM, Yaakov Selkowitz wrote:
>>> On 2014-06-27 12:11, Ken Brown wrote:
>>>> On 6/25/2014 10:17 PM, Ken Brown wrote:
>>>>> This is a followup to
>>>>> https://cygwin.com/ml/cygwin/2014-06/msg00324.html, from which I
>>>>> extracted the following test case:
>>>>>
>>>>> $ cat gfile-test.c
>>>>> #include <stdio.h>
>>>>> #include <gio/gio.h>
>>>>>
>>>>> void
>>>>> gfile_add_watch (const char *file)
>>>>> {
>>>>>   GFile *gfile = g_file_new_for_path (file);
>>>>>   GFileMonitor *monitor;
>>>>>   GFileMonitorFlags gflags = G_FILE_MONITOR_NONE;
>>>>>   monitor = g_file_monitor (gfile, gflags, NULL, NULL);
>>>>>   if (! monitor)
>>>>>     printf ("Can't watch file %s\n", file);
>>>>>   else
>>>>>     printf ("Watching file %s\n", file);
>>>>> }
>>>>>
>>>>> int
>>>>> main ()
>>>>> {
>>>>>   const char *file = "gfile-test.c";
>>>>>   gfile_add_watch (file);
>>>>> }
>>>>>
>>>>> $ gcc -g -O0 -o gfile-test $(pkg-config --cflags gio-2.0) gfile-test.c
>>>>> $(pkg-config --libs gio-2.0)
>>>>>
>>>>> In the 64-bit case, this behaves as expected:
>>>>>
>>>>> $ ./gfile-test.exe
>>>>> Watching file gfile-test.c
>>>>>
>>>>> In the 32-bit case, however, it crashes.
>>>>> Running it under gdb shows
>>>>> that the call to g_file_monitor leads to a SEGV, but I can't tell
>>>>> exactly where; when I try to single step through the Glib code, I
>>>>> eventually hit an assertion violation in gdb.  strace shows lots of
>>>>> exceptions, but I can't make much sense out of it otherwise.
>>>>
>>>> I rebuilt glib and gamin without optimization so that I could step
>>>> through the code in gdb.  But stepping through the code turned out to be
>>>> unnecessary, because the bug was gone after the rebuilds.  I don't know
>>>> if optimization was really the issue or whether just rebuilding with the
>>>> latest tools is what fixed it.
>>>>
>>>> My builds can be obtained from
>>>>
>>>> http://sanibeltranquility.com/cygwin/
>>>>
>>>> if anyone else wants to try to reproduce this without rebuilding the
>>>> packages themselves.
>>>>
>>>> Yaakov, could you take a look?
>>>
>>> Sure.  Can you narrow this down to only one of glib or gamin?
>>
>> The culprit is gamin, and optimization *is* relevant.  What's strange, though,
>> is that when I rebuild it with optimization, my test case hangs instead of
>> crashing.  Summary:
>>
>> - With gamin-0.1.10-14 (and its subpackages), my test case crashes.  The outward
>> symptom is that there's no output, but running the test case under gdb shows the
>> SEGV.
>>
>> - If I rebuild gamin without optimization, I don't see any bug.  More precisely,
>> I build it using your gamin.cygport with the following line added:
>>
>> CFLAGS+=" -O0 -g3"
>>
>> - If I rebuild gamin with optimization (i.e., just using your gamin.cygport with
>> no changes), my test case hangs.
>
> I made another attempt to debug this, and I found the problem, but I don't know
> how to fix it.  First, I have to correct the last assertion I made above about
> my test case hanging; I just didn't wait long enough for it to finish.  What
> happens is that there is a retry loop in
> libgamin/gam_api.c:gamin_connect_unix_socket that gives up after 25 seconds.
> And the reason it fails is that /usr/libexec/gam_server.exe has crashed.  In fact,
> the latter always crashes on 32-bit Cygwin if it's built with optimization and
> if the directory /tmp/fam- exists before it is run.  [And this
> directory will always exist after one run of gam_server.exe.]
>
> The crash occurs in a call to g_free at server/gam_channel.c:525 because the
> pointer 'dir' that is being freed has been clobbered by a call to
> gam_check_not_fat on line 497.  Here are some details, based on a build using
> Yaakov's gamin.cygport file with the added line
>
> CFLAGS+=" -O1 -g3"
>
> I've appended at the end of this message a transcript of a gdb session that
> illustrates some of the assertions I'll be making.
>
> At line 447 of server/gam_channel.c, g_strconcat is called to get a pointer to
> the directory name "/tmp/fam-".  The value of this pointer is assigned
> to the variable 'dir' at line 473, and in my run it is 0x8005c068.  Although
> 'dir' is optimized out, I can see from a disassembly that the pointer is stored
> on the stack at -0x510(%ebp):
>
> 0x004058fc <+266>: call 0x408bf8
> 0x00405901 <+271>: mov %eax,-0x510(%ebp)
>
> And I verified in my gdb session that this stack location does indeed contain
> 0x8005c068.  After the call to gam_check_not_fat a little later, that stack
> location contains the value 0x00000104.  Then when g_free attempts to free the
> bogus pointer 0x00000104, we get a crash.
>
> I can't tell from the disassembly why the call to gam_check_not_fat clobbers the
> stack.  My best guess is that it happens as a result of calls to some Windows
> functions.  I hope someone more knowledgeable can take this further and fix it.

I stepped into gam_check_not_fat (which I should have done to begin with) and
narrowed this down further.
The stack location in question gets clobbered by the call to
GetVolumeInformation:

(gdb) s
gam_check_not_fat (path=0x8005c068 "/tmp/fam-kbrown") at /usr/src/debug/gamin-0.1.10-16/server/gam_channel.c:35
35        cygwin_conv_path(CCP_POSIX_TO_WIN_A, path, winpath, MAX_PATH);
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
37        pGVPN = GetProcAddress(LoadLibrary("kernel32"), "GetVolumePathNameA");
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
38        if (!pGVPN || !(pGVPN)(winpath, root, MAX_PATH))
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
52        if (!GetVolumeInformation (root, volname, MAX_PATH, NULL,
(gdb) x/x $ebp-0x510
0x28a6a8:       0x8005c068
(gdb) n
58        if (!strncmp(fsname, "FAT", 3)) /* FAT, FAT32 */
(gdb) x/x $ebp-0x510
0x28a6a8:       0x00000104

Here's the code near the call to GetVolumeInformation, followed by what I think
is the relevant disassembly:

  if (!GetVolumeInformation (root, volname, MAX_PATH, NULL,
                             NULL, NULL, fsname, MAX_PATH))
    {
      fprintf (stderr, "GetVolumeInformation: %d\n", GetLastError ());
      return 0;
    }

   0x00405b3a <+840>:   movl   $0x104,0x1c(%esp)   <<<<<<<<<<<<<<<<
   0x00405b42 <+848>:   lea    -0x120(%ebp),%eax
   0x00405b48 <+854>:   mov    %eax,0x18(%esp)
   0x00405b4c <+858>:   movl   $0x0,0x14(%esp)
   0x00405b54 <+866>:   movl   $0x0,0x10(%esp)
   0x00405b5c <+874>:   movl   $0x0,0xc(%esp)
   0x00405b64 <+882>:   movl   $0x104,0x8(%esp)    <<<<<<<<<<<<<<<<
   0x00405b6c <+890>:   lea    -0x224(%ebp),%eax
   0x00405b72 <+896>:   mov    %eax,0x4(%esp)
   0x00405b76 <+900>:   lea    -0x328(%ebp),%eax
   0x00405b7c <+906>:   mov    %eax,(%esp)
   0x00405b7f <+909>:   call   *0x41248c   <----- GetVolumeInformation?
   0x00405b85 <+915>:   sub    $0x20,%esp
   0x00405b88 <+918>:   test   %eax,%eax
   0x00405b8a <+920>:   jne    0x405bb5
   0x00405b8c <+922>:   call   *0x412480   <----- GetLastError?
   0x00405b92 <+928>:   mov    %eax,%esi
   0x00405b94 <+930>:   call   0x408df0 <__getreent>
   0x00405b99 <+935>:   mov    %esi,0x8(%esp)
   0x00405b9d <+939>:   movl   $0x40c70f,0x4(%esp)
   0x00405ba5 <+947>:   mov    0xc(%eax),%eax
   0x00405ba8 <+950>:   mov    %eax,(%esp)
   0x00405bab <+953>:   call   0x408df8
   0x00405bb0 <+958>:   jmp    0x406073

Note the two marked movl instructions involving 0x104; I guess one of these is
the culprit, but I don't really know what's going on.

Ken