Mail Archives: cygwin/2025/07/19/11:18:29

Date: Sun, 20 Jul 2025 00:16:46 +0900
To: cygwin AT cygwin DOT com
Subject: Re: Calling system() in multi-threads.
Message-Id: <20250720001646.ada5dee4322f8b8f6e005deb@nifty.ne.jp>
In-Reply-To: <20250718223201.421220a18e7e2f7049634071@nifty.ne.jp>
References: <20250617215411 DOT ebf69d1c18b55191a1b76c01 AT nifty DOT ne DOT jp>
<eb7786ee-dc7a-4689-9d17-b842e581d7c7 AT maxrnd DOT com>
<20250618203127 DOT 71ac180de11230a9a6055185 AT nifty DOT ne DOT jp>
<20250716235236 DOT 96055ec145d9a0528b50c357 AT nifty DOT ne DOT jp>
<aHfHCqD2xZcdyu7u AT calimero DOT vinschen DOT de>
<20250717231421 DOT 56b54f7e96266311101d4c08 AT nifty DOT ne DOT jp>
<aHkUldQHKjA-lZrw AT calimero DOT vinschen DOT de>
<20250718004446 DOT 9ce9f7f208566ded1a676fd5 AT nifty DOT ne DOT jp>
<20250718012800 DOT 45ae0a80ff8bcd286c176baa AT nifty DOT ne DOT jp>
<aHn9rNKV85wagDCC AT calimero DOT vinschen DOT de>
<20250718213152 DOT d9ca638eab71395709d6f138 AT nifty DOT ne DOT jp>
<20250718223201 DOT 421220a18e7e2f7049634071 AT nifty DOT ne DOT jp>
From: Takashi Yano via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

On Fri, 18 Jul 2025 22:32:01 +0900
Takashi Yano wrote:
> On Fri, 18 Jul 2025 21:31:52 +0900
> Takashi Yano wrote:
> > On Fri, 18 Jul 2025 09:54:20 +0200
> > Corinna Vinschen wrote:
> > > On Jul 18 01:28, Takashi Yano via Cygwin wrote:
> > > > On Fri, 18 Jul 2025 00:44:46 +0900
> > > > Takashi Yano wrote:
> > > > > On Thu, 17 Jul 2025 17:19:49 +0200
> > > > > Corinna Vinschen wrote:
> > > > > > On Jul 17 23:14, Takashi Yano via Cygwin wrote:
> > > > > > > Hi Corinna,
> > > > > > > 
> > > > > > > On Wed, 16 Jul 2025 17:36:42 +0200
> > > > > > > Corinna Vinschen wrote:
> > > > > > > > On Jul 16 23:52, Takashi Yano via Cygwin wrote:
> > > > > > > > > Do you have any idea?
> > > > > > > > 
> > > > > > > > Locking would be super-simple.
> > > > > > > > 
> > > > > > > > But theoretically it should be possible to use a local child_info_spawn
> > > > > > > > variable at this point.  The ch_spawn child_info_spawn instance is not
> > > > > > > > copied to the child anyway, so that should be safe.  The same goes for
> > > > > > > > posix_spawn() then, btw.
> > > > > > > > 
> > > > > > > > I checked the sources and I don't see any dependency to ch_spawn
> > > > > > > > from a spawning process, in contrast to an exec'ing process.  That
> > > > > > > > doesn't mean there is none, just that I didn't find any.
> > > > > > > 
> > > > > > > Thanks!
> > > > > > > > As a starting point, I tried introducing locking. It almost works
> > > > > > > > as expected; however, the STC from my first report sometimes hangs
> > > > > > > > if N is large, e.g. 100. The patch is attached.
> > > > > > > 
> > > > > > > What am I missing?
> > > > > > 
> > > > > > I don't know.  You're perhaps not releasing the lock in all cases.
> > > > > > But I would have to debug this just like you ¯\_(ツ)_/¯
> > > > > > 
> > > > > > > Out of curiosity, did you try using a local child_info_spawn instance
> > > > > > instead?  That would be a rather nice solution, but I'm pretty sure
> > > > > > there's some other problem lurking in the dark...
> > > > > 
> > > > > > I'm not sure what to do with a local child_info_spawn.
> > > > > > Some other modules refer to ch_spawn, such as exception.cc and
> > > > > > pinfo.cc. Also, has_execed* uses ch_spawn. What should we do about that?
> > > > > > 
> > > > > > I've just tried the following simple patch; however, it also hangs
> > > > > > with my STC.
> > > > > 
> > > > > diff --git a/winsup/cygwin/spawn.cc b/winsup/cygwin/spawn.cc
> > > > > index cb58b6eed..56fca6e45 100644
> > > > > --- a/winsup/cygwin/spawn.cc
> > > > > +++ b/winsup/cygwin/spawn.cc
> > > > > @@ -944,6 +944,7 @@ spawnve (int mode, const char *path, const char *const *argv,
> > > > >    int ret;
> > > > >  
> > > > >    syscall_printf ("spawnve (%s, %s, %p)", path, argv[0], envp);
> > > > > +  child_info_spawn ch_spawn_local;
> > > > >  
> > > > >    if (!envp)
> > > > >      envp = empty_env;
> > > > > @@ -951,7 +952,7 @@ spawnve (int mode, const char *path, const char *const *argv,
> > > > >    switch (_P_MODE (mode))
> > > > >      {
> > > > >      case _P_OVERLAY:
> > > > > -      ch_spawn.worker (path, argv, envp, mode);
> > > > > +      ch_spawn_local.worker (path, argv, envp, mode);
> > > > >        /* Errno should be set by worker.  */
> > > > >        ret = -1;
> > > > >        break;
> > > > > @@ -961,7 +962,7 @@ spawnve (int mode, const char *path, const char *const *argv,
> > > > >      case _P_WAIT:
> > > > >      case _P_DETACH:
> > > > >      case _P_SYSTEM:
> > > > > -      ret = ch_spawn.worker (path, argv, envp, mode);
> > > > > +      ret = ch_spawn_local.worker (path, argv, envp, mode);
> > > > >        break;
> > > > >      default:
> > > > >        set_errno (EINVAL);
> > > > 
> > > > > The hang seems to occur when acquiring the cygheap_protect lock in the child sh.exe.
> > > > > This lock is acquired only in _cfree() and _cmalloc(), so I am not sure why
> > > > > cygheap_protect cannot be acquired at this point at all...
> > > 
> > > > What do the affected backtraces look like?
> > 
> > Like this:
> > 
> > Thread 8 (Thread 19780.0x91a4):
> > #0  0x00007ff82ea91021 in ntdll!DbgBreakPoint () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82eabca7e in ntdll!DbgUiRemoteBreakin () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #2  0x00007ff82cb97374 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
> > #3  0x00007ff82ea3cc91 in ntdll!RtlUserThreadStart () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #4  0x0000000000000000 in ?? ()
> > 
> > Thread 7 (Thread 19780.0xad74):
> > #0  0x00007ff82ea90f84 in ntdll!ZwWaitForWorkViaWorkerFactory () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82ea3d407 in ntdll!TpReleaseCleanupGroupMembers () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #2  0x00007fff8a406773 in _cygtls::call2 (this=0x1a6ce00, func=0x7ff82ea3d110 <ntdll!TpReleaseCleanupGroupMembers+1104>, arg=0x780b50, buf=buf@entry=0x1a6cd20) at ../../.././winsup/cygwin/cygtls.cc:41
> > #3  0x00007fff8a406835 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at ../../.././winsup/cygwin/cygtls.cc:28
> > #4  0x00007ff82cb97374 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
> > #5  0x00007ff82ea3cc91 in ntdll!RtlUserThreadStart () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #6  0x0000000000000000 in ?? ()
> > 
> > Thread 6 (Thread 19780.0x6fe8):
> > #0  0x00007ff82ea90f84 in ntdll!ZwWaitForWorkViaWorkerFactory () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82ea3d407 in ntdll!TpReleaseCleanupGroupMembers () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #2  0x00007fff8a406773 in _cygtls::call2 (this=0x5fce00, func=0x7ff82ea3d110 <ntdll!TpReleaseCleanupGroupMembers+1104>, arg=0x780b50, buf=buf@entry=0x5fcd20) at ../../.././winsup/cygwin/cygtls.cc:41
> > #3  0x00007fff8a406835 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at ../../.././winsup/cygwin/cygtls.cc:28
> > #4  0x00007ff82cb97374 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
> > #5  0x00007ff82ea3cc91 in ntdll!RtlUserThreadStart () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #6  0x0000000000000000 in ?? ()
> > 
> > Thread 5 (Thread 19780.0xd09c "sig"):
> > #0  0x00007ff82ea8d5b4 in ntdll!ZwReadFile () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82c73dc13 in ReadFile () from /cygdrive/c/Windows/System32/KERNELBASE.dll
> > #2  0x00007fff8a4823a9 in wait_sig () at ../../.././winsup/cygwin/sigproc.cc:1487
> > #3  0x00007fff8a405640 in cygthread::callfunc (this=this@entry=0x7fff8a608520 <threads>, issimplestub=issimplestub@entry=false) at ../../.././winsup/cygwin/cygthread.cc:130
> > #4  0x00007fff8a405bba in cygthread::stub (arg=arg@entry=0x7fff8a608520 <threads>) at ../../.././winsup/cygwin/cygthread.cc:173
> > #5  0x00007fff8a406773 in _cygtls::call2 (this=0x125ce00, func=0x7fff8a405b50 <cygthread::stub(void*)>, arg=0x7fff8a608520 <threads>, buf=buf@entry=0x125cd20) at ../../.././winsup/cygwin/cygtls.cc:41
> > #6  0x00007fff8a406835 in _cygtls::call (func=<optimized out>, arg=<optimized out>) at ../../.././winsup/cygwin/cygtls.cc:28
> > #7  0x00007ff82cb97374 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
> > #8  0x00007ff82ea3cc91 in ntdll!RtlUserThreadStart () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #9  0x0000000000000000 in ?? ()
> > 
> > Thread 4 (Thread 19780.0x9bd8):
> > #0  0x00007ff82ea90f84 in ntdll!ZwWaitForWorkViaWorkerFactory () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82ea3d407 in ntdll!TpReleaseCleanupGroupMembers () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #2  0x00007ff82cb97374 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
> > #3  0x00007ff82ea3cc91 in ntdll!RtlUserThreadStart () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #4  0x0000000000000000 in ?? ()
> > 
> > Thread 3 (Thread 19780.0xcbc4):
> > #0  0x00007ff82ea90f84 in ntdll!ZwWaitForWorkViaWorkerFactory () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82ea3d407 in ntdll!TpReleaseCleanupGroupMembers () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #2  0x00007ff82cb97374 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
> > #3  0x00007ff82ea3cc91 in ntdll!RtlUserThreadStart () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #4  0x0000000000000000 in ?? ()
> > 
> > Thread 2 (Thread 19780.0x13298):
> > #0  0x00007ff82ea90f84 in ntdll!ZwWaitForWorkViaWorkerFactory () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82ea3d407 in ntdll!TpReleaseCleanupGroupMembers () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #2  0x00007ff82cb97374 in KERNEL32!BaseThreadInitThunk () from /cygdrive/c/Windows/System32/KERNEL32.DLL
> > #3  0x00007ff82ea3cc91 in ntdll!RtlUserThreadStart () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #4  0x0000000000000000 in ?? ()
> > 
> > Thread 1 (Thread 19780.0x123a8 "sh"):
> > #0  0x00007ff82ea90f24 in ntdll!ZwWaitForAlertByThreadId () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #1  0x00007ff82ea19205 in ntdll!RtlAcquireSRWLockExclusive () from /cygdrive/c/Windows/SYSTEM32/ntdll.dll
> > #2  0x00007fff8a5372eb in _cfree (ptr=0x8000406e0) at ../../.././winsup/cygwin/mm/cygheap.cc:407
> > #3  cfree (s=0x8000406f0) at ../../.././winsup/cygwin/mm/cygheap.cc:514
> > #4  0x00007fff8a4510a9 in path_conv::~path_conv (this=0x7ffffc4f0, __in_chrg=<optimized out>) at ../../.././winsup/cygwin/path.cc:1395
> > #5  0x00007fff8a4950c8 in stat(const char * __restrict__, stat * __restrict__) (name=0xa0000c6e0 "/home/yano/20250611", buf=<optimized out>) at ../../.././winsup/cygwin/syscalls.cc:2135
> > #6  0x00007fff8a55a034 in _sigfe () at sigfe.s:35
> > #7  0x000000010040dfad in same_file ()
> > #8  0x0000000100420ae5 in set_pwd ()
> > #9  0x0000000100423917 in initialize_shell_variables ()
> > #10 0x00000001004019a9 in ?? ()
> > #11 0x000000010049526c in main ()
> > 
> > No other threads seem to grab the lock.
> > 
> > In the case above, cfree() is called from path_conv::~path_conv(), however,
> > in other cases cfree() is called from child_info_spawn::release().
> 
> I embedded debug code into mm/cygheap.cc, that is:
> 
> diff --git a/winsup/cygwin/mm/cygheap.cc b/winsup/cygwin/mm/cygheap.cc
> index 338886468..bab4067e0 100644
> --- a/winsup/cygwin/mm/cygheap.cc
> +++ b/winsup/cygwin/mm/cygheap.cc
> @@ -371,7 +371,16 @@ _cmalloc (unsigned size)
>    if (cygheap->buckets[b])
>      {
>        rvc = (_cmalloc_entry *) cygheap->buckets[b];
> -      cygheap->buckets[b] = rvc->ptr;
> +      __try
> +	{
> +	  cygheap->buckets[b] = rvc->ptr;
> +	}
> +      __except (NO_ERROR)
> +	{ /* Should not reach */
> +	  system_printf("b = %d", b);
> +	  assert (false);
> +	}
> +      __endtry
>        rvc->b = b;
>      }
>    else
> diff --git a/winsup/cygwin/spawn.cc b/winsup/cygwin/spawn.cc
> index cb58b6eed..32f6bdead 100644
> --- a/winsup/cygwin/spawn.cc
> +++ b/winsup/cygwin/spawn.cc
> @@ -944,6 +944,7 @@ spawnve (int mode, const char *path, const char *const *argv,
>    int ret;
>  
>    syscall_printf ("spawnve (%s, %s, %p)", path, argv[0], envp);
> +  child_info_spawn ch_spawn_local;
>  
>    if (!envp)
>      envp = empty_env;
> @@ -961,7 +962,7 @@ spawnve (int mode, const char *path, const char *const *argv,
>      case _P_WAIT:
>      case _P_DETACH:
>      case _P_SYSTEM:
> -      ret = ch_spawn.worker (path, argv, envp, mode);
> +      ret = ch_spawn_local.worker (path, argv, envp, mode);
>        break;
>      default:
>        set_errno (EINVAL);
> 
> The result is like this! Why???
> 
>       0 [main] sh 617 _cmalloc: b = 1
> assertion "false" failed: file "../../.././winsup/cygwin/mm/cygheap.cc", line 381, function: void* _cmalloc(unsigned int)
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> AAAAAAAAAAAAAAAAAAAA
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> (Hang)

I think I have found the cause. In child_info_spawn::worker(),
refresh_cygheap() captures the current cygheap_max, which
child_copy() later uses in the child process. In the
multi-threaded case, however, another thread may modify the
cygheap before child_copy() completes. As a result, the cygheap
can end up corrupted in the child process.
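
To make the race window concrete, here is a minimal toy model in C++.
It is not Cygwin code: ToyHeap, ChildImage, capture_max() and this
child_copy() are hypothetical stand-ins, and the model only assumes
that the copied range is fixed at capture time while the heap's
bookkeeping (here, a "head" index) is read at copy time. The
interleaving is simulated deterministically in a single thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Toy stand-in for the cygheap (hypothetical, not the real layout).
struct ToyHeap {
    std::vector<int> arena;  // allocated cells
    std::size_t max = 0;     // analogue of cygheap_max: cells in use
    std::size_t head = 0;    // index of the most recently allocated cell
    std::mutex lock;         // protects arena, max and head

    // Allocate one cell and make it the new head.
    std::size_t alloc(int value) {
        std::lock_guard<std::mutex> guard(lock);
        arena.push_back(value);
        head = max;
        return max++;
    }
};

// What the "child" ends up with after the copy.
struct ChildImage {
    std::vector<int> arena;
    std::size_t head = 0;
    // The image is usable only if head refers to a cell that was copied.
    bool consistent() const { return head < arena.size(); }
};

// Step 1: remember how much of the heap the child should copy.
std::size_t capture_max(const ToyHeap& h) { return h.max; }

// Step 2: copy only the cells below captured_max, but read the
// bookkeeping (head) as it is at copy time.
ChildImage child_copy(const ToyHeap& h, std::size_t captured_max) {
    ChildImage c;
    c.arena.assign(h.arena.begin(), h.arena.begin() + captured_max);
    c.head = h.head;
    return c;
}

// Capture, then an allocation sneaks in, then copy.
bool spawn_with_interleaved_alloc() {
    ToyHeap h;
    h.alloc(1);
    std::size_t captured = capture_max(h);
    h.alloc(2);  // stands in for another thread's allocation
    return child_copy(h, captured).consistent();
}

// Capture and copy inside one critical section: any other thread's
// alloc() would block on h.lock until the copy is done.
bool spawn_with_lock_held() {
    ToyHeap h;
    h.alloc(1);
    std::lock_guard<std::mutex> guard(h.lock);
    std::size_t captured = capture_max(h);
    return child_copy(h, captured).consistent();
}
```

In this model spawn_with_interleaved_alloc() yields an inconsistent
child image, because head points at a cell beyond the copied range,
while spawn_with_lock_held() does not. Holding a lock across both
steps is only one generic way to close such a window; this message
does not say which approach the submitted patches take.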

I have submitted a patch series addressing the problems with
calling system() from multiple threads.

Please review.

-- 
Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
