Date: Mon, 25 Nov 2024 21:40:21 +0900
To: cygwin AT cygwin DOT com
Subject: Re: SIGKILL may no longer work after many SIGCONT/SIGSTOP signals
Message-Id: <20241125214021.39eeb134d99c6226678d1e17@nifty.ne.jp>
In-Reply-To: <20241125212345.4effa99060e84754658a49f4@nifty.ne.jp>
References: <adc78776-84f6-82bc-13b4-3a51b11027fa AT t-online DOT de>
 <20241119182152 DOT c2195f50ed7091fbed644606 AT nifty DOT ne DOT jp>
 <20241120224308 DOT 000a18e48c0b8926e82e5147 AT nifty DOT ne DOT jp>
 <20241123205307 DOT 80e08e9669cd3e1ee72043a1 AT nifty DOT ne DOT jp>
 <7f00d1e4-736f-5f95-8bab-33a302487cdb AT t-online DOT de>
 <20241124011509 DOT e30f0a5fa2ef86b240f260bf AT nifty DOT ne DOT jp>
 <20241125212345 DOT 4effa99060e84754658a49f4 AT nifty DOT ne DOT jp>
From: Takashi Yano via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

On Mon, 25 Nov 2024 21:23:45 +0900
Takashi Yano wrote:
> On Sun, 24 Nov 2024 01:15:09 +0900
> Takashi Yano wrote:
> > On Sat, 23 Nov 2024 16:53:21 +0100
> > Christian Franke wrote:
> > > Takashi Yano via Cygwin wrote:
> > > > On Wed, 20 Nov 2024 22:43:08 +0900
> > > > Takashi Yano wrote:
> > > >> On Tue, 19 Nov 2024 18:21:52 +0900
> > > >> Takashi Yano wrote:
> > > >>> On Tue, 12 Nov 2024 10:53:58 +0100
> > > >>> Christian Franke wrote:
> > > >>>> Found with 'stress-ng --cpu-sched' from current stress-ng upstream HEAD:
> > > >>>>
> > > >>>> Testcase (attached):
> > > >>>>
> > > >>>> $ gcc -O2 -o manysignals manysignals.c
> > > >>>>
> > > >>>> $ ./manysignals
> > > >>>> fork() = 1833
> > > >>>> ...
> > > >>>> fork() = 1848
> > > >>>> ...
> > > >>>> kill(1833, 17)
> > > >>>> ...
> > > >>>> kill(1848, 17)
> > > >>>> kill(1833, 9)
> > > >>>> ...
> > > >>>> kill(1848, 9)
> > > >>>> waitpid(1833, ., 0)
> > > >>>>
> > > >>>>
> > > >>>> Run this in second terminal:
> > > >>>>
> > > >>>> $ watch "ps | sed -n '1p;/manysignals/{/sed/d;p}'"
> > > >>>>
> > > >>>> If 'S' appear in the first column, the child processes likely reached
> > > >>>> the final SIGSTOP state. This takes some time. The parent process may
> > > >>>> still hang in first waitpid() but should not.
> > > >>>>
> > > >>>> If the parent process is aborted with ^C, child processes may be stopped
> > > >>>> or left behind. Occasionally a child process that can not be stopped by
> > > >>>> Cygwin (kill -9) is left behind.
> > > >>>>
> > > >>>> Tested with ancient (i7-2600K) and more recent (i7-14700K) CPU :-)
> > > >>>>
> > > >>>>
> > > >>>> Unrelated to the above, but related to 'stress-ng --cpu-sched' which
> > > >>>> uses sched_get/setscheduler():
> > > >>>>
> > > >>>> - sched_getscheduler() always returns SCHED_FIFO. As far as I understand
> > > >>>> Linux sched(7), this is a non-preemptive real-time policy. The
> > > >>>> preemptive SCHED_RR would possibly a more reasonable value.
> > > >>>> Unfortunately SCHED_OTHER cannot be used because it would require to
> > > >>>> ignore the priority.
> > > >>>>
> > > >>>> - sched_setscheduler() always fails with ENOSYS. It IMO should allow to
> > > >>>> set 'param->sched_priority' if 'policy' is equal to the value returned
> > > >>>> by sched_getscheduler().
> > > >>> Thanks for the report and the test case. I'm now looking into
> > > >>> the issue. Please wait a while.
> > > >> Hopefully, I have found the cause.
> > > >>
> > > >> The deadlock happens between main thread and wait_sig thread.
> > > >> The main thread is waiting for the wait_sig thread triggering
> > > >> wakeup event while the wait_sig thread is waiting previous
> > > >> signal being processed by main thread.
> > > >>
> > > >> Let me consider how to fix that.
> > > > I'd like to report my progress for this issue.
> > > >
> > > > The patch attached almost solves the problem. ...
> > >
> > > Compile error if applied to current git main (3dbc8c3):
> > >
> > >   ../../../../winsup/cygwin/exceptions.cc:1487:21: error: ‘struct
> > > _cygtls’ has no member named ‘sig’
> > >   1487 |   while (_main_tls->sig)
> > >        |                     ^~~
> >
> > This is because the latest Corinna's commit changes the name 'sig'
> > to 'current_sig'.
> >
> > commit 3dbc8c3fbdc99d3f0f68fab8ba2a814ecdc27e17
> >     Cygwin: cygtls: rename sig to current_sig
> >
> > > > However, your test
> > > > case is paused for tens of seconds, then ends normally.
> > >
> > > I guess this is as expected. The processing of the
> > > SIGSTOP/SIGCONT/.../SIGSTOP/SIGKILL sequence of each child process take
> > > some time because all are locked to a single core.
> >
> > I feel it's too slow even if 16 processes (with wait_sig threads) are
> > executed in one CPU core.
> >
> > > > If the code:
> > > >   cpu_set_t cpus; CPU_ZERO(&cpus);
> > > >   CPU_SET(0, &cpus);
> > > >   if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
> > > >     perror("setaffinity");
> > > >
> > > >   for (;;)
> > > >     sched_yield();
> > > > is changed to just:
> > > >   for (;;) sleep(1);
> > > > the test case runs without pause.
> > >
> > > The pause will possibly reappear if the number of child processes is
> > > increased to some multiple of the available cores.
> >
> > I tested with np = 16*32 without sched_setaffinity() call, the pause
> > does not happen. My CPU is Threadripper 1950X 16-core 32-thread.
> >
> > > > I think there still is a bug in the signal handling.
>
> I have just submitted 6 patches for this issue. With these patches,
> the problem reported no longer occurs in my environment.

As the patches show, this test case triggers several issues in Cygwin
that interact with each other. After a long struggle, I think I have
finally resolved them. The patches turned out to be simple,
considering how long it took to find them.

-- 
Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
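
For readers without the attachment: the original manysignals.c is not
reproduced in this thread, so the following is only a rough sketch of what
such a test might look like, pieced together from the output and the
snippet quoted above. The child loop is the quoted code. The helper name
run_manysignals(), its parameters, the 64-child limit, and the exact order
of the SIGSTOP/SIGCONT rounds are assumptions for illustration, not the
attached test case.

```c
#define _GNU_SOURCE   /* glibc: exposes sched_setaffinity() and CPU_* macros */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body, following the snippet quoted in the thread:
   pin to CPU 0, then yield forever. */
static void child_loop(void)
{
#ifdef CPU_SET   /* skip pinning if the platform does not expose the macros */
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);
    if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
        perror("setaffinity");
#endif
    for (;;)
        sched_yield();
}

/* Hypothetical helper (an assumption, not the attached test): fork `nproc`
   children, send each of them `rounds` SIGSTOP/SIGCONT pairs and a final
   SIGSTOP, then SIGKILL them and reap them with waitpid().
   Returns the number of children reaped as terminated by SIGKILL,
   or -1 on bad arguments or fork() failure. */
static int run_manysignals(int nproc, int rounds)
{
    pid_t pids[64];
    int killed = 0;

    if (nproc < 1 || nproc > 64 || rounds < 0)
        return -1;

    for (int i = 0; i < nproc; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            /* fork failed: clean up the children started so far */
            for (int j = 0; j < i; j++) {
                kill(pids[j], SIGKILL);
                waitpid(pids[j], NULL, 0);
            }
            return -1;
        }
        if (pids[i] == 0) {
            child_loop();
            _exit(1);   /* not reached */
        }
    }

    /* Many stop/continue rounds for every child. */
    for (int r = 0; r < rounds; r++)
        for (int i = 0; i < nproc; i++) {
            kill(pids[i], SIGSTOP);
            kill(pids[i], SIGCONT);
        }

    /* Final SIGSTOP, then SIGKILL, matching the order of the quoted
       kill(pid, 17) and kill(pid, 9) output. */
    for (int i = 0; i < nproc; i++)
        kill(pids[i], SIGSTOP);
    for (int i = 0; i < nproc; i++)
        kill(pids[i], SIGKILL);

    /* The report says the parent may hang in the first waitpid(). */
    for (int i = 0; i < nproc; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i]
            && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
            killed++;
    }
    return killed;
}
```

Calling something like run_manysignals(16, 1000) from main() would roughly
correspond to the 16 children seen in the quoted output (the round count
is a guess). On a system where signal delivery works as expected the call
should return nproc. According to the report, an affected Cygwin build
could instead block in waitpid() or leave an unkillable child behind.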