From: Christian Franke via Cygwin
To: cygwin AT cygwin DOT com
Cc: Christian Franke
Subject: Re: SIGKILL may no longer work after many SIGCONT/SIGSTOP signals
Date: Sat, 23 Nov 2024 16:53:21 +0100

Takashi Yano via Cygwin wrote:
> On Wed, 20 Nov 2024 22:43:08 +0900
> Takashi Yano wrote:
>> On Tue, 19 Nov 2024 18:21:52 +0900
>> Takashi Yano wrote:
>>> On Tue, 12 Nov 2024 10:53:58 +0100
>>> Christian Franke wrote:
>>>> Found with 'stress-ng --cpu-sched' from current stress-ng upstream HEAD:
>>>>
>>>> Testcase (attached):
>>>>
>>>> $ gcc -O2 -o manysignals manysignals.c
>>>>
>>>> $ ./manysignals
>>>> fork() = 1833
>>>> ...
>>>> fork() = 1848
>>>> ...
>>>> kill(1833, 17)
>>>> ...
>>>> kill(1848, 17)
>>>> kill(1833, 9)
>>>> ...
>>>> kill(1848, 9)
>>>> waitpid(1833, ., 0)
>>>>
>>>>
>>>> Run this in second terminal:
>>>>
>>>> $ watch "ps | sed -n '1p;/manysignals/{/sed/d;p}'"
>>>>
>>>> If 'S' appear in the first column, the child processes likely reached
>>>> the final SIGSTOP state. This takes some time. The parent process may
>>>> still hang in first waitpid() but should not.
>>>>
>>>> If the parent process is aborted with ^C, child processes may be stopped
>>>> or left behind. Occasionally a child process that can not be stopped by
>>>> Cygwin (kill -9) is left behind.
>>>>
>>>> Tested with ancient (i7-2600K) and more recent (i7-14700K) CPU :-)
>>>>
>>>>
>>>> Unrelated to the above, but related to 'stress-ng --cpu-sched' which
>>>> uses sched_get/setscheduler():
>>>>
>>>> - sched_getscheduler() always returns SCHED_FIFO. As far as I understand
>>>> Linux sched(7), this is a non-preemptive real-time policy. The
>>>> preemptive SCHED_RR would possibly a more reasonable value.
>>>> Unfortunately SCHED_OTHER cannot be used because it would require to
>>>> ignore the priority.
>>>>
>>>> - sched_setscheduler() always fails with ENOSYS. It IMO should allow to
>>>> set 'param->sched_priority' if 'policy' is equal to the value returned
>>>> by sched_getscheduler().
>>> Thanks for the report and the test case. I'm now looking into
>>> the issue. Please wait a while.
>> Hopefully, I have found the cause.
>>
>> The deadlock happens between main thread and wait_sig thread.
>> The main thread is waiting for the wait_sig thread triggering
>> wakeup event while the wait_sig thread is waiting previous
>> signal being processed by main thread.
>>
>> Let me consider how to fix that.
> I'd like to report my progress for this issue.
>
> The patch attached almost solves the problem. ...

Compile error if applied to current git main (3dbc8c3):

../../../../winsup/cygwin/exceptions.cc:1487:21: error: ‘struct _cygtls’ has no member named ‘sig’
 1487 |   while (_main_tls->sig)
      |                     ^~~

> However, your test
> case is paused for tens of seconds, then ends normally.

I guess this is as expected. The processing of the
SIGSTOP/SIGCONT/.../SIGSTOP/SIGKILL sequence of each child process takes
some time because all are locked to a single core.

> If the code:
>   cpu_set_t cpus; CPU_ZERO(&cpus);
>   CPU_SET(0, &cpus);
>   if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
>     perror("setaffinity");
>
>   for (;;)
>     sched_yield();
> is changed to just:
>   for (;;) sleep(1);
> the test case runs without pause.

The pause will possibly reappear if the number of child processes is
increased to some multiple of the available cores.

> I think there still is a bug in the signal handling.

Possibly related:
https://sourceware.org/pipermail/cygwin/2024-November/256808.html

-- 
Regards,
Christian

-- 
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
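
For readers without the attachment, here is a rough sketch of the kind of test discussed in this thread. It is not the attached manysignals.c. Only the child loop (pin to CPU 0, then sched_yield() forever) is taken from the snippet quoted above. The function names `child_loop` and `run_manysignals`, the child and round counts, and the exact order of the SIGSTOP/SIGCONT rounds are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child body, as in the quoted snippet: pin to CPU 0 and yield forever. */
static void child_loop(void)
{
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);
    if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
        perror("setaffinity");
    for (;;)
        sched_yield();
}

/*
 * Hypothetical driver (not the original test case): fork 'nchildren'
 * children, send 'rounds' SIGSTOP/SIGCONT pairs to each, leave them
 * stopped, then SIGKILL and reap them.
 * Returns the number of children reaped as terminated by SIGKILL,
 * or -1 if memory allocation fails.
 */
int run_manysignals(int nchildren, int rounds)
{
    assert(nchildren > 0 && rounds >= 0);

    pid_t *pids = calloc((size_t)nchildren, sizeof(*pids));
    if (!pids)
        return -1;

    int forked = 0;
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            child_loop();   /* never returns */
            _exit(0);
        }
        pids[forked++] = pid;
    }

    /* Many stop/continue rounds (assumed order). */
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < forked; i++) {
            kill(pids[i], SIGSTOP);
            kill(pids[i], SIGCONT);
        }
    }

    /* Final SIGSTOP, then SIGKILL, matching the kill() trace above. */
    for (int i = 0; i < forked; i++)
        kill(pids[i], SIGSTOP);
    for (int i = 0; i < forked; i++)
        kill(pids[i], SIGKILL);

    /* A stopped process must still die from SIGKILL; waitpid() should return. */
    int killed = 0;
    for (int i = 0; i < forked; i++) {
        int status = 0;
        if (waitpid(pids[i], &status, 0) == pids[i]
            && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
            killed++;
    }

    free(pids);
    return killed;
}
```

Calling `run_manysignals(16, 1000)` from a small main() should return 16 once every child has been reaped. A call that never returns would correspond to the hang in the first waitpid() described above.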