Mail Archives: cygwin/2024/11/23/11:15:42

Date: Sun, 24 Nov 2024 01:15:09 +0900
To: cygwin AT cygwin DOT com
Subject: Re: SIGKILL may no longer work after many SIGCONT/SIGSTOP signals
Message-Id: <20241124011509.e30f0a5fa2ef86b240f260bf@nifty.ne.jp>
In-Reply-To: <7f00d1e4-736f-5f95-8bab-33a302487cdb@t-online.de>
References: <adc78776-84f6-82bc-13b4-3a51b11027fa AT t-online DOT de>
<20241119182152 DOT c2195f50ed7091fbed644606 AT nifty DOT ne DOT jp>
<20241120224308 DOT 000a18e48c0b8926e82e5147 AT nifty DOT ne DOT jp>
<20241123205307 DOT 80e08e9669cd3e1ee72043a1 AT nifty DOT ne DOT jp>
<7f00d1e4-736f-5f95-8bab-33a302487cdb AT t-online DOT de>
From: Takashi Yano via Cygwin <cygwin AT cygwin DOT com>
Reply-To: Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

On Sat, 23 Nov 2024 16:53:21 +0100
Christian Franke wrote:
> Takashi Yano via Cygwin wrote:
> > On Wed, 20 Nov 2024 22:43:08 +0900
> > Takashi Yano wrote:
> >> On Tue, 19 Nov 2024 18:21:52 +0900
> >> Takashi Yano wrote:
> >>> On Tue, 12 Nov 2024 10:53:58 +0100
> >>> Christian Franke wrote:
> >>>> Found with 'stress-ng --cpu-sched' from current stress-ng upstream HEAD:
> >>>>
> >>>> Testcase (attached):
> >>>>
> >>>> $ gcc -O2 -o manysignals manysignals.c
> >>>>
> >>>> $ ./manysignals
> >>>> fork() = 1833
> >>>> ...
> >>>> fork() = 1848
> >>>> ...
> >>>> kill(1833, 17)
> >>>> ...
> >>>> kill(1848, 17)
> >>>> kill(1833, 9)
> >>>> ...
> >>>> kill(1848, 9)
> >>>> waitpid(1833, ., 0)
> >>>>
> >>>>
> >>>> Run this in second terminal:
> >>>>
> >>>> $ watch "ps | sed -n '1p;/manysignals/{/sed/d;p}'"
> >>>>
> >>>> If 'S' appear in the first column, the child processes likely reached
> >>>> the final SIGSTOP state. This takes some time. The parent process may
> >>>> still hang in first waitpid() but should not.
> >>>>
> >>>> If the parent process is aborted with ^C, child processes may be stopped
> >>>> or left behind. Occasionally a child process that can not be stopped by
> >>>> Cygwin (kill -9) is left behind.
> >>>>
> >>>> Tested with ancient (i7-2600K) and more recent (i7-14700K) CPU :-)
> >>>>
> >>>>
> >>>> Unrelated to the above, but related to 'stress-ng --cpu-sched' which
> >>>> uses sched_get/setscheduler():
> >>>>
> >>>> - sched_getscheduler() always returns SCHED_FIFO. As far as I understand
> >>>> Linux sched(7), this is a non-preemptive real-time policy. The
> >>>> preemptive SCHED_RR would possibly a more reasonable value.
> >>>> Unfortunately SCHED_OTHER cannot be used because it would require to
> >>>> ignore the priority.
> >>>>
> >>>> - sched_setscheduler() always fails with ENOSYS. It IMO should allow to
> >>>> set 'param->sched_priority' if 'policy' is equal to the value returned
> >>>> by sched_getscheduler().
> >>> Thanks for the report and the test case. I'm now looking into
> >>> the issue. Please wait a while.
> >> Hopefully, I have found the cause.
> >>
> >> The deadlock happens between main thread and wait_sig thread.
> >> The main thread is waiting for the wait_sig thread triggering
> >> wakeup event while the wait_sig thread is waiting previous
> >> signal being processed by main thread.
> >>
> >> Let me consider how to fix that.
> > I'd like to report my progress for this issue.
> >
> > The patch attached almost solves the problem. ...
> 
> Compile error if applied to current git main (3dbc8c3):
> 
>   ../../../../winsup/cygwin/exceptions.cc:1487:21: error: ‘struct _cygtls’ has no member named ‘sig’
>    1487 |   while (_main_tls->sig)
>         |                     ^~~

This is because Corinna's latest commit renamed 'sig' to
'current_sig'.

commit	3dbc8c3fbdc99d3f0f68fab8ba2a814ecdc27e17
Cygwin: cygtls: rename sig to current_sig

> >   However, your test
> > case is paused for tens of seconds, then ends normally.
> 
> I guess this is as expected. The processing of the 
> SIGSTOP/SIGCONT/.../SIGSTOP/SIGKILL sequence of each child process take 
> some time because all are locked to a single core.

I feel that is too slow, even with 16 processes (each with a wait_sig
thread) running on one CPU core.

> > If the code:
> >        cpu_set_t cpus; CPU_ZERO(&cpus);
> >        CPU_SET(0, &cpus);
> >        if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
> >          perror("setaffinity");
> >
> >        for (;;)
> >          sched_yield();
> > is changed to just:
> >        for (;;) sleep(1);
> > the test case runs without pause.
> 
> The pause will possibly reappear if the number of child processes is 
> increased to some multiple of the available cores.

I tested with np = 16*32 and without the sched_setaffinity() call, and the
pause does not happen. My CPU is a Threadripper 1950X (16 cores, 32 threads).
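
For readers without the attachment, the three child variants compared above
(pinned sched_yield() loop, unpinned sched_yield() loop, sleep(1) loop) can be
sketched roughly as below. This is not the attached manysignals.c, which is not
reproduced in this thread; the names child_mode, child_body, run_signal_storm
and MAX_CHILDREN are hypothetical, and the number of SIGCONT/SIGSTOP rounds is
an assumed parameter.

```c
#define _GNU_SOURCE            /* sched_setaffinity() and CPU_* macros on glibc */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 512       /* hypothetical cap; 16*32 is the largest np above */

/* Hypothetical names for the three child variants discussed in the thread. */
enum child_mode { YIELD_PINNED, YIELD_UNPINNED, SLEEP_LOOP };

/* Body of each child process; it never returns and must be killed. */
static void child_body(enum child_mode mode)
{
  if (mode == YIELD_PINNED) {
    /* Lock the child to CPU 0, as in the quoted test code. */
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);
    if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
      perror("setaffinity");
  }
  if (mode == SLEEP_LOOP)
    for (;;)
      sleep(1);
  for (;;)
    sched_yield();
}

/* Fork np children, send each `rounds` SIGCONT/SIGSTOP pairs, then SIGKILL
   and reap them. Returns the number of children whose wait status reports
   termination by SIGKILL, or -1 on a bad np or a fork() failure. */
static int run_signal_storm(int np, int rounds, enum child_mode mode)
{
  pid_t pids[MAX_CHILDREN];
  int i, r, killed = 0, failed = 0;

  if (np < 1 || np > MAX_CHILDREN)
    return -1;

  for (i = 0; i < np; i++) {
    pids[i] = fork();
    if (pids[i] == 0)
      child_body(mode);        /* child: never returns */
    if (pids[i] < 0) {
      perror("fork");
      failed = 1;
      np = i;                  /* only clean up children that exist */
      break;
    }
  }

  /* Each round ends with SIGSTOP, so children are stopped when killed. */
  for (r = 0; !failed && r < rounds; r++)
    for (i = 0; i < np; i++) {
      kill(pids[i], SIGCONT);
      kill(pids[i], SIGSTOP);
    }

  for (i = 0; i < np; i++)
    kill(pids[i], SIGKILL);

  for (i = 0; i < np; i++) {
    int status;
    if (waitpid(pids[i], &status, 0) == pids[i]
        && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
      killed++;
  }
  return failed ? -1 : killed;
}
```

Calling run_signal_storm(16, 1000, YIELD_PINNED) from main() and timing it
against the other two modes is one way to compare the variants; the value 1000
is arbitrary and not taken from the original test case.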

> > I think there still is a bug in the signal handling.
> 
> Possibly related:
> https://sourceware.org/pipermail/cygwin/2024-November/256808.html

I also looked into that report a bit, but I think it is a separate issue.

-- 
Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>

