| Date: | Mon, 25 Nov 2024 21:23:45 +0900 |
| To: | cygwin AT cygwin DOT com |
| Subject: | Re: SIGKILL may no longer work after many SIGCONT/SIGSTOP signals |
| Message-Id: | <20241125212345.4effa99060e84754658a49f4@nifty.ne.jp> |
| In-Reply-To: | <20241124011509.e30f0a5fa2ef86b240f260bf@nifty.ne.jp> |
| From: | Takashi Yano via Cygwin <cygwin AT cygwin DOT com> |
| Reply-To: | Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp> |
On Sun, 24 Nov 2024 01:15:09 +0900
Takashi Yano wrote:
> On Sat, 23 Nov 2024 16:53:21 +0100
> Christian Franke wrote:
> > Takashi Yano via Cygwin wrote:
> > > On Wed, 20 Nov 2024 22:43:08 +0900
> > > Takashi Yano wrote:
> > >> On Tue, 19 Nov 2024 18:21:52 +0900
> > >> Takashi Yano wrote:
> > >>> On Tue, 12 Nov 2024 10:53:58 +0100
> > >>> Christian Franke wrote:
> > >>>> Found with 'stress-ng --cpu-sched' from current stress-ng upstream HEAD:
> > >>>>
> > >>>> Testcase (attached):
> > >>>>
> > >>>> $ gcc -O2 -o manysignals manysignals.c
> > >>>>
> > >>>> $ ./manysignals
> > >>>> fork() = 1833
> > >>>> ...
> > >>>> fork() = 1848
> > >>>> ...
> > >>>> kill(1833, 17)
> > >>>> ...
> > >>>> kill(1848, 17)
> > >>>> kill(1833, 9)
> > >>>> ...
> > >>>> kill(1848, 9)
> > >>>> waitpid(1833, ., 0)
> > >>>>
> > >>>>
> > >>>> Run this in second terminal:
> > >>>>
> > >>>> $ watch "ps | sed -n '1p;/manysignals/{/sed/d;p}'"
> > >>>>
> > >>>> If 'S' appears in the first column, the child processes have likely
> > >>>> reached the final SIGSTOP state. This takes some time. The parent
> > >>>> process may still hang in the first waitpid(), but it should not.
> > >>>>
> > >>>> If the parent process is aborted with ^C, child processes may be stopped
> > >>>> or left behind. Occasionally a child process that cannot be killed from
> > >>>> Cygwin (kill -9) is left behind.
> > >>>>
> > >>>> Tested with ancient (i7-2600K) and more recent (i7-14700K) CPU :-)
> > >>>>
> > >>>>
> > >>>> Unrelated to the above, but related to 'stress-ng --cpu-sched' which
> > >>>> uses sched_get/setscheduler():
> > >>>>
> > >>>> - sched_getscheduler() always returns SCHED_FIFO. As far as I understand
> > >>>> Linux sched(7), this is a non-preemptive real-time policy. The
> > >>>> preemptive SCHED_RR would possibly be a more reasonable value.
> > >>>> Unfortunately SCHED_OTHER cannot be used because it would require
> > >>>> ignoring the priority.
> > >>>>
> > >>>> - sched_setscheduler() always fails with ENOSYS. IMO it should allow
> > >>>> setting 'param->sched_priority' if 'policy' is equal to the value
> > >>>> returned by sched_getscheduler().
> > >>> Thanks for the report and the test case. I'm now looking into
> > >>> the issue. Please wait a while.
> > >> Hopefully, I have found the cause.
> > >>
> > >> The deadlock happens between the main thread and the wait_sig thread.
> > >> The main thread is waiting for the wait_sig thread to trigger the
> > >> wakeup event, while the wait_sig thread is waiting for the previous
> > >> signal to be processed by the main thread.
> > >>
> > >> Let me consider how to fix that.
> > > I'd like to report my progress for this issue.
> > >
> > > The patch attached almost solves the problem. ...
> >
> > Compile error if applied to current git main (3dbc8c3):
> >
> > ../../../../winsup/cygwin/exceptions.cc:1487:21: error: ‘struct
> > _cygtls’ has no member named ‘sig’
> >  1487 |   while (_main_tls->sig)
> >       |                     ^~~
>
> This is because Corinna's latest commit renames 'sig' to
> 'current_sig'.
>
> commit 3dbc8c3fbdc99d3f0f68fab8ba2a814ecdc27e17
> Cygwin: cygtls: rename sig to current_sig
>
> > > However, your test
> > > case is paused for tens of seconds, then ends normally.
> >
> > I guess this is as expected. The processing of the
> > SIGSTOP/SIGCONT/.../SIGSTOP/SIGKILL sequence of each child process takes
> > some time because all are locked to a single core.
>
> I feel it's too slow even if 16 processes (with wait_sig threads) are
> executed in one CPU core.
>
> > > If the code:
> > >     cpu_set_t cpus;
> > >     CPU_ZERO(&cpus);
> > >     CPU_SET(0, &cpus);
> > >     if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
> > >         perror("setaffinity");
> > >
> > >     for (;;)
> > >         sched_yield();
> > > is changed to just:
> > >     for (;;) sleep(1);
> > > the test case runs without pause.
> >
> > The pause will possibly reappear if the number of child processes is
> > increased to some multiple of the available cores.
>
> I tested with np = 16*32 without the sched_setaffinity() call, and the
> pause does not happen. My CPU is a Threadripper 1950X (16 cores, 32 threads).
>
> > > I think there still is a bug in the signal handling.
I have just submitted 6 patches for this issue. With these patches,
the reported problem no longer occurs in my environment.
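
For readers without the attachment, the scenario discussed in this thread can
be sketched roughly as below. This is NOT the original manysignals.c; it is a
minimal reconstruction from the output and the code fragment quoted above.
The names child_loop() and signal_storm() and the parameters nproc and rounds
are assumptions made for illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child: pin itself to CPU 0 and spin on sched_yield(), as in the
   fragment quoted in this thread. Never returns. */
static void child_loop(void)
{
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(0, &cpus);
    if (sched_setaffinity(getpid(), sizeof(cpus), &cpus))
        perror("setaffinity");

    for (;;)
        sched_yield();
}

/* Hypothetical driver (name and parameters are assumptions):
   fork nproc children, send each one `rounds` SIGCONT/SIGSTOP pairs,
   then SIGKILL, then reap them all.
   Returns the number of children that waitpid() reports as killed
   by SIGKILL, or -1 if memory could not be allocated. */
static int signal_storm(int nproc, int rounds)
{
    assert(nproc > 0 && rounds >= 0);

    pid_t *pids = calloc((size_t)nproc, sizeof(*pids));
    if (!pids)
        return -1;

    int started = 0;
    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0)
            child_loop();           /* child never comes back */
        pids[started++] = pid;
    }

    /* Many SIGCONT/SIGSTOP signals; every child ends up stopped. */
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < started; i++) {
            kill(pids[i], SIGCONT);
            kill(pids[i], SIGSTOP);
        }
    }

    /* SIGKILL must terminate a process even while it is stopped. */
    for (int i = 0; i < started; i++)
        kill(pids[i], SIGKILL);

    int killed = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i]
            && WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
            killed++;
    }

    free(pids);
    return killed;
}
```

Calling, for example, signal_storm(16, 1000) from main() and checking that the
return value is 16 would exercise the same SIGSTOP/SIGCONT/.../SIGKILL
sequence; the process count and round count here are arbitrary choices, not
values taken from the original test case.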
--
Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>