Mail Archives: cygwin/2022/04/13/19:19:05

X-Recipient: archive-cygwin AT delorie DOT com
X-Original-To: cygwin AT cygwin DOT com
Delivered-To: cygwin AT cygwin DOT com
MIME-Version: 1.0
Date: Thu, 14 Apr 2022 02:17:38 +0300
From: Alexey Izbyshev <izbyshev AT ispras DOT ru>
To: Takashi Yano <takashi DOT yano AT nifty DOT ne DOT jp>
Subject: Re: Deadlock of the process tree when running make
In-Reply-To: <1bdd5ac77277343fbff9b560fa98b15e@ispras.ru>
References: <9388316255ada0e0fcb2d849cce5a894 AT ispras DOT ru>
<20220409191743 DOT 6da2268a36e8c9b4ab22c722 AT nifty DOT ne DOT jp>
<1ecd670b1cdff43e0b0d7e5ee4c9cfc5 AT ispras DOT ru>
<ab3971adb8f441fd16bb62e480547a95 AT ispras DOT ru>
<20220409204619 DOT dd0e53902d5e108ef462e510 AT nifty DOT ne DOT jp>
<907ce1b4416a826cb07990dd601bd687 AT ispras DOT ru>
<20220410015753 DOT 753e2a238513eaf2a3da81e9 AT nifty DOT ne DOT jp>
<f55466cdda02fa46bc43174ba412df3a AT ispras DOT ru>
<20220410025410 DOT 196aa0a04368147dbbb31d3e AT nifty DOT ne DOT jp>
<afad32070411d6d94d5d94da90478af4 AT ispras DOT ru>
<7204ed0aa2d6b3fcfb239010e6b67646 AT ispras DOT ru>
<20220410163432 DOT 00dd7b9f81f8f322d97688f2 AT nifty DOT ne DOT jp>
<0e1a53626639cb21369225ff9092ecfc AT ispras DOT ru>
<b937a782f8b8993e3d4a058a354596a7 AT ispras DOT ru>
<20220411173526 DOT 6243b9492e0fc3d4132a58a8 AT nifty DOT ne DOT jp>
<ab8ded5fb5dad09dc2aebe5b49aa7dac AT ispras DOT ru>
<1bdd5ac77277343fbff9b560fa98b15e AT ispras DOT ru>
User-Agent: Roundcube Webmail/1.4.4
Message-ID: <f25d76d5897f60ab1a5a52bd0dffd484@ispras.ru>
X-Sender: izbyshev AT ispras DOT ru
Cc: cygwin AT cygwin DOT com
Errors-To: cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com
Sender: "Cygwin" <cygwin-bounces+archive-cygwin=delorie DOT com AT cygwin DOT com>

On 2022-04-13 19:48, Alexey Izbyshev wrote:
> On 2022-04-11 13:10, Alexey Izbyshev wrote:
> What's probably not normal is the behavior of the hanging conhost.exe.
> I've compared the points where conhost.exe is blocked, and all but one
> threads in the model case are doing the same things as in the hanging
> case, but the remaining thread is blocked in
> ReadFile("\Device\NamedPipe\") (i.e. the read end of "hWritePipe" of
> pcon) instead of trying to enter a critical section like thread 1
> above. So now I'm starting to doubt that it's a cygwin bug and not
> some conhost.exe bug.
> 
> I'll try to poke around the hanging conhost.exe some more, and also
> may be will try to create a faster reproducer.
> 
I've studied the conhost.exe hang, and conhost.exe does indeed look buggy.

TLDR: https://github.com/microsoft/terminal/pull/12181

The full story:

I dumped conhost.exe, opened the dump in windbg and looked at the stack 
trace of the hanging thread:

ntdll!NtWaitForAlertByThreadId+0x14
ntdll!RtlpWaitOnAddressWithTimeout+0x81
ntdll!RtlpWaitOnAddress+0xae
ntdll!RtlpWaitOnCriticalSection+0xfd
ntdll!RtlpEnterCriticalSectionContended+0x1c4
ntdll!RtlEnterCriticalSection+0x42
conhost!Microsoft::Console::Render::Renderer::_PaintFrameForEngine+0x54
conhost!Microsoft::Console::Render::Renderer::TriggerTeardown+0x19e60
conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit+0x21
conhost!Microsoft::Console::PtySignalInputThread::_GetData+0x65
conhost!Microsoft::Console::PtySignalInputThread::_InputThread+0x25
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21

By looking at the assembly, I found that it hangs *after* ReadFile() on 
the pipe completes, so the problem is definitely not a leak of 
hWritePipe in bash.exe or elsewhere.

Using the function names, I've found this issue: 
https://github.com/microsoft/terminal/issues/1810.

That issue is a different one, but the discussion and the patch show 
that synchronization on startup/shutdown is a disaster.

Then I looked at the code and identified that the hang happens while 
attempting to lock the console at [1]. After studying how this lock is 
used in other parts of the code, I noticed that 
PtySignalInputThread::_Shutdown() (which is further up in the call stack 
of the hanging function) uses ProcessCtrlEvents() incorrectly, because 
the latter unconditionally unlocks the console, but the lock is never 
taken by this thread at this point. Then I looked at a more recent 
version of the code and discovered the patch to _Shutdown() which I 
referenced above.

I've also verified that the assembly of _Shutdown() (which is inlined into 
PtySignalInputThread::_GetData()) corresponds to the unpatched version 
(i.e. without LockConsole() call):

call    conhost!CloseConsoleProcessState (00007ff6`22e7013c)
call    conhost!ProcessCtrlEvents (00007ff6`22e262a0)
mov     ecx,6Dh
call    conhost!Microsoft::Console::Interactivity::ServiceLocator::RundownAndExit (00007ff6`22e3c730)
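
To make the failure mode concrete, here is a toy model of the lock imbalance. It is not conhost code: ToyConsoleLock and the snake_case functions are hypothetical stand-ins for the console lock, ProcessCtrlEvents(), RundownAndExit() and the two versions of _Shutdown(). The model simply assumes that an unlock without a matching lock leaves the lock unusable; the real critical section internals differ.

```cpp
#include <cassert>

// Toy stand-in for the console lock (hypothetical, not the conhost type).
struct ToyConsoleLock {
    int depth = 0;  // 0 = free, >0 = held, <0 = corrupted by a stray unlock
    void lock()   { ++depth; }
    void unlock() { --depth; }
    bool corrupted() const { return depth < 0; }
    bool can_acquire() const { return depth == 0; }
};

// Models the behavior described above: unlocks unconditionally,
// so the caller is expected to hold the lock on entry.
void process_ctrl_events(ToyConsoleLock& l) {
    // ... dispatch pending control events ...
    l.unlock();
}

// Models the rundown path that locks the console to paint a final frame.
// Returns false where the real thread would block forever.
bool rundown_and_exit(ToyConsoleLock& l) {
    if (!l.can_acquire()) {
        return false;  // modeled hang
    }
    l.lock();
    // ... paint the last frame ...
    l.unlock();
    assert(l.depth == 0);  // lock is balanced again on the way out
    return true;
}

// Unpatched shape: process_ctrl_events() runs without the lock being held.
bool shutdown_unpatched(ToyConsoleLock& l) {
    process_ctrl_events(l);      // stray unlock: depth becomes -1
    return rundown_and_exit(l);  // cannot acquire the corrupted lock
}

// Patched shape: take the lock first, so the unlock is balanced.
bool shutdown_patched(ToyConsoleLock& l) {
    l.lock();
    process_ctrl_events(l);      // releases the lock taken above
    return rundown_and_exit(l);
}
```

In this model shutdown_unpatched() reports the hang and leaves the lock corrupted, while shutdown_patched() completes with the lock balanced.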

I'm not sure why this bug is not triggered more frequently, but one 
possible reason, as indicated by comment [2], is that the bad path is 
only taken if there are live clients after ClosePseudoConsole() is 
called, which is probably rare.

A potential workaround on the Cygwin side would be to ensure that the 
pseudoconsole doesn't have clients before calling ClosePseudoConsole(), 
but I don't know whether it's possible.
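
The shape of such a workaround could be sketched as follows, assuming there is some way to count the remaining clients, which is exactly the open question. Everything here is hypothetical: close_when_no_clients, count_clients and close_console are illustrative names, and no real Windows or Cygwin call is used. The close callback stands in for whatever ends up calling ClosePseudoConsole().

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: poll until no clients remain, then close.
// Returns true if the console was closed with zero clients, or false if
// the timeout expired first (in which case nothing is closed).
bool close_when_no_clients(
    const std::function<int()>& count_clients,   // assumed client counter
    const std::function<void()>& close_console,  // would wrap the real close
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll_interval = std::chrono::milliseconds(1)) {
    assert(count_clients && close_console);  // both callbacks must be set

    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (count_clients() > 0) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // clients still attached; caller decides what to do
        }
        std::this_thread::sleep_for(poll_interval);
    }
    close_console();
    return true;
}
```

Returning false on timeout instead of closing anyway is a deliberate choice in this sketch: closing with live clients is the path that triggers the hang, so the caller has to decide explicitly whether to take it.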

[1] 
https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/renderer/base/renderer.cpp#L75

[2] 
https://github.com/microsoft/terminal/blob/9b92986b49bed8cc41fde4d6ef080921c41e6d9e/src/host/PtySignalInputThread.cpp#L205

Alexey

