Mail Archives: cygwin/2020/04/02/04:06:48

To: "'Ken Brown'" <kbrown AT cornell DOT edu>
In-Reply-To: <7897bc10-439d-64aa-c173-f0bf4ec82468@cornell.edu>
Subject: Re: Named pipes and multiple writers
Date: Thu, 2 Apr 2020 10:05:49 +0200
Message-ID: <000901d608c5$86361880$92a24980$@gmail.com>
From: Kristian Ivarsson via Cygwin <cygwin AT cygwin DOT com>
Reply-To: sten DOT kristian DOT ivarsson AT gmail DOT com
Cc: "'cygwin'" <cygwin AT cygwin DOT com>
Sender: "Cygwin" <cygwin-bounces AT cygwin DOT com>

> On 4/1/2020 2:34 PM, Ken Brown via Cygwin wrote:
> > On 4/1/2020 1:14 PM, sten DOT kristian DOT ivarsson AT gmail DOT com wrote:
> >>> On 4/1/2020 4:52 AM, sten DOT kristian DOT ivarsson AT gmail DOT com wrote:
> >>>>> On 3/31/2020 5:10 PM, sten DOT kristian DOT ivarsson AT gmail DOT com wrote:
> >>>>>>> On 3/28/2020 10:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>> On 3/28/2020 11:43 AM, Ken Brown via Cygwin wrote:
> >>>>>>>>> On 3/28/2020 8:10 AM, sten DOT kristian DOT ivarsson AT gmail DOT com wrote:
> >>>>>>>>>>> On 3/27/2020 10:53 AM, sten DOT kristian DOT ivarsson AT gmail DOT com wrote:
> >>>>>>>>>>>>> On 3/26/2020 7:19 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>> On 3/26/2020 6:39 PM, Ken Brown via Cygwin wrote:
> >>>>>>>>>>>>>>> On 3/26/2020 6:01 PM, sten DOT kristian DOT ivarsson AT gmail DOT com wrote:
> >>>>>>>>>>>>>>>> The ENXIO occurs when parallel child processes
> >>>>>>>>>>>>>>>> simultaneously open the descriptor using O_NONBLOCK.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> This is consistent with my guess that the error is
> >>>>>>>>>>>>>>> generated by fhandler_fifo::wait.  I have a feeling that
> >>>>>>>>>>>>>>> read_ready should have been created as a manual-reset
> >>>>>>>>>>>>>>> event, and that more care is needed to make sure it's
> >>>>>>>>>>>>>>> set when it should be.
> >>
> >> [snip]
> >>
> >>>>>>>> Never mind.  I was able to reproduce the problem and find the
> >>>>>>>> cause.
> >>>>>>>> What happens is that when the first subprocess exits,
> >>>>>>>> fhandler_fifo::close resets read_ready.  That causes the second
> >>>>>>>> and subsequent subprocesses to think that there's no reader
> >>>>>>>> open, so their attempts to open a writer with O_NONBLOCK fail
> >>>>>>>> with ENXIO.
> >>
> >> [snip]
> >>
> >>>> I wrote in a previous mail in this thread that it seemed to work
> >>>> fine for me as well, but when I bumped up the number of writers
> >>>> and/or the number of messages (e.g. 25/25) it started to fail again
> >>
> >> [snip]
> >>
> >>> Yes, it is a resource issue.  There is a limit on the number of
> >>> writers that can be open at one time, currently 64.  I chose that
> >>> number arbitrarily, with no idea what might actually be needed in
> >>> practice, and it can easily be changed.
> >>
> >> Does there have to be a limit at all?  We would rather see the
> >> application decide how many resources it would like to use.  In our
> >> particular case there will be a process manager with an incoming pipe
> >> that possibly several thousand processes will write to.
> >
> > I agree.
> >
> >> Just for fiddling around (to figure out whether this is the limit that
> >> makes other things work a bit oddly), where is this limit of 64 defined now?
> >
> > It's MAX_CLIENTS, defined in fhandler.h.  But there seem to be other
> > resource issues also; simply increasing MAX_CLIENTS doesn't solve the
> > problem.  I think there are also problems with the number of threads,
> > for example.  Each time your program forks, the subprocess inherits
> > the rfd file descriptor and its "fifo_reader_thread" starts up.  This
> > is unnecessary for your application, so I tried disabling it (in
> > fhandler_fifo::fixup_after_fork), just as an experiment.
> >
> > But then I ran into some deadlocks, suggesting that one of the locks
> > I'm using isn't robust enough.  So I've got a lot of things to work on.
> >
> >>> In addition, a writer isn't recognized as closed until a reader
> >>> tries to read and gets an error.  In your example with 25/25, the
> >>> list of writers quickly gets to 64 before the parent ever tries
> >>> to read.
> >>
> >> That explains the behaviour, but shouldn't some error be returned
> >> from open/write (maybe there is one, but I'm missing it)?
> >
> > The error is discovered in add_client_handler, called from
> > thread_func.  I think you'll only see it if you run the program under
> > strace.  I'll see if I can find a way to report it.  Currently,
> > there's a retry loop in fhandler_fifo::open when a writer tries to
> > open, and I think I need to limit the number of retries and then error
> > out.
> 
> I pushed a few improvements and bug fixes, and your 25/25 example now runs
> without a problem.  I increased MAX_CLIENTS to 1024 just for the sake of
> this example, but I'll work on letting the number of writers increase
> dynamically as needed.

I pulled it and tried it out and yes, the sample test program with 25/25
worked well, and a whole bunch of our unit tests now pass.
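
For anyone reading this without the earlier messages: the actual test program
is not reproduced in this mail, so what follows is only a rough sketch of what
a "25/25" run (25 writer processes, 25 messages each) could look like. The
names run_fifo_test, writer_child and MSG_SIZE are made up for illustration,
and keeping an extra write end open in the parent is a choice of this sketch
(it stops the reader from seeing end-of-file between writers), not necessarily
what the original program does.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG_SIZE 16  /* hypothetical record size, well below PIPE_BUF so writes are atomic */

/* Child side: open the FIFO nonblocking and send `messages` fixed-size records. */
static void writer_child(const char *path, int messages)
{
    int wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (wfd < 0)
        _exit(errno == ENXIO ? 2 : 1);   /* 2 marks the ENXIO case discussed above */

    char msg[MSG_SIZE];
    memset(msg, 'x', sizeof msg);
    for (int i = 0; i < messages; ) {
        ssize_t n = write(wfd, msg, sizeof msg);
        if (n == (ssize_t)sizeof msg)
            i++;
        else if (n < 0 && (errno == EAGAIN || errno == EINTR))
            poll(NULL, 0, 1);            /* pipe full: sleep 1 ms, let the reader drain */
        else
            _exit(3);
    }
    close(wfd);
    _exit(0);
}

/* Parent side: returns the number of messages received, or -1 on any failure. */
long run_fifo_test(const char *path, int writers, int messages)
{
    unlink(path);
    if (mkfifo(path, 0600) < 0)
        return -1;

    int rfd = open(path, O_RDONLY | O_NONBLOCK);
    /* Extra write end held by the parent so the reader never sees EOF early. */
    int keep = rfd < 0 ? -1 : open(path, O_WRONLY | O_NONBLOCK);
    if (rfd < 0 || keep < 0) {
        if (rfd >= 0)
            close(rfd);
        unlink(path);
        return -1;
    }

    int live = 0, failed = 0;
    for (int i = 0; i < writers; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            close(rfd);                  /* the child only writes */
            close(keep);
            writer_child(path, messages);
        }
        if (pid > 0)
            live++;
        else
            failed++;
    }

    long bytes = 0;
    char buf[64 * MSG_SIZE];
    for (;;) {
        int all_exited = (live == 0);
        struct pollfd p = { .fd = rfd, .events = POLLIN };
        int ready = poll(&p, 1, 100);
        if (ready < 0 && errno == EINTR)
            continue;
        if (ready > 0) {
            ssize_t n = read(rfd, buf, sizeof buf);
            if (n > 0) {
                bytes += n;
                continue;
            }
        }
        if (all_exited)
            break;                       /* every child reaped and the pipe is drained */
        int status;
        while (live > 0 && waitpid(-1, &status, WNOHANG) > 0) {
            live--;
            if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                failed++;
        }
    }

    close(rfd);
    close(keep);
    unlink(path);
    return failed ? -1 : bytes / MSG_SIZE;
}
```

Calling run_fifo_test with 25 writers and 25 messages should return 625 when
every writer manages to open the FIFO and deliver all of its records; a
return of -1 means that at least one child failed or the setup went wrong.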

We still have some issues, but I cannot yet tell whether they are related to
named pipes or not.
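
One way to narrow such failures down is to look at errno from the nonblocking
open itself. POSIX specifies that opening a FIFO with O_WRONLY | O_NONBLOCK
fails with ENXIO when no process has it open for reading, so an ENXIO while a
reader is known to be open would point at the FIFO layer. A minimal sketch,
where probe_fifo_writer is a hypothetical helper name and not anything from
Cygwin or from our code base:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try a nonblocking write-open on `path`.
   Returns 0 on success, otherwise the errno value reported by open(). */
int probe_fifo_writer(const char *path)
{
    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return errno;   /* ENXIO: the FIFO exists but no reader has it open */
    close(fd);
    return 0;
}
```

With a reader open on the same path the probe is expected to return 0;
without one, ENXIO is the documented result.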

It is great that you're looking into a fully dynamic solution.

Kristian

> Ken

--
Problem reports:      https://cygwin.com/problems.html
FAQ:                  https://cygwin.com/faq/
Documentation:        https://cygwin.com/docs.html
Unsubscribe info:     https://cygwin.com/ml/#unsubscribe-simple
