Message-ID: <44A507F8.4030409@netbauds.net>
Date: 	Fri, 30 Jun 2006 12:16:08 +0100
From: Darryl Miles <darryl-mailinglists AT netbauds DOT net>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.0.4) Gecko/20060614 SeaMonkey/1.0.2
MIME-Version: 1.0
To: cygwin AT cygwin DOT com
Subject: Re: rsync over ssh hang issue understood
References: <44A348D1 DOT 6070908 AT netbauds DOT net> <ba40711f0606291839p2e1d7b10l7befd8cf2cc2d1a7 AT mail DOT gmail DOT com>
In-Reply-To: <ba40711f0606291839p2e1d7b10l7befd8cf2cc2d1a7@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com

Lev Bishop wrote:
> On 6/28/06, Darryl Miles wrote:
> See how-to-debug-cygwin.txt
> http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/cygwin/how-to-debug-cygwin.txt?rev=1.12&content-type=text/x-cvsweb-markup&cvsroot=src 

Thanks for your pointers.  Everything I need to get started is
already covered in how-to-debug-cygwin.txt.


>> indications from select(2) interface. But if no worker thread is busy
>> working on that fd then you get writability back ?
> 
> Yes, but it is very hard to get the precise unix semantics. For
> example, the application issues a write() which spawns off a thread
> that then blocks. Then the application exit()s, causing the thread to
> also terminate before completing its write, and the write never
> completes.

This is a very valid point, but not one that is a problem in the
situations I'm looking at.  The situation I am looking at is much more
chronic.

How does Overlapped I/O get around this, since you have sent the data
into the kernel layer and are now waiting on a completion notification
or event signalling?  If the application holding the handle exits from
under it, does the Win32 kernel abort the I/O in this circumstance?

What if we got around this with a fork(), not at every I/O but only
when we exit while an incomplete I/O operation is still in progress?
Can we:

  * fork()
  * reacquire handle, as per dup()
  * CloseHandle() from dying process
  * receive IO completion callback with indication of failure (handle
    was closed!)
  * hand data over to the child (of fork()) for it to take up the mission.

Maybe there is a resident part of cygwin that could take up the mission, 
since a named pipe can be obtained by any process on the system.  This 
resident part is a process outside of the lifecycle of the emulated 
POSIX processes.

It still would not be perfect, but I can't think of any situation that
would use a single write call (as two writes would be allowed to cause
blocking) where the data must reliably make it to the reader and yet
the writer exits as soon as it has written.  Pretty rare if you ask me.
Even when the data is queued in a POSIX kernel there is no guarantee
the reader will read it; it might sit in the buffer.  Applications that
need that guarantee would round trip the other end of the pipe to be
sure.

At least we should be able to _DETECT_ that an incomplete pipe write
is still in progress when a process exits.  So maybe we can log a
warning and pick up any real problem from there, rather than thinking
too deeply about that rare case.


> There is also the issue of what return value to give the application
> doing the write() on the pipe. You'll have to be careful to deal with
> error conditions, SIGPIPE, etc, etc.

As cgf put it:
| If I understand the plan correctly, in the scenario where select says
| it's ok to write but it really isn't, the write would return as if it
| succeeded and a writer thread would be created which sits around
| trying to empty the pipe.

This is _EXACTLY_ the problem as I see it.  We have to deal with those
rules if the OS can't tell us in a reliable way that a write() will work.

It would be more correct to say that the writer thread sits around
trying to fill the pipe.


There may be other ways to deal with that write(), but as far as I
understand the NT kernel does not provide a true non-blocking mechanism
to work with for pipes.  By that I mean one where you can offer the
data to the kernel and, if the buffers are full, the kernel rejects the
data without blocking, leaving the application holding it.  Overlapped
I/O as I understand it does not work like this.

I have read the Overlapped I/O model as documented, but my (limited)
understanding of Overlapped I/O is that the call to
WriteFile()/WriteFileEx() can still block (and it probably will under
the pipelined conditions of rsync+ssh) when the kernel can't queue new
requests.

I have not read this anywhere, but surely everyone can appreciate that
an application can't keep doing continuous overlapped I/O into the
kernel and expect to get back ERROR_IO_PENDING every time without it
ever blocking the application's call.  Something has to block, or the
kernel has to give back another error equivalent to POSIX's EAGAIN.  As
I can't see any EAGAIN equivalent, I presume it must block when the
data rate of the writer is faster than that of the reader end of the
pipe.

This is not true non-blocking IO as I see it.  So there is actually no
non-blocking API unless you use PIPE_NOWAIT, which comes with a big
fat warning not to use it.  Nature did not intend PIPE_NOWAIT to exist.
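
For comparison, here is a minimal sketch of the POSIX behaviour meant
by "true non-blocking": the kernel takes data until its pipe buffer is
full and then rejects the next write() with EAGAIN instead of blocking.
The helper names set_nonblocking() and fill_until_rejected() are made
up for illustration, and the amount accepted depends on the platform's
pipe buffer size.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Keep writing to wfd until the kernel refuses more data.
 * Returns the number of bytes accepted and stores the errno of the
 * refused write in *err (0 if write() returned 0 instead of failing).
 */
long fill_until_rejected(int wfd, int *err)
{
    char chunk[4096] = {0};
    long total = 0;

    assert(err != NULL);
    for (;;) {
        ssize_t n = write(wfd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;          /* kernel accepted (part of) the chunk */
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;            /* interrupted by a signal, retry */
        *err = (n == -1) ? errno : 0;
        return total;
    }
}
```

With nobody reading the other end, the application is left holding the
rejected chunk and can go back to select()/poll(), which is the piece
that appears to be missing on the Win32 side.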


As cgf writes:
| The idea of using threads for pipe writing has been bounced around for
| a long time.  It doesn't solve the select problem if there are
| multiple processes writing to one pipe.  That is not a completely
| unusual event, unfortunately.

I don't see the problem here; each writing process will have its own
worker thread taking the block.

But to pick up the point here.

The problem lies in the select/poll/read/write event notification
system within a single application.  We need to ensure that when we
signal writability on a pipe via the select/poll event mechanism, some
work appears to get done at the next write() call.  Maybe we can
return 0?  At least we didn't block, and the application already has to
deal with partial writes when in O_NONBLOCK anyway.  On a real POSIX
system it would never return 0 and would always accept at least
PIPE_BUF bytes, but this may still be less than the 64KB chunk the
application was trying to do in the first place.
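
To illustrate the application side of this, here is a minimal sketch of
the kind of loop an O_NONBLOCK writer typically runs: it copes with
partial writes and goes back to poll() on EAGAIN.  The function name
write_all() is hypothetical, and treating a 0 return like EAGAIN is an
assumption about how a tolerant application might behave, not something
POSIX requires.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Write all len bytes to a (normally non-blocking) fd.  Partial writes
 * advance the buffer; EAGAIN or a 0 return sends us back to poll() to
 * wait for writability.  Returns 0 on success, -1 on error (errno set).
 */
int write_all(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n > 0) {                 /* possibly a partial write */
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;                /* interrupted, just retry */
        if (n == 0 || errno == EAGAIN) {
            struct pollfd p = { .fd = fd, .events = POLLOUT, .revents = 0 };
            if (poll(&p, 1, -1) == -1 && errno != EINTR)
                return -1;
            continue;
        }
        return -1;                   /* real error, e.g. EPIPE */
    }
    return 0;
}
```

A loop like this only behaves well if poll()/select() stops reporting
writability once write() has nothing more to accept; otherwise it
spins, which is why the two mechanisms have to agree.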

When in blocking mode we can return EINTR (a fictitious signal
notification), but then we run into a problem where the application has
blocked signals.  But what about signals outside the scope of POSIX,
like Linux RT signals?  What I'm saying by this is that there may be
some signals that cannot be blocked anyway, so EINTR may still be
valid.  But there is probably lots of application code which does not
expect EINTR when it has already blocked all the signals it can think
of.


Ah ha!  Eureka moment....

What if all pipe write operations used Overlapped I/O and were
FIFO serialized within cygwin?  I believe WriteFileEx() can return
TRUE when the I/O went through first time and ERROR_IO_PENDING when
it's going to signal completion later.  This sticks with plan A's
approach of always making a private copy of the POSIX application's
data buffer, so that's a double-buffered throughput loss for every
write.  Ah well.

If we get TRUE back there is no problem, business as usual next time.
If we get FALSE back with ERROR_IO_PENDING we consider that I/O to be
an outstanding write on a pipe and we revoke the writability status in
select.

We then call WaitForSingleObject() for the I/O completion (or we have a
completion function do that work).  When we get I/O completion we allow
the next I/O from the FIFO through the gate.  If there is no more I/O
in the FIFO we set write_ready=true and wake up any select()s.

This model does not rely on over-writing to find the call that would
block in order to revoke writability.  It just uses the I/O completion
mechanism of Overlapped I/O, which is how nature intended it.
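
A minimal sketch of that gate as a plain state machine, with no Win32
calls at all.  Everything here is hypothetical: the struct and function
names are invented, the 'pending' flag stands in for the overlapped
write reporting ERROR_IO_PENDING, and a queued write released by a
completion is assumed to go pending as well.  It only covers the
non-blocking path (EAGAIN once writability is revoked) plus a counter
for writers queued in the FIFO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fd state; none of these names come from Cygwin. */
struct pipe_gate {
    bool   in_flight;    /* an overlapped write is awaiting completion */
    bool   write_ready;  /* what select()/poll() would report */
    size_t queued;       /* writes waiting in the FIFO behind the gate */
};

void gate_init(struct pipe_gate *g)
{
    assert(g != NULL);
    g->in_flight = false;
    g->write_ready = true;
    g->queued = 0;
}

/*
 * Non-blocking write path.  Returns len if the data was accepted
 * (copied privately), or -EAGAIN if writability is currently revoked.
 */
long gate_write_nonblock(struct pipe_gate *g, size_t len, bool pending)
{
    assert(g != NULL);
    if (!g->write_ready)
        return -EAGAIN;
    if (pending) {
        g->in_flight = true;
        g->write_ready = false;   /* revoke writability in select */
    }
    return (long)len;
}

/* A blocking-mode writer arriving while a write is in flight queues up. */
void gate_queue_writer(struct pipe_gate *g)
{
    assert(g != NULL && g->in_flight);
    g->queued++;
}

/* I/O completion: let the next queued write through, or restore
 * writability if the FIFO is empty. */
void gate_on_completion(struct pipe_gate *g)
{
    assert(g != NULL && g->in_flight);
    if (g->queued > 0) {
        g->queued--;              /* next write is now the one in flight */
    } else {
        g->in_flight = false;
        g->write_ready = true;    /* select() waiters would be woken here */
    }
}
```

The point of the design is that writability is revoked by the first
write that goes pending, not by a later write that would have blocked.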

If throughput becomes a problem it may be possible to apply heuristics
with a guesstimate of the amount of Overlapped I/O the kernel can
buffer before blocking.  Then instead of only one I/O per fd per
process we could account for the number of outstanding bytes and revoke
writability based on that threshold figure.  This way multiple
overlapped I/Os can be outstanding in the kernel before we throttle it
with select.  But for now I just want to get back to a working app.
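
The byte-accounting variant could be sketched along these lines.  The
names and the threshold value are assumptions for illustration only;
how much the NT kernel will really buffer is exactly the figure that
would have to be guessed or measured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-fd accounting of bytes handed to the kernel. */
struct byte_gate {
    size_t outstanding;  /* bytes submitted but not yet completed */
    size_t threshold;    /* guessed kernel buffering capacity */
};

/* select()/poll() would report writable only below the threshold. */
bool byte_gate_writable(const struct byte_gate *g)
{
    assert(g != NULL);
    return g->outstanding < g->threshold;
}

/* Called when an overlapped write of len bytes is issued. */
void byte_gate_submit(struct byte_gate *g, size_t len)
{
    assert(g != NULL);
    g->outstanding += len;
}

/* Called when an overlapped write of len bytes completes. */
void byte_gate_complete(struct byte_gate *g, size_t len)
{
    assert(g != NULL && len <= g->outstanding);
    g->outstanding -= len;
}
```

Setting the threshold to 1 byte degenerates to the single outstanding
I/O model, so the simple version can be kept as the safe default.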

If the POSIX pipe is in blocking mode we _deliberately_ make it block
until it gets completion signalled.  If it's in non-blocking mode and
we have already revoked writability we return EAGAIN.


Thanks for your replies.

I have started to write Win32 application code to help me completely
understand the various Windows I/O models and the named pipe
implementation in detail.  That should give me some solid ground to
tweak the proposal based on the rules in play with the NT kernel.


Darryl
