Mail Archives: cygwin/2006/08/02/13:49:31
Lev Bishop wrote:
> On 8/1/06, Darryl Miles wrote:
>> I am still interested in tackling the whole situation but I do need to
>> be furnished with a testcase to work with. I believe the original
>> comeback by the group of users running "unison" should have insisted a
>> testcase was produced by them to demonstrate the new breakage.
>
> As I recall, the "group of users running unison" was the exact same
> group as the group who developed the currently-commented-out code in
> select.cc, so there wasn't any particular need for them to provide
> themselves a test case....
>
> I'm sure it's all explained in the mailing list archives. Basically,
> the NtQueryInformationFile() gives back the amount of non-paged pool
> used by the pipe, which is only the same thing as the amount of data
> available to read in the case that there are no outstanding read()s on
> the pipe. Otherwise, the commented-out code can cause a write()r to
> deadlock any time the process at the other end of the pipe issues a
> read() for more than a pipe buffer's worth of data. This is much worse
> than the current situation, where a non-blocking write can
> occasionally block, which in turn may cause (serious) performance
> issues but rarely a total deadlock. (After all, cygwin is not an rtos
> and is allowed to have arbitrary delays at any point in the
> code, without violating the posix semantics, so long as the
> write() *eventually* returns.)
Okay, you seem to have some understanding of how and why it failed for
the "unison" group of users. Do you think the commented-out code is
fixable in any way, so that all cases work correctly?
The problem at the moment is that Corinna (and I, for that matter)
would like someone to explain how the NtQueryInformationFile() approach
is broken.
I find it difficult to understand how a query function can have the
side effect of causing other IO work to deadlock. So, for the
uninitiated, I'd like to hear a clear, simple description of the
sequence of events from someone who understands it.
Maybe the deadlock you are referring to is a problem where
NtQueryInformationFile() fails to see data which is actually in the
pipe, so the deadlock comes from select() never returning the correct
events when it should, i.e. the exact opposite of the current problem
of it always returning writability even when it shouldn't.
If we can all get to that level of understanding (you, Corinna and I),
then maybe we can all take a look at my proposed approach to the
problem: converting all writes (blocking and non-blocking alike) on
pipes into overlapped IO requests and double buffering the written
data. Any blocking semantics we need are created in CYGWIN code by
putting the thread to sleep. This also means we should be able to wake
up correctly for signals too.
Kernel buffer resource limits are imposed by a simple outstanding byte
counter, so we start returning EAGAIN when we have more than 'ulimit -p'
order of writes outstanding.
Checking the writability of a given FD is then simply a case of
revalidating whether the outstanding byte counter has dropped below the
low-water buffering mark, and also providing a wakeup to select() in
every case that it does.
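To make the counter idea concrete, here is a minimal sketch in C. It
only models the accounting, not the overlapped IO itself. Every name in
it (pipe_acct, acct_queue_write, acct_complete, acct_writable) is
hypothetical and does not exist in the Cygwin sources; "limit" merely
stands in for whatever size is derived from 'ulimit -p', and accepting
a partial write when only some room is left is my own assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-pipe accounting; not taken from Cygwin. */
struct pipe_acct {
    size_t outstanding; /* bytes queued as overlapped writes, not yet done */
    size_t limit;       /* hard cap, standing in for the 'ulimit -p' size */
    size_t lowater;     /* FD counts as writable only below this mark */
};

/* Try to queue len bytes for writing.
   Returns the number of bytes accepted (possibly fewer than len, which
   is an assumption of this sketch), or -1 with errno set to EAGAIN
   when the cap has already been reached. */
static long acct_queue_write(struct pipe_acct *a, size_t len)
{
    if (a->outstanding >= a->limit) {
        errno = EAGAIN;
        return -1;
    }
    size_t room = a->limit - a->outstanding;
    size_t n = len < room ? len : room;
    a->outstanding += n;
    return (long)n;
}

/* Record that len previously queued bytes have completed.
   Returns 1 when the counter crossed from at-or-above the low-water
   mark to below it, i.e. when select() should be woken; 0 otherwise. */
static int acct_complete(struct pipe_acct *a, size_t len)
{
    assert(len <= a->outstanding);
    int was_blocked = a->outstanding >= a->lowater;
    a->outstanding -= len;
    return was_blocked && a->outstanding < a->lowater;
}

/* What select() would report for writability on this FD. */
static int acct_writable(const struct pipe_acct *a)
{
    return a->outstanding < a->lowater;
}
```

In this sketch a writer calls acct_queue_write() before issuing the
overlapped write, and the completion path calls acct_complete() and
wakes select() whenever it returns 1.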
Again, thank you for your response. The main problem with this issue is
that not many people know much about the history and technical reasons.
Darryl