Mail Archives: cygwin/2013/07/17/22:27:05

From: Devin Nate <devin DOT nate AT cloudwerx DOT com>
To: "cygwin AT cygwin DOT com" <cygwin AT cygwin DOT com>
Subject: RE: ssh.exe on cygwin: Write error
Date: Thu, 18 Jul 2013 02:26:39 +0000
Message-ID: <19F61B611B92744EAB1F9B19D0A0E2B1812EAF2A@EXCHANGE1.QuadrantHR.com>
References: <19F61B611B92744EAB1F9B19D0A0E2B1812E5B7D AT EXCHANGE1 DOT QuadrantHR DOT com>
In-Reply-To: <19F61B611B92744EAB1F9B19D0A0E2B1812E5B7D@EXCHANGE1.QuadrantHR.com>

Final followup to close the loop on this.

Having debugged rsync, ssh, and finally Cygwin...the problem turned out to be a D-Link router doing (a bad job of) QoS processing.

rsync, ssh, and Cygwin each appear to have operated exactly correctly, including pipe(), select(), stdin/stdout, and Windows socket handling.

Thanks,
Devin


-----Original Message-----
Sent: Tuesday, July 16, 2013 12:04 AM
Subject: RE: ssh.exe on cygwin: Write error

Dear Cygwin list;

So I've made some progress on the problem with ssh I started out trying to solve... unfortunately, it's got me in select.cc in Cygwin.

Basically, the ssh.exe program operates like this:

Ssh sets up a connection, and starts client_loop;

client_loop monitors (in the debugging case) a single channel. It checks to see if input is to be read (from stdin in this case), and checks if there's data to write from an output buffer and also if select() says the outbound connection is writable. In the case of debugging, the network connection from ssh.exe to the server is on fd 3.

If there's data to read, it reads it into a buffer.

If there's data to send in the output buffer AND select() says that fd 3 is writable, then it calls packet_write_poll, which then calls roaming_write, which does a write() on the fd.  If there's a failure to write(), then packet_write_poll sees what the error is. EAGAIN, EINTR, and EWOULDBLOCK (same as EAGAIN on Cygwin) are non-fatal. Any other error is fatal.
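
The write path described above can be sketched roughly as follows. This is a minimal illustration, not OpenSSH's actual code: the names write_result, classify_write_errno, and write_if_ready are hypothetical and only mirror the rule that EAGAIN, EINTR, and EWOULDBLOCK are retryable while any other error is fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; these are not OpenSSH names. */
enum write_result { WRITE_OK, WRITE_RETRY, WRITE_FATAL };

/* Classify errno after a failed call: EAGAIN, EINTR and EWOULDBLOCK
 * are treated as non-fatal, everything else as fatal. */
static enum write_result classify_write_errno(int err)
{
    if (err == EAGAIN || err == EINTR || err == EWOULDBLOCK)
        return WRITE_RETRY;
    return WRITE_FATAL;
}

/* Poll fd with select() and, only if it is reported writable, attempt a
 * single write(). On WRITE_OK, *written holds the number of bytes sent. */
static enum write_result write_if_ready(int fd, const void *buf, size_t len,
                                        size_t *written)
{
    fd_set wfds;
    struct timeval tv = {0, 0};   /* zero timeout: poll, do not block */
    int ready;
    ssize_t n;

    assert(buf != NULL && written != NULL);
    *written = 0;
    if (fd < 0 || fd >= FD_SETSIZE)
        return WRITE_FATAL;       /* fd cannot be placed in an fd_set */

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    ready = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (ready < 0)
        return classify_write_errno(errno);
    if (ready == 0 || !FD_ISSET(fd, &wfds))
        return WRITE_RETRY;       /* not writable yet: try again later */

    n = write(fd, buf, len);
    if (n < 0)
        return classify_write_errno(errno);
    *written = (size_t)n;
    return WRITE_OK;
}
```

A caller would loop on WRITE_RETRY and give up on WRITE_FATAL, which is where an error such as ECONNRESET ends up.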


In debugging, client_loop processes away just fine. As it happens, it reads more data from stdin than it writes out. It happily writes data to the outbound socket, using write() as called by roaming_write as called by packet_write_poll. At some point, something "bad" occurs.

1. Select() says that the fd 3 (outbound connection) is writeable to the network.

2. Write() goes to write, but gets an error 11 (EAGAIN).

3. Many (probably 50-100) calls to select() say that the socket is not writeable, and a packet trace on the server side confirms that the flow of packets has completely stopped. I can see that peek_socket() in select.cc is returning 'peek_socket: read_ready: 0, write_ready: 0, except_ready: 0' in the strace.

4. After some time (30 seconds) select() on fd 3 returns both readable+writable. It tries to read from fd 3, but it gets an error 104 (ECONNRESET). It subsequently tries to write on the socket, and also gets an error 104 (ECONNRESET).

5. Since the write() failed, it returns that to roaming_write, which returns it to packet_write_poll. This prints the fatal error "Write failed: connection reset by peer".

6. Interestingly, the server side has not issued a tcp/ip rst. In fact, from the server perspective, it just looks like the tcp/ip connection stalled (happens right at the error 11). The server side isn't shut down till some time later.

7. The connection definitely does get 'backed up', so to speak: I'm pushing more data than the internet connection can handle without blocking, and I would expect select() and/or write() to fail while waiting for the network to clear some buffers. That said, it's almost as if the socket dies or needs a reset after the error 11 (EAGAIN).

8. I don't see any signals or timeouts happening. Also, I've retested with Cygwin 1.7.21 with no additional success.
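
For comparison, the ordinary back-pressure behaviour expected in point 7 can be sketched on a POSIX system as below. This is an illustration under assumptions, not a reproduction of the failure: it uses a local AF_UNIX socketpair() as a stand-in for the TCP connection on fd 3, and the helper names (set_nonblocking, is_writable, fill_until_eagain, drain_bytes) are hypothetical. On a healthy socket, write() returns EAGAIN once the send buffer is full, select() stops reporting the fd as writable, and writability comes back after the peer reads the data. In the sequence above, that last step never happened.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Zero-timeout select(): 1 if fd is writable, 0 if not, -1 on error. */
static int is_writable(int fd)
{
    fd_set wfds;
    struct timeval tv = {0, 0};
    int r;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    r = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (r < 0)
        return -1;
    return r > 0 && FD_ISSET(fd, &wfds);
}

/* Write to a non-blocking fd until it reports EAGAIN/EWOULDBLOCK.
 * Returns the number of bytes accepted, or -1 on any other error. */
static long fill_until_eagain(int fd)
{
    char chunk[4096];
    long total = 0;

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fd, chunk, sizeof chunk);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;             /* interrupted: non-fatal, retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return total;         /* buffer full: normal back-pressure */
        return -1;                /* anything else is treated as fatal */
    }
}

/* Read exactly count bytes from fd. Returns 0 on success, -1 on error/EOF. */
static int drain_bytes(int fd, long count)
{
    char chunk[4096];

    while (count > 0) {
        size_t want = count < (long)sizeof chunk ? (size_t)count : sizeof chunk;
        ssize_t n = read(fd, chunk, want);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        count -= n;
    }
    return 0;
}
```

Used on a socketpair() whose writing end is non-blocking, the expected pattern is: fill_until_eagain() accepts some data, is_writable() then reports the writer as blocked, and after drain_bytes() on the other end is_writable() reports it ready again. Exact buffer sizes and thresholds are platform-specific.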


I'm going to keep looking, but any thoughts with the new information?

Thanks,
Devin





