To: cygwin AT cygwin DOT com
From: Eric Blake
Subject: Re: tee piping to head gives error message
Date: Tue, 8 Feb 2005 21:39:28 +0000 (UTC)

Buchbinder, Barry (NIH/NIAID) writes:
> > Given that the purpose of head is to print the first few lines of a
> > file, it kind of makes sense to me that it would close the file after
> > it's read them rather than keeping the input file open and manually
> > reading-and-discarding the entire rest of it for no good reason.
>
> Agreed.
>
> > So I reckon this is as-expected and by-design behaviour.
>
> I might put it as "as-designed" rather than "by-design". And for me, it
> certainly was unexpected. tee and head are both part of coreutils. I would
> expect that all coreutils would behave the same for head closing the pipe,
> but they don't. And I would also expect that all utilities in a package
> that includes a utility that breaks pipes as a normal course of its
> operation would be silent when the two utilities are used together. I would
> expect that tee pipes to head more often than something nasty happens and a
> pipe just breaks.

Coreutils is following POSIX, and the behavior of pipes is by design within POSIX.
POSIX requires that a write to a pipe that no longer has a reader raises
SIGPIPE, and that the default action on SIGPIPE is to terminate the process.
Note that termination by a signal bypasses exit handlers registered with
atexit(). POSIX also allows an application to ignore SIGPIPE, in which case a
write to a broken pipe fails with EPIPE and the application can detect the
failure but continue in normal operation. Furthermore, all of the coreutils
are designed to check, using atexit() handlers, for any failed write to
stdout. Normally, tee and most other coreutils do nothing special with
SIGPIPE, which means they only ignore SIGPIPE if their parent process was
ignoring it. So my guess is that somewhere in your shell you are setting up
your environment to ignore SIGPIPE, so that applications spawned by your shell
see write failures to broken pipes rather than the default of early
termination.

Study this example, in bash, for more insight into child behavior when SIGPIPE
is ignored or not:

$ trap - PIPE  # restore default handling to SIGPIPE
$ yes | tee /dev/null | head > /dev/null
$ echo ${PIPESTATUS[*]}
141 141 0  # yes and tee had SIGPIPE, head was successful
$ seq 10000 | tee foo | head > /dev/null
$ echo ${PIPESTATUS[*]}
141 141 0  # seq and tee had SIGPIPE, head was successful
$ wc foo
 2474  2475 11264 foo  # foo did not get the complete output of seq
$ trap '' PIPE  # now ignore SIGPIPE
$ seq 1000 | tee /dev/null | head > /dev/null
$ echo ${PIPESTATUS[*]}
0 0 0  # all 3 programs were successful
$ seq 10000 | tee foo | head > /dev/null
tee: standard output: Broken pipe
tee: write error
$ echo ${PIPESTATUS[*]}
0 1 0  # seq and head were successful, tee noticed the broken pipe
$ wc foo
10000 10000 48894 foo  # foo got the complete output of seq
$ yes | tee /dev/null | head > /dev/null
tee: standard output: Broken pipe
# At this point, yes and tee are in an infinite loop, hit ctrl-c
$ echo ${PIPESTATUS[*]}
130 130 0  # yes and tee had SIGINT from ctrl-c, head was successful
$ yes | tee -i /dev/null | head > /dev/null
tee: standard output: Broken pipe
# Again, an infinite loop, hit ctrl-c
$ echo ${PIPESTATUS[*]}
130 1 0  # yes had SIGINT, tee just regular failure from broken pipe

> This seems like something the coreutils maintainer might want to address
> with the upstream maintainers, or to patch himself. (I won't complain if he
> doesn't patch it.

Nope - as the cygwin coreutils maintainer, I won't patch coreutils, because
the problem of an error message from writing to a broken pipe is not unique to
cygwin (I ran the same tests on coreutils 5.3.0 on Solaris and saw similar
behavior). However, note that tee's -i option now ignores only SIGINT, as
POSIX mandates, whereas prior versions of coreutils treated -i as ignoring all
signals; the change in 5.3.0 for tee to terminate on SIGPIPE was intentional,
added around April 2004 (see /usr/share/doc/coreutils-5.3.0/NEWS, or
http://lists.gnu.org/archive/html/bug-coreutils/2004-04/msg00126.html).

You may have success if you propose upstream, on the coreutils mailing list,
the addition of a new option to ignore SIGPIPE, which would restore the prior
behavior while still complying with POSIX. You may also want to ask for an
interpretation from the POSIX folks as to whether write errors to stdout must
force tee to fail, or whether the current wording, that tee returns 0 only if
"The standard input was successfully copied to all output files", allows
success even if writes to stdout failed, basing your argument on the fact that
stdout is not one of the output files on the command line.
http://www.opengroup.org/onlinepubs/009695399/utilities/tee.html

If, as Dave Korn's followup pointed out, cygwin is hanging on some instances
of pipe handling and process termination interaction, then that is a cygwin
and not a coreutils bug, and I wouldn't know what to do to try to patch that.

> Taking on coreutils was quite a commitment -- well
> deserving of the two gold stars -- and I know that fixing this may be a low
> priority.)
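
In case the PIPESTATUS numbers in the transcript above look cryptic: the shell
reports a child killed by signal N as status 128 + N, so 141 is SIGPIPE (13)
and 130 is SIGINT (2). Here is a minimal sketch of a decoder, assuming a
POSIX-style shell whose kill -l accepts a signal number. The name
describe_status is made up for illustration; it is not something bash or
coreutils provides.

```shell
# Hypothetical helper: describe one exit status, such as one entry of
# bash's PIPESTATUS array. Assumes a status above 128 means "killed by a
# signal"; a program that calls exit() with such a value would be misread.
describe_status() {
    st=$1
    if [ "$st" -eq 0 ]; then
        echo "success"
    elif [ "$st" -gt 128 ]; then
        # kill -l N prints the signal name without the SIG prefix.
        echo "killed by SIG$(kill -l "$((st - 128))")"
    else
        echo "failed with exit status $st"
    fi
}

# The four distinct values that appear in the transcript.
for st_example in 141 130 1 0; do
    echo "$st_example: $(describe_status "$st_example")"
done
```

In bash, one could run it over PIPESTATUS immediately after a pipeline, for
example: for s in "${PIPESTATUS[@]}"; do describe_status "$s"; done
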
> Unfortunately, though PTC, I'm not capable of providing a patch.
> In any case, tee seems to save its input as desired, so while the error
> message is annoying and misleading, I suppose that one can live with it.

You can make the error messages about the broken pipe consistently go away by
restoring the default SIGPIPE handling (trap - PIPE, as in the example above),
but only by risking early termination of tee. Or, continue to ignore SIGPIPE
and redirect tee's stderr to /dev/null; then tee will always run to
completion, but you will miss any other error messages from tee.

$ cat foo | tee -i bar 2> /dev/null | head

> > It's just that tee notices when a write to stdout fails, whereas
> > most applications are more loosely coded and don't check.

Actually, as explained above, all of the coreutils that write to stdout check
if those writes failed, provided they weren't terminated by a signal. That
way, even something like `ls --help' will fail if stdout is redirected to a
read-only file.

> >> But the number of lines/bytes at which the error disappears
> >> does not seem to be constant.
> >
> > Umm, no, .... it's equal to the number of lines in the source file.
>
> No. It should be equal to the numbers of lines in the source file but is
> not. The error message went away around 126 or 130 lines, while the source
> file had 556.
>
> (I would speculate that the disappearance of the error messages when enough
> lines are provided might have something to do with buffering, but I'm not a
> programmer and speculation by we mere "users" is sometimes discouraged. As
> for why it is not consistent ...)

Bingo - there is buffering going on. Note that POSIX disallows line buffering
in tee, but does allow character buffering - the tee implementation reads a
block at a time (probably 1024 or 2048 characters) before writing. Likewise,
head reads a block before parsing it into lines, as that is faster than
reading a line at a time.
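
The truncated foo in the transcript above gives a hint about the block size:
wc reported 11264 bytes, which is exactly 11 * 1024. As a rough cross-check,
the sketch below recomputes what the first 11264 bytes of `seq 10000' contain.
The helper name seq_prefix_stats is made up, and head -c is a common extension
that I am assuming is available rather than something POSIX requires.

```shell
# Hypothetical helper (not part of coreutils): for the output of `seq N`,
# report how many complete lines fit in the first BYTES bytes, the total
# size of the output, and BYTES expressed in 1024-byte blocks.
seq_prefix_stats() {
    n=$1
    bytes=$2
    tmp=$(mktemp) || return 1
    seq "$n" > "$tmp"
    # Work from a file so that no pipe gets broken while measuring.
    lines=$(head -c "$bytes" "$tmp" | wc -l | tr -d '[:space:]')
    total=$(wc -c < "$tmp" | tr -d '[:space:]')
    rm -f "$tmp"
    echo "lines=$lines total=$total blocks=$((bytes / 1024)) remainder=$((bytes % 1024))"
}

seq_prefix_stats 10000 11264
```

This prints lines=2474 total=48894 blocks=11 remainder=0, matching the 2474
lines that wc counted in the truncated foo and the 48894 bytes of the complete
one. That is consistent with tee having copied whole 1024-byte blocks before
SIGPIPE arrived, though it does not prove what block size any particular tee
build uses.
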
So, if head reads the entire block that tee wrote, even if only the first part
of the block needed to be printed, then tee never sees a write failure.

--
Eric Blake