To: cygwin AT cygwin DOT com
From: Eric Blake
Subject: Re: 1.5.24: incorrect default behavior of dd in popen context on text-mounted filesystem
Date: Wed, 25 Jul 2007 15:28:55 +0000 (UTC)
References: <200707241548 DOT l6OFmXAE008571 AT linode DOT hodain DOT net> <46A64F0F DOT 20706 AT byu DOT net> <200707251419 DOT l6PEJnYK011631 AT linode DOT hodain DOT net>

hodain.net> writes:

>
> 1) In the Cygwin User's Guide, page 33:
>
>      c. Pipes and non-file devices are opened in binary mode, except if
>         the CYGWIN environment variable contains nobinmode.
>
>      Warning!
>        In b20.1 of 12/98, a file will be opened in binary mode if any
>        of the following conditions hold:

This documentation is rather old, so it must be read with a grain of salt.

>          1. binary mode is specified in the open call
>          2. the filename is a MS-DOS filename
>          3. the file resides on a binary mounted partition
>          4. CYGWIN contains binmode

In particular, CYGWIN defaults to binmode, but binmode/nobinmode only
affects non-disk files (ie. pipes, special devices) - it has no bearing
on disk files, since that is what mount is for.

>          5. the file is not a disk file

In other words, 4 and 5 should be merged into a single condition.

>
>      d. When redirecting, the Cygwin shells uses rules (a-e) [sic].  For
>         these shells the relevant value of CYGWIN is that at the time the
>         shell was launched and not that at the time the program is
>         executed.
>
> My reading of this says that I should expect dd to use binary mode on
> its input and output files.
> And I should expect that stdin and stdout
> from shell-launched programs will be in binmode, so that
> popen("gzip|dd>file", "w") will use binmode.  Please explain if my
> interpretation is incorrect.

popen invokes the shell, with the shell's stdout inherited from your
current process's stdout, and with the shell's stdin being set to the
other end of your pipe.  The "w" in the popen implies that CYGWIN=binmode
is consulted for how the pipe behaves, and unless you changed the default
CYGWIN settings (which I doubt), that means the shell's stdin is binary
(whereas using "wt" forces text mode, although that is unusual on pipes,
and using "wb" forces binary mode).  As the shell command will not be
writing to stdout, it is irrelevant whether the shell's inherited stdout
was text or binary.

The shell then spawns two processes.  gzip's process is given a pipe as
stdout, and based on the CYGWIN=binmode default, it is binary; gzip's
stdin is inherited from the shell, which means it is still binary.  (The
alternative command, popen("gzip > file", "w"), was a case where
gzip-1.3.12-1 used text mode stdout, but gzip-1.3.12-2 correctly uses
binary mode; it differs from the case in question based on stdout being a
file rather than a pipe).

The other process is dd, where stdin is a pipe (again, the CYGWIN=binmode
default means it is binary).  But dd's process is given a redirection to
'file' as its stdout.  And since 'file' is a disk file, mount point rules
take effect.  Therefore, dd defaults to opening 'file' in text mode,
unless dd takes extra pains to force binary mode.  Presently, dd from
both coreutils 6.9-3 and 6.9-4 leaves the mode of stdout unchanged.  It
only worries about explicitly (re-)setting the mode if you specify if= or
of=, or if you use iflag= or oflag=.  And since you did neither, your
example results in dd doing text-mode output.

On the other hand, popen("gzip|dd of=file", "w") makes dd, not the shell,
responsible for opening 'file'.
In that case, dd in 6.9-3 uses textmode (due to a bug in my code that
tried to default to binmode), and in 6.9-4 uses binmode (as I had always
intended).

>
> 2) In http://cygwin.com/ml/cygwin/2007-07/msg00610.html Eric wrote:
>
>      [io]f= unspecified - no change to existing mode of std{in,out}
>
> My understanding is that the "existing mode" of stdin/stdout will be
> binary (given what the User's Guide says), so it appeared to me that
> dd was actively changing stdout back to text....

No, dd did not actively change stdout to text.  Rather, stdout was
already text when dd started, and dd did nothing about it because you did
not specify oflag=.

As I said before, I am still debating whether, in coreutils 6.9-5 (or
6.10-1, if upstream releases soon enough), dd will actively force binary
even when of= is not specified, when oflag= is not specified.  It is
doable (and is a one-line patch), but I have not convinced myself that
the change is worth it yet.  On the other hand, since you seem to be so
confused about the current dd behavior, that is an argument in favor of
making the change.

>
> 3) In http://cygwin.com/ml/cygwin/2007-07/msg00610.html Eric wrote:
>
>      .... Look for coreutils 6.9-4 coming soon to a mirror near you,
>      with dd once again defaulting to full binary operation.
>
> which I took to be further confirmation of dd using binary mode unless
> otherwise specified.  I understand now that you meant this comment to
> apply only to files that dd opens, but I took this as a more blanket
> statement and further confirmation that dd does whatever in needs to
> do to implement the User's Guide spec.

OK, so I probably could have been more careful in the wording of my
release announcement.

>
> So I'm trying to understand the state of things.  AFAICT, the spec in
> the User's Guide must not be being honored by popen().  Is that the
> case?  Otherwise, why would dd's stdout in popen("gzip|dd>file", "w")
> suffer text-mode modification?

See above.
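
To make the inherited-mode point concrete, here is a minimal sketch (not
the actual coreutils or gzip patch) of a filter that forces binary mode on
a descriptor it inherited rather than opened itself.  It assumes Cygwin's
setmode() from <io.h> and O_BINARY from <fcntl.h>; force_binary() and
copy_fd() are hypothetical helper names chosen for illustration, and on
platforms with no text/binary distinction force_binary() does nothing.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#ifdef __CYGWIN__
#include <io.h>   /* setmode() on Cygwin (assumed header location) */
#endif

/* Force binary mode on an already-open descriptor, such as an
   inherited stdout.  Returns 0 on success, -1 on failure.  Where there
   is no text/binary distinction, this is a no-op that succeeds. */
static int force_binary(int fd)
{
#if defined(__CYGWIN__) && defined(O_BINARY)
    /* setmode() returns the previous mode, or -1 on error. */
    return setmode(fd, O_BINARY) == -1 ? -1 : 0;
#else
    (void) fd;
    return 0;
#endif
}

/* Copy everything from in_fd to out_fd without modification.
   Returns the number of bytes copied, or -1 on a read/write error. */
static long copy_fd(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t) (n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

A filter built this way would call force_binary(STDOUT_FILENO) before
copy_fd(STDIN_FILENO, STDOUT_FILENO), so that "filter > file" on a
text-mounted directory should no longer depend on the mode the shell
picked when it opened 'file'.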
> And why could gzip be patched to fix
> things?

Any process can call setmode() to explicitly change the text/binary mode
of an already open fd, it's just that most upstream packages don't do
this, so the cygwin maintainers have to add it in as a cygwin-specific
patch.  Basically, that's what cgf did in between gzip 1.3.12-1 and
1.3.12-2.  And that's what I do in dd, when oflag= but not of= is
specified.

> And why can Eric "be open" to the idea of changing dd's
> behavior in this case?

Because I maintain the cygwin port of coreutils, because I read this
list, and because I use cygwin on a daily basis.  In other words, given
enough user feedback, or a personal usage scenario that I can't solve in
any other way, I am very prone to patching coreutils to do the right
thing, even if it means diverging even further from the official upstream
package and putting more maintenance burden on myself.

>
> I don't know much about Cygwin internals.  Is there a bug in popen()?

No.  (There used to be, prior to Aug 2006, but I fixed that in newlib).

> Or, is it the case that each executable is responsible for ensuring
> that it honors the shell redirecting specifications in the User's
> Guide?

In the case of fds inherited across exec (such as stdin, stdout, and
stderr), each process defaults to whatever the parent process gave it,
unless it takes pains to do things differently (such as calling setmode()
explicitly).  If I understand it correctly, linking in binmode.o affects
all fds that the process later opens, but does not automatically
re-orient inherited fds.

> Or is my reading of the User's Guide incorrect?

Entirely plausible, but more likely it is because that section of the
User's Guide is outdated and needs some TLC to bring it up-to-date.

>
> Thanks again for your responsiveness.
>
> -Hugh
>

--
Eric Blake