Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Subject: RE: OT: possible project/research project
MIME-Version: 1.0
Date: Wed, 20 Mar 2002 20:33:21 +1100
Content-Type: text/plain;
	charset="us-ascii"
Message-ID: <FC169E059D1A0442A04C40F86D9BA76062E0@itdomain003.itdomain.net.au>
content-class: urn:content-classes:message
From: "Robert Collins" <robert DOT collins AT itdomain DOT com DOT au>
To: "Randall R Schulz" <rrschulz AT cris DOT com>, <cygwin AT cygwin DOT com>
Content-Transfer-Encoding: 8bit

Randall,
responses inline..

> -----Original Message-----
> From: Randall R Schulz [mailto:rrschulz AT cris DOT com] 
> Sent: Wednesday, March 20, 2002 7:34 PM

> >Well we still have that basic separate - bash's builtin's for example.
> >If it's not builtin, it needs a sub process.
> 
> That's not quite right. Built-ins still need sub-processes if 
> they're going 
> to operate in a pipeline or are enclosed within parentheses.

Ok. So if it's not builtin, or it's a builtin that needs to be
pipelined/parenthesised, it requires a sub-process. That sounds like
somewhere a patch to the relevant shell might provide some easy
wins.
 
> >sub process's after all) -  but we have the source so....
> 
> How will your magical push_context protect from wild pointer 
> references, e.g.?

If that becomes a problem, I'd suggest that DLLs get loaded on page
boundaries, that we mark the non-permitted address space read-only, and
that we install an exception handler that unprotects it and restores
context. It may be that handling that is not worth the development time
- so reliability could be an issue.
 
> >The fork()/exec() model bites. Sorry, but it does. fork() based servers
> >for instance run into the galloping herd - and scale very badly. The other
> >use for fork -the fork/exec combination is better achieved with spawn()
> >which is designed to do just that one job well. It also happens to work
> >very well on cygwin, and I see no reason to change that. So spawned apps
> >will remain completely separated and independent.
> 
> Servers are not shells. Why should they fork at all? That's 
> what threads 
> are for. It's also why CGI (without something like mod_perl) 
> is not a good 
> thing and the Java server model has significant advantages.

Exactly... my point is that the fork/exec model has no innate use.
vfork/execve does - which is what spawn (look under posix_spawn() for
the official spawn these days) accomplishes.
 
> Are you planning on incorporating your scheme into every 
> program that runs 
> sub-processes on a regular basis? How likely is it that what 
> works in one 
> shell will work in another or in a server?

No. I'm not trying to create a new operating environment, I'm trying to
address a common-case issue. If I can get certain configure scripts to
run in under 30 minutes on my machine here, I'd be very happy. As for
portability to different shells, or even to servers, I'd suggest that
keeping the API very simple and clean - much like the sub process model
is simple and clean - would encourage such re-use.
 
> I don't know the details of spawn(). How does it accomplish 
> I/O redirection?

int posix_spawn(pid_t *restrict pid, const char *restrict path,
const posix_spawn_file_actions_t *file_actions,
const posix_spawnattr_t *restrict attrp,
char *const argv[restrict], char *const envp[restrict]);

is the prototype. If file_actions is null, the new process gets a
copy of the parent's fd table. If it's not null, it lists open/close/dup2
actions that are applied to that copy to produce the fd table for the
new process.
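
For the redirection question specifically, here is a minimal sketch of
what a shell's "cmd > file" could look like with posix_spawnp(),
assuming a POSIX system that provides <spawn.h>. The helper name
spawn_with_stdout_to and the 0644 file mode are my own choices for
illustration, not anything taken from an existing shell.

```c
#include <assert.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run argv[0] (looked up on PATH) with stdout redirected to out_path.
 * Returns the child's exit status, or -1 on any failure. */
int spawn_with_stdout_to(const char *out_path, char *const argv[])
{
    assert(out_path != NULL && argv != NULL && argv[0] != NULL);

    posix_spawn_file_actions_t actions;
    if (posix_spawn_file_actions_init(&actions) != 0)
        return -1;

    /* In the child: open out_path as fd 1, replacing the inherited stdout. */
    int rc = posix_spawn_file_actions_addopen(&actions, STDOUT_FILENO, out_path,
                                              O_WRONLY | O_CREAT | O_TRUNC, 0644);
    pid_t pid = -1;
    if (rc == 0)
        rc = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    if (rc != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argv of {"echo", "hello", NULL} and a path of your
choosing should leave the command's output in that file. Pipelines
would use posix_spawn_file_actions_adddup2() on the pipe ends in the
same way.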

> Obviously if you add something, the old stuff isn't 
> (necessarily) lost. I'm 
> just saying that the fork/exec process model is simple, 
> elegant, available, 
> universal and fully functional in all POSIX systems. Your 
> model is a horse 
> of another color and any given command that would avail itself of the 
> supposed benefits of your scheme must be recast into a library that 
> conforms to the requirements of your embedded task model.

Yes. Which is a significant impediment right from the word go. Which
should go some way to explaining my ambivalence on this idea. However
the building blocks to use this model are present and functional on all
POSIX systems, so there's no reason to assume we couldn't 'make it
work'.

> It doesn't prevent it, but to avail ones self of the putative 
> benefits of 
> your proposed scheme, a significantly different programming 
> model has to be 
> learned and used. All for what? A tiny incremental 
> improvement in program 
> start-up times on a single platform and one or two 
> pre-ordained shells?

Huh? That's an assumption. I'd hope I could achieve librarisation as
simply as casting main to lib_main, and providing link time replacements
for exit() and _exit() and fatal(). Then the real-binary doesn't use
those link time replacements.
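
Roughly what I have in mind, as an untested-in-anger sketch: the host
calls the command's renamed main through a wrapper, and the stand-in for
exit() jumps back to that wrapper instead of ending the process. All the
names here (lib_main_fn, lib_exit, run_embedded and the two toy
commands) are hypothetical, and the sketch ignores atexit() handlers,
stdio flushing, leaked memory and re-entrancy, all of which a real
version would have to deal with.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Signature of a command's main() once it has been renamed. */
typedef int (*lib_main_fn)(int argc, char **argv);

static jmp_buf exit_target;      /* where lib_exit() jumps back to */
static volatile int exit_status; /* status handed to lib_exit() */
static int exit_armed;           /* nonzero while a command is running */

/* Hypothetical link-time stand-in for exit() in the library build. */
void lib_exit(int status)
{
    assert(exit_armed);
    exit_status = status;
    longjmp(exit_target, 1);
}

/* Run one embedded command; returns its exit status. Not re-entrant. */
int run_embedded(lib_main_fn entry, int argc, char **argv)
{
    assert(entry != NULL);
    exit_armed = 1;
    if (setjmp(exit_target) == 0)
        exit_status = entry(argc, argv); /* normal return from "main" */
    exit_armed = 0;                      /* also reached after lib_exit() */
    return exit_status;
}

/* Two toy commands standing in for real utilities. */
static int demo_true_main(int argc, char **argv)
{
    (void)argc; (void)argv;
    return 0;
}

static int demo_false_main(int argc, char **argv)
{
    (void)argc; (void)argv;
    lib_exit(1);                         /* deep "exit" inside the command */
    return 0;                            /* never reached */
}
```

The stand-alone binary would keep linking against the real exit(), so
the command's source stays the same for both builds.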
 
> How much time do they save? That's for you to claim and 
> substantiate. I'm 
> not trying to justify or validate your project, I'm trying to 
> repudiate it.

I can tell. I'm not trying to defend it, as that assumes that it is
defendable. I'm discussing it in a neutral (ish) light, I hope. I am
trying to provide responses to the specific points you make as part of
that discussion.
 
> But consider this: By the time you complete this task, the 
> upward march of 
> system speeds (CPU and I/O) will probably have done more to improve 
> elapsed-time performance of command invocation than your 
> improvements are 
> going to achieve.

Straw poll: who here has and uses a machine more than 2 years old right
now? My hand goes up, as does my girlfriend's, and my firewall's. (My PC
happens to be a dual processor, but still). Also, consider that as
system speeds increase, so does the functionality. We may find MS
polling internet servers on process startup or something equally
ridiculous that drastically increases process startup time. Certainly
system policies now play a part, as each process startup has to be
tested against an arbitrarily long list of rules. And don't talk about
virus scanners.
 
> And five staff-minutes per user per month? You think that's 
> significant? 
> What would you do with those five minutes spread throughout 
> the month? 
> That's right: Nothing, 'cause you'd get it in 
> fraction-of-a-second parcels.

Well that's an assumption. For me, I'd get it running configure scripts,
which is in far bigger chunks than fraction of a second. 
 
> Lastly, you'll have to have an ongoing effort to port changes 
> from the 
> stand-alone original versions of the commands to your 
> embedded counterparts.

No - sounds like you haven't been paying attention. In my very first
email I pointed out that this was not an acceptable approach, and that
committing changes upstream would be the only meaningful way of doing
this.
 
> >I'd guess at ash, as that's the smallest shell we have, but if it's easier
> >with bash, then I see no reason not to - as this would be a /bin/sh
> >replacement - if the benefits were to be realised.
> 
> How many people use such a bare-bones shell? Unless you 
> modify them all, 
> there will be a sizeable user contingent that does not 
> benefit from your 
> efforts.

Nearly everyone here does - most scripts have #!/bin/sh in the header.

> I think you need a good technical justification for the effort you'll 
> expend relative to the benefits you're going to gain and the 
> detriments 
> you're going to incur.

Absolutely. The problem domain needs further refinement, a lit search is
needed, some rough test cases / mock-ups are needed to give a
rule-of-thumb idea about the potential returns, and cygwin needs serious
profiling to understand whether my assumptions about performance are
correct. Lotsa work to do this right.
 
> As with all optimizations, you must measure the cost of the 
> current code 
> and that of replacement. In this case, you could possibly 
> mock up a test 
> jig that did DLL loading and compare that with the cost of 
> fork / exec. But 
> that would not include the unknown costs of your putative 
> push_context / 
> pop_context mechanism.

Absolutely. In fact:

"Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet."
- M.A. Jackson

"More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity."
- W.A. Wulf

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."
- Donald Knuth

"The best is the enemy of the good."
- Voltaire
 
With assembly credit to
http://www-2.cs.cmu.edu/~jch/java/optimization.html

> "The proof of the pudding is in the eating." So until you've 
> done it, you 
> won't know for an empirical fact if it's a win and if so how 
> much of a win 
> it is.

Sure.

Rob
