Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Date: Thu, 15 Sep 2005 10:45:10 -0400 (EDT)
From: Igor Pechtchanski <pechtcha AT cs DOT nyu DOT edu>
Reply-To: cygwin AT cygwin DOT com
To: Jan Schormann <jan DOT schormann AT brainlab DOT com>
cc: cygwin AT cygwin DOT com
Subject: RE: Cygwin build system SOOOO SLOOOWWWW ???
In-Reply-To: <CCAC2F20421E784A87FAFDB3E0EC5572F9FCEB@DEVXCH1.brainlab.net>
Message-ID: <Pine.GSO.4.63.0509151026330.12900@slinky.cs.nyu.edu>
References: <CCAC2F20421E784A87FAFDB3E0EC5572F9FCEB AT DEVXCH1 DOT brainlab DOT net>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Thu, 15 Sep 2005, Jan Schormann wrote:

> Let's see ...
>
> > 1) How can I tell what Cygwin is doing?  Is there a tool that will
> > tell me what tool is actually running at any given time?  Is there
> > any way to tell what Cygwin is doing down in its guts?  Does anyone
> > have any other suggestions as to how I might get to the
> > bottom of this?
>
> Below, I'll tell about some suspicions I have about what cygwin might
> actually be doing. To your question, I can offer two ideas:
>
> - "top" or any Windoze Process Explorer more sophisticated than
>   the task manager
> - "strace" - I haven't ever used it, but from what I know
>   this will definitely give you an answer - maybe too much of it ;-)

You can give strace command-line options to show only the kinds of events
you want...  See the strace help (or
<http://cygwin.com/cygwin-ug-net/using-utils.html#strace>).

> > 2) Has anyone else experienced speed problems with Cygwin?  Has
> > anybody else felt that Cygwin has gotten slower over the last
> > year or so?  Are there any guidelines or "tricks" for getting
> > Cygwin to run faster?
>
> a) Forking is more expensive in Windoze.
> On Unix, especially in make environments, you'll often start new
> processes as you're going - and often you'll not even notice. Google
> for "bash tricks" on how to fork less often.

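Creating a native Windows process is not as expensive as forking is in
Cygwin, which has to emulate fork() on top of the Windows API (and forking
off a Windows process costs extra, since Cygwin creates a stub for that).
Patches to speed this up: <http://cygwin.com/acronyms/#PTC>.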

> Hint: Don't use "sed" in `backticks` just for simple string
> replacements.
> Much of this can be done in make or bash directly.
> Look at the changes you made - maybe you thought it's more elegant?

FWIW, much of this can be done directly in "make". :-)
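
As a rough sketch of the kind of replacement meant here (the file name and
variable names are made up for illustration), bash parameter expansion can
swap a suffix without starting sed at all:

```shell
#!/bin/bash
# Hypothetical source file name, used only for illustration.
src="main.c"

# Slow: the backticks fork a subshell, and the pipeline starts sed as well.
obj_slow=`echo "$src" | sed -e 's/\.c$/.o/'`

# Fast: parameter expansion happens inside bash itself, no new process.
obj_fast="${src%.c}.o"

echo "$obj_slow $obj_fast"   # prints "main.o main.o"
```

In a makefile, a substitution reference such as $(SRC:.c=.o) does the same
job without involving a shell.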

> b) This is especially true for shells.
> I'm not really sure on when and where this hits, but under certain
> circumstances, bash needs to parse /etc/passwd when it starts. Do
> you create /etc/passwd from an LDAP directory using mkpasswd?

*bash* itself never parses /etc/passwd.  Cygwin does -- every Cygwin
process looks at /etc/passwd on startup.  The first Cygwin process
actually reads it, and the rest simply check whether it changed.

However, that's just a file stat -- it doesn't actually query the domain
or LDAP directory (at least after the first invocation -- it does query
the current user then, but I don't think it does that for all users).

> Maybe you have hired some more people last year and it got longer?
> Hint: Try whether it makes a difference if you replace /etc/passwd
> with one that contains only the local users (look at the options for
> mkpasswd).

This shouldn't make a difference for multiple forks.

> c) /bin/sh is now bash, which is now dynamically linked.
> Up until a few months ago, /bin/sh was "ash", a smaller, but
> less powerful shell. This has been replaced by bash, to reduce the
> traffic of repeated questions along the lines of "why does my shell
> act differently than on linux" (where /bin/sh is bash on most
> distributions).
> If I understood the traffic on this list correctly, bash is now
> dynamically linked, which might have an impact on starting it - I can't
> tell.

It shouldn't.  The DLLs are in memory, so any subsequent invocation of
bash will load the cached versions (Windows does that automatically).

> Hint: Don't start bash so often. Create fewer processes, but if you
> must, see if you gain by using ash explicitly instead of bash.
>
> To the gurus - is the following correct?
> `echo blub` starts one process, `echo blub | sed -e 's/b/x/g'`
> starts three: "echo", "sed", and "bash" to implement the pipe.

I'm far from a guru, but let me take a shot at answering this:

If you're talking about running those from bash, then "echo blub" doesn't
start *any* processes -- "echo" is a bash builtin.  If you're asking about
make, make will start /bin/sh to execute the "echo" command.

"echo blub | sed -e ..." will start 1 process from bash, and 2 from make
(the 1 extra process is "sed" -- no process is created for the pipe).

FYI, "BLAH=blub; echo ${BLAH//b/x}" will not spawn *any* processes when
run directly from bash.
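
One way to check this for yourself, as a small sketch: bash's "type -t"
reports whether a command word is a builtin or an external file, and the
two substitution styles can be compared side by side. The variable names
other than BLAH are made up for the example.

```shell
#!/bin/bash
# Ask bash how it would resolve each command word.
echo_kind=$(type -t echo)   # "builtin": handled inside bash
sed_kind=$(type -t sed)     # "file": an external program that must be started

BLAH=blub

# External: the pipeline needs a sed process.
with_sed=$(echo "$BLAH" | sed -e 's/b/x/g')

# Internal: pattern substitution done by bash itself.
with_bash=${BLAH//b/x}

echo "$echo_kind $sed_kind $with_sed $with_bash"   # prints "builtin file xlux xlux"
```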

> d) Beware of lazy evaluation.
> Look at this construct:
> CFLAGS=$(shell find . -type d -name include)
> Read "info make" on setting variables and find out about the
> difference between "=" and ":=". The above will run the find
> again for every single call to the compiler. Along with the
> issues about forking and reading directories and small files,
> this can make a difference of *ages*.
> Hint: See whether you can use fewer variables, use ":=" more often,
> etc. - and don't use "$(shell ...)" anyway, as stated in a).
> Rather, pre-compute makefiles with all the data hardcoded, using
> ":=".

That's sound advice.
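
To see the difference between "=" and ":=" concretely, here is a minimal
sketch that assumes GNU make is on the PATH. It writes a throwaway makefile
into a scratch directory and counts how often each $(shell ...) call
actually runs. The variable names LAZY, EAGER and USES and the log file
names are invented for the demo.

```shell
#!/bin/bash
# Scratch directory; all file names below are made up for this demo.
tmpdir=$(mktemp -d)

# Quoted heredoc, so the shell leaves the make syntax untouched.
cat > "$tmpdir/Makefile" <<'EOF'
# Recursively expanded: the shell call runs again at every reference.
LAZY = $(shell echo run >> lazy.log)
# Simply expanded: the shell call runs once, at this line.
EAGER := $(shell echo run >> eager.log)
# Reference each variable three times, as three compiler calls would.
USES := $(LAZY) $(LAZY) $(LAZY) $(EAGER) $(EAGER) $(EAGER)
all: ;
EOF

# Run make inside the scratch directory so the logs land there.
(cd "$tmpdir" && make >/dev/null)

lazy_runs=$(grep -c run "$tmpdir/lazy.log")
eager_runs=$(grep -c run "$tmpdir/eager.log")
echo "lazy: $lazy_runs, eager: $eager_runs"   # prints "lazy: 3, eager: 1"

rm -rf "$tmpdir"
```

With a "find" over a large tree in place of the "echo", every extra run is
a full directory walk plus a fork, which is where the time goes.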

> e) Reading lots of small files seems more expensive on Windoze.
> I don't know about your Makefiles, but traditionally, makefiles are
> spread across project directories (for build hierarchies), and
> makedepend creates even more of that. For one of our applications, I
> roughly calculated that make needs to open, read, and parse well over a
> thousand files (not counting the source or objects or any such thing,
> just the makefiles), just for telling you that all the targets are up to
> date.
> Hint: Phew ...
>
> You see, for our configurations, running make to tell me that *nothing*
> has changed could take up to half an hour. Therefore we introduced some
> magic using Python to generate and split up makefiles two years ago, and
> were down below five minutes again.

If you're using make recursively, google on the evils of recursive make.
If not, please disregard this.

> This is nothing compared to the link time of well over 15 minutes, so
> we started to convert to DLLs for development (released applications
> are still supposed to be linked statically, as they only run on
> dedicated machines). We're currently trying to replace the whole build
> chain by a single daemon written in a decent language - hoping (i) that
> we need only one process for the actual rule system etc., and will only
> start additional processes for the compiler and linker; and hoping (ii)
> that the actual rule set will be much easier to debug. (You know,
> developers come to me and say "but I've only touched this little cpp and
> now everything's getting compiled again and ..." - how do I know what
> really happened?)
>
> > Thanks in advance for any feedback that might help me speed up my
> > builds.
>
> Let's see whether my hints are any good, but you're welcome anyway :-)

HTH,
	Igor
-- 
				http://cs.nyu.edu/~pechtcha/
      |\      _,,,---,,_		pechtcha AT cs DOT nyu DOT edu
ZZZzz /,`.-'`'    -.  ;-;;,_		igor AT watson DOT ibm DOT com
     |,4-  ) )-,_. ,\ (  `'-'		Igor Pechtchanski, Ph.D.
    '---''(_/--'  `-'\_) fL	a.k.a JaguaR-R-R-r-r-r-.-.-.  Meow!

If there's any real truth it's that the entire multidimensional infinity
of the Universe is almost certainly being run by a bunch of maniacs. /DA
