Message-Id: <200810241244.m9OCigtj011361@delorie.com>
To: cygwin AT cygwin DOT com
From: Herb Maeder <maeder-cygml AT maeder DOT org>
In-reply-to: "Manning, Sid" <sidneym@qualcomm.com> 's message of                  Mon, 20 Oct 2008 11:53:19 PDT.
Subject: Re: Compile time Local Cygwin vs. VMware session on same system
Date: Thu, 23 Oct 2008 13:43:30 -0700
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Precedence: bulk
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com

On 20 Oct 2008 11:53:19 PDT, "Manning, Sid" wrote:
> Of course I needed an excuse to ask the question, surprised or curious
> either would have sufficed.  So the implementation of a GNU/Posix stack
> over windows is expensive and that is understandable (I suffer from cross
> platform headaches all the time).  I knew that cygwin was somewhat slower
> than native Linux but I never took the time to make the measurements and
> seeing the numbers gave me a hope that maybe I had a simple configuration
> problem.  If there was some magic bullet that could shave part of the
> expense from these types of operations I would gladly use it and that was
> why I posted my message.
>
> While much of my development is done on GNU/Linux many, if not most, of my
> users rely on Cygwin/Windows as their primary run-time environment.  Most
> don't recognize the performance penalty but it would have been great to
> swizzle the config make things X% faster.
>
> I appreciate everyone's insight and I will definitely checkout Mecklenburg's 
> make book to get hard stats on the differences.

Sorry for the delay in posting this; it took me a while to compute and
digest the numbers to support my response...

As others have noted, you are bumping up against some of the performance
limitations in cygwin.  But given that you are using make, there are a few
things that you might be able to try to improve the performance of your
build.

Listed in order of difficulty to implement (IMO):

  1) use dash instead of bash for your make SHELL 
        ~10% overall time reduction
        ~110ms savings per shell invoked by make
  2) use cygwin-1.7 
        ~20% overall time reduction
        ~170ms savings per shell invoked by make
  3) use parallel makes on multiple cpus/cores 
        ~40% time reduction for 2cpus
  4) minimize extraneous shell commands/recursive makes (use native make syntax)
        ~1-2 minutes for 'zero-work' case

Big caveat:  YMMV on how much time may be saved for your particular build
environment and build machine characteristics.  These numbers were
computed from a fairly large, real life, software development environment
using a 2.66GHz Dual Core cpu.  So the numbers do have some basis.  At
least they show that these changes move the numbers in the right
direction.

Also note that whether items 1, 3 and 4 make sense may depend on how much
control you have (or want to have) over the makefiles in question.

See "Detailed Analysis" below for more information on how the numbers were
computed. 

Here is some more information on the changes suggested above:


1) use dash instead of bash for your make SHELL

Dash is a POSIX-compliant version of /bin/sh, which can be used as a
lightweight replacement for bash in many cases (it may not always be possible
if some bash specific syntax is needed).  It starts up quicker than bash,
so you will realize some time savings on every shell command issued by
make.

There is no cygwin package for dash, but it compiles readily under cygwin
and is available at:

       http://gondor.apana.org.au/~herbert/dash/

You should be able to set the SHELL to 'dash' at the top of your makefile,
and, where needed, override it with 'bash' on a per-target basis:

     SHELL := /bin/dash
     [  . . . ]
     target-foo: SHELL := /bin/bash
     target-foo:
              rule-for-target-foo-using-some-bash-specific-syntax

Using dash may also help with cygwin/linux compatibility issues if you
happen to run on a linux distro that uses dash (instead of bash) as
/bin/sh.
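
To check how much dash's quicker startup matters on a given machine, one
could time a no-op command in each shell.  The following is a minimal
Python sketch and was not used for the measurements in this message; the
helper name mean_startup_seconds, the run count, and the assumption that
'dash' and 'bash' are on the PATH are all illustrative.

```python
import shutil
import subprocess
import time


def mean_startup_seconds(argv, runs=20):
    """Return the mean wall-clock time to run argv, over `runs` launches."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # ':' is the POSIX no-op builtin, so this mostly measures process
    # creation plus shell startup, which is what make pays per command.
    for shell in ("dash", "bash"):
        path = shutil.which(shell)  # assumption: shells are on the PATH
        if path is None:
            print(f"{shell}: not found, skipped")
            continue
        ms = mean_startup_seconds([path, "-c", ":"]) * 1000
        print(f"{shell}: {ms:.1f} ms per invocation")
```

The absolute numbers will include Python's own process-spawning cost, so
only the difference between the two shells is meaningful.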


2) use cygwin-1.7

Several performance improvements have been made in the cygwin-1.7 release,
in particular regarding pipes and the forking mechanism.  I don't think it
will get you near to the linux performance, but it should decrease your
build times by some decent percentage.

Just comparing strace output on some very simple commands, I've seen
savings of 100ms using cygwin-1.7.

Although it's not officially released, I've found that it is functioning
well enough to run most build related tasks.

You can install it with:
       
       http://cygwin.com/setup-1.7.exe


3) use parallel makes on multiple cpus/cores

You can use the -j option of make to run multiple targets in parallel,
which should buy you some overall throughput on multi-cpu or multi-core
machines.

Theoretically you should be able to cut your build time in half with "make
-j2", but in practice this won't happen since some things cannot be
parallelized and you may have processes competing against each other for
disk IO.
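
One way to reason about why -j2 falls short of halving the time is
Amdahl's law.  This is an idealized model added here for illustration, not
something measured in this thread: if a fraction p of the single-job time
can run in parallel on n cpus, the expected time ratio is (1 - p) + p/n.
The sketch below inverts that for the cygwin-1.5 bash timings reported in
the Detailed Analysis section.  It ignores disk IO contention, so the
resulting p is only a rough effective figure.

```python
def time_ratio(p, n):
    """Amdahl's law: time with n cpus divided by time with one cpu."""
    return (1.0 - p) + p / n


def parallel_fraction(t1, tn, n):
    """Invert Amdahl's law to estimate p from two measured wall-clock times."""
    return (1.0 - tn / t1) * n / (n - 1)


# Measured 'real' times in seconds for cygwin-1.5 with bash
# (46m14.146s at -j1, 26m51.059s at -j2).
t_j1 = 46 * 60 + 14.146
t_j2 = 26 * 60 + 51.059

p = parallel_fraction(t_j1, t_j2, n=2)
print(f"effective parallel fraction: {p:.2f}")
print(f"predicted -j2/-j1 ratio:     {time_ratio(p, 2):.3f}")
```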

Beware, I've seen spurious segmentation faults when using -j2 with the
stock cygwin version of make.  I was able to compile my own version with
the patches suggested at the bottom of this make bug report.  Once I did
this, the -j option seemed to work quite reliably:

      http://savannah.gnu.org/bugs/?14853


4) minimize extraneous shell commands/recursive makes (use native make syntax)

Depending on your situation, this may give you the biggest win of all.
Especially for the "no work" case (where make determines that nothing is
out of date).  For a well constructed build system with correct build
dependencies, only incremental makes should be needed.  So optimizing the
"zero work" case should buy reasonable time savings and allow the
developers to compile more often.

I don't have precise numbers on this, but I seem to remember a recursive
make taking on the order of 1-2 minutes.  And that number was reduced to
10-15 seconds after eliminating the recursive make and most other
extraneous shell commands.

One way to think about this approach is to compute the overhead of running
make to do a build.  To do this, put the hardwired list of commands needed
for a build into a shell script.  This is your baseline (theoretical
minimum amount of time to build).  The amount of additional time that it
takes to run make to do the same task is your "make overhead".  The goal of
this approach is to get the "make overhead" as close to zero as possible.

But unlike the other options, there is not a simple global fix.  You
basically have to rewrite your makefiles to avoid invoking shells wherever
possible.  This includes avoiding invoking make recursively, which has
quite a bit of overhead in a cygwin environment.

Here is an article on how to create a build system without using make
recursively: 

     http://www.cmcrossroads.com/content/view/8133/268/

The idea is to load your entire project into a single invocation of make,
then do all the dependency checking at once, even though your source code
may be spread across many subdirectories.

The bummer of this approach is that it doesn't buy you a whole lot for
small projects.  And for large projects, it can get really difficult to
implement.  The problem is that you are stuck with the severely limited
programming capabilities that make provides.  If you try hard enough,
you'll find that most tasks are possible to do, but even simple tasks can
get convoluted and difficult to follow.  For example, a simple increment
of an integer is now a complex task (since make offers no native support
for something like this).  Plus there are pitfalls, restrictions, and
head-scratching errors every step of the way.

This is not a task for the faint of heart, but it can buy you some
significant build performance with make on a cygwin system.  It doesn't hurt
on a linux system, but the gains are not nearly as large.

If you head down this path, using the gmsl library might be useful.
It is a set of make functions that implement some commonly needed tasks
that make does not supply directly.  You can find it here:

     http://gmsl.sourceforge.net/

A variation on this approach is to dump make and use a build tool that
has a real programming language (perl, python, ruby,...) embedded in it
(like makepp, scons, rake, rant,...).  They may be inherently slower than
make, but it may be a much easier system to maintain and still provide
some performance improvement over a recursive make.  But I haven't
traveled far enough down those paths yet to know for sure.


Detailed Analysis
-----------------
This analysis is for a relatively large software build environment using
gcc/g++/mono and several homespun tools.  The build system uses a
non-recursive make system that spans many directories and runs on both
linux and cygwin machines.  The system has already been optimized so that
the only shell commands that are used are in the target rules (no use of
$(shell)) with the exception of one 'cygpath' command.

The build fires off 2594 shells from make.  But note that each shell may
end up invoking several commands.  The breakdown of what is done looks
something like this:

   685 gcc/g++
    71 mcs (mono compiler)
   662 misc internal buildtools
   396 misc file manipulation (mv, rm, mkdir,...)
   780 echo (most of which should ultimately be replaced with calls to $(info))

Obviously, the majority of the time of a clean compile is spent running
the commands in the target rules.  But it is interesting to understand how
much overhead there is in running make.  To compute this difference, I
basically took the output of "make -n" to generate a hardwired list of all
the commands needed to run a full build.  I dumped those into a shell
script and ran it to compute the "raw compile time".  The assumption is
that this is theoretically the fastest you can run the build.  Any additional
time required by a full make is "make overhead" needed to compute the
dependencies, determine what needs to be run, and fire off the build
commands.
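
The "make -n" step can be sketched roughly as follows.  This is an
illustrative Python helper, not the script actually used for these
numbers; the function names and the baseline.sh file name are
hypothetical.  It assumes every line printed by "make -n" is a standalone
command.  Recipes that rely on each line running in its own shell (a 'cd',
for instance) would need extra care.

```python
import os
import stat


def baseline_script(dry_run_output):
    """Wrap the commands printed by `make -n` in a fail-fast /bin/sh script."""
    commands = [line for line in dry_run_output.splitlines() if line.strip()]
    # 'set -e' stops at the first failing command, roughly as make would.
    return "#!/bin/sh\nset -e\n" + "\n".join(commands) + "\n"


def write_baseline_script(dry_run_output, path="baseline.sh"):
    """Write the script to `path` (hypothetical name) and make it executable."""
    with open(path, "w", newline="\n") as handle:
        handle.write(baseline_script(dry_run_output))
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


# Illustrative dry-run output; a real build would capture `make -n` instead.
sample = "gcc -c foo.c -o foo.o\n\ngcc foo.o -o foo\n"
print(baseline_script(sample), end="")
```

Timing the generated script against a clean make then gives the two
numbers whose difference is the make overhead.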

This analysis was all done on an Intel Core 2 Duo 6700 (2.66 GHz) machine
running Vista. 

Here are the raw numbers for running a full (clean) build for cases 1, 2 
and 3 above.   The numbers are the 'real' time reported by /bin/time (aka 
wall clock time, or engineer's thumb twiddling time):


                           cygwin-1.5        cygwin-1.7
                           ----------        ----------
    raw compile time       38m50.333s        30m5.687s
    full make (bash, -j1)  46m14.146s        38m52.014s
    full make (dash, -j1)  41m29.490s        32m2.754s
    full make (bash, -j2)  26m51.059s        22m12.771s
    full make (dash, -j2)  23m55.709s        19m14.121s


And looking at the single cpu cases, we can compute the overhead to run
make (compared to just running all necessary commands directly).  These
numbers are the same as above, converted into seconds:

                            cygwin-1.5                 cygwin-1.7
                       --------+--------------    --------+--------------
                        time   | make overhead     time   | make overhead
                       --------+--------------    --------+--------------
    raw compile time    2330.3       --            1805.7      --
    full make (dash)    2489.5      159.2          1922.8     117.1
    full make (bash)    2774.1      443.8          2332.0     526.3
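
For anyone wanting to redo the conversion, here is a small Python sketch
that turns the 'real' values from the first table into seconds and
recomputes the make overhead.  The helper name to_seconds and the
dictionary layout are illustrative only.

```python
import re


def to_seconds(real_time):
    """Convert a time(1)-style 'real' value such as '46m14.146s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real_time)
    if match is None:
        raise ValueError(f"unrecognized time format: {real_time!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# 'real' times for the single-job (-j1) runs, copied from the first table.
times = {
    "cygwin-1.5": {"raw": "38m50.333s", "dash": "41m29.490s", "bash": "46m14.146s"},
    "cygwin-1.7": {"raw": "30m5.687s", "dash": "32m2.754s", "bash": "38m52.014s"},
}

overhead = {}
for release, row in times.items():
    raw = to_seconds(row["raw"])
    for shell in ("dash", "bash"):
        # make overhead = full make time minus the raw compile time
        overhead[(release, shell)] = to_seconds(row[shell]) - raw
        print(f"{release} {shell}: {overhead[(release, shell)]:.1f} s")
```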


From these numbers we can make several observations.  Note once again,
that these numbers apply to only this specific build environment and build
machine.  So these numbers and these observations should not be
generalized.  But they do indicate that some of these changes can result
in significant performance increases.

  * cygwin-1.7 reduces the overall build time by roughly 16-23%

      This comes from comparing the overall build time for equivalent 
      builds under cygwin-1.5 and cygwin-1.7.  For example, for the raw
      compile time: 1805.7/2330.3 = 77.5%

          raw:       22.5% reduction  (77.5% of cygwin-1.5 time)
          dash,-j1:  22.8% reduction  (77.2% of cygwin-1.5 time)
          bash,-j1:  15.9% reduction  (84.1% of cygwin-1.5 time)
          dash,-j2:  19.6% reduction  (80.4% of cygwin-1.5 time)
          bash,-j2:  17.3% reduction  (82.7% of cygwin-1.5 time)

  * dash reduces the overall build time by >10%

      This is computed as the ratio of the time of a full make using dash
      and a full make using bash.  For example, using cygwin-1.5,
      2489.5/2774.1 = 89.7%

          dash+1.5:  10.3% reduction (89.7% of bash+1.5)
          dash+1.7:  17.6% reduction (82.4% of bash+1.7)

  * cygwin-1.7 saves >170ms per make shell invocation

      This is computed by dividing the difference of equivalent build times
      under 1.5 and 1.7 by the number of shells invoked by make.  For 
      example, when using bash:  (2774.1-2332.0)/2594 = 0.170 s

          using dash:  1.7 is 218ms faster than 1.5 per make shell invocation
          using bash:  1.7 is 170ms faster than 1.5 per make shell invocation

  * dash saves 110ms per make shell invocation (over bash)

      This is computed by dividing the difference of equivalent build times
      using dash and bash by the number of shells invoked by make.  For 
      example, when using 1.5:  (2774.1-2489.5)/2594 = 0.110 s

          using 1.5:  dash is 110ms faster than bash per make shell invocation
          using 1.7:  dash is 158ms faster than bash per make shell invocation

  * dash reduces "make overhead" by >64% (over bash)

      This is computed as the ratio of the make overhead when using dash to
      that of using bash.  For example, under cygwin-1.5:  159.2/443.8 = 35.8%
      
          dash,1.5:  64.2% reduction (35.8% of bash/1.5 make overhead)
          dash,1.7:  77.8% reduction (22.2% of bash/1.7 make overhead)

  * "make -j2" reduces the overall build time by ~40%

      This comes from computing the ratio of  "make -j1" times and the
      equivalent "make -j2" times.  For example, comparing runs using
      1.5+bash:  (26*60+51.1)/(46*60+14.1) = 58.1%

          -j2,bash,1.5:  41.9% reduction  (58.1% of -j1 time)
          -j2,dash,1.5:  42.4% reduction  (57.6% of -j1 time)
          -j2,bash,1.7:  42.9% reduction  (57.1% of -j1 time)
          -j2,dash,1.7:  40.0% reduction  (60.0% of -j1 time)
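
The percentage and per-shell figures above all come from two small
formulas, sketched here in Python for the single-job runs.  The function
names are illustrative.  Note that dividing by the shell count attributes
the whole time difference to shell invocations, which is a simplification.

```python
SHELLS_INVOKED = 2594  # shells fired off by make in this build


def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (1.0 - after / before) * 100.0


def per_shell_saving_ms(before, after, shells=SHELLS_INVOKED):
    """Time difference spread over every shell make invoked, in milliseconds."""
    return (before - after) / shells * 1000.0


# Single-job full-make times in seconds, from the second table.
bash_15, bash_17 = 2774.1, 2332.0
dash_15, dash_17 = 2489.5, 1922.8

print(f"1.7 vs 1.5 (bash): {percent_reduction(bash_15, bash_17):.1f}% less time, "
      f"{per_shell_saving_ms(bash_15, bash_17):.0f} ms per shell")
print(f"dash vs bash (1.5): {percent_reduction(bash_15, dash_15):.1f}% less time, "
      f"{per_shell_saving_ms(bash_15, dash_15):.0f} ms per shell")
```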


Herb.
