Date: Wed, 24 Feb 1999 09:23:29 -0500
Message-Id: <199902241423.JAA29290@envy.delorie.com>
X-Authentication-Warning: envy.delorie.com: dj set sender to dj AT envy DOT delorie DOT com using -f
From: DJ Delorie <dj AT delorie DOT com>
To: pgcc AT delorie DOT com
Subject: list info
Reply-To: pgcc AT delorie DOT com


Note that the anti-spam filter at delorie.com is very aggressive, so
things like redirected mail may not go through.  I've updated the
filter to allow "resent-to" in addition to "to" for where it looks for
my addresses, but in the future, if you don't see your mail go
through, try sending directly to pgcc AT delorie DOT com.  You may also need
to rewrite certain spam-like phrases like "rem0ve from the 1ist" [sic].

There is also an "offensive word" filter currently enabled for pgcc
(that's the default but I can disable it) so if your mail bounces
because of an offensive word, please just remove it and resend.

If all else fails, ask me.  I keep copies of all rejected messages.

Thanks,
DJ

Original message follows.

> To: pgcc-list AT desk DOT nl
> Resent-From: johnny AT entity DOT netcologne DOT de
> Resent-Date: Wed, 24 Feb 1999 04:06:04 +0100
> Resent-To: pgcc AT delorie DOT com

From: Johnny Teveßen <j DOT tevessen AT gmx DOT de>
To: pgcc-list AT desk DOT nl
Subject: 19981109 scheduler

Hello!

First, I know I'm not using the latest pgcc/egcs, but you might want
to have a look at this using your latest snapshots, too. It's about
how the scheduler schedules unrolled loops of integer/fp commands.
First the code:

double foo (int i, double d) {
  int j;
  for (j = 20; j; --j) {
    i *= i;
    d *= d;
  }
  return d*(double)i;
}

Now compile this using -funroll-all-loops. The 20 iterations get unrolled
by a factor of 10, so the result is a loop that runs twice and has 10
"imull" and 10 "fmul" instructions in it. What confused me was the way
these got mixed. To make a long output short, I replaced every imull by
'.' and every fmul by '*'. I compiled using
gcc -fverbose-asm foo.c -S -o - -funroll-all-loops -O6, plus one of
the following options:

    Option:      Output:
    -march=i386: .*.*.*.*.*.*.*.*.*.*
    -march=i486: .*.*.*.*.*.*.*.*.*.*
    -march=i586: ....*.*.**.*.**.*.**
    -march=i686: ******.*...*...*...*
    -march=k6  : *..*.*.*.*.*.*.*.*.*
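
The '.'/'*' reduction above can be automated. What follows is only a
minimal sketch, not the method used for the table: the names
classify_line and mul_pattern are made up for illustration, and it
assumes AT&T-syntax output from gcc -S where the mnemonic is the first
token on a line (a label sharing the line with an instruction would be
missed).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: '.' for an integer multiply, '*' for an x87
   multiply, 0 for anything else. Matching on the prefix also catches
   suffixed forms such as imull, fmull and fmulp. */
static char classify_line(const char *line)
{
    while (*line == ' ' || *line == '\t')
        ++line;
    if (strncmp(line, "imul", 4) == 0)
        return '.';
    if (strncmp(line, "fmul", 4) == 0)
        return '*';
    return 0;
}

/* Hypothetical helper: walk a whole assembly listing held in `text`
   and write the multiply pattern into `out` (at most cap-1 characters
   plus the terminating NUL). Returns the pattern length. */
static size_t mul_pattern(const char *text, char *out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL && cap > 0);
    while (*text != '\0') {
        char c = classify_line(text);
        if (c != 0 && n + 1 < cap)
            out[n++] = c;
        /* Skip to the start of the next line. */
        while (*text != '\0' && *text != '\n')
            ++text;
        if (*text == '\n')
            ++text;
    }
    out[n] = '\0';
    return n;
}
```

Feeding it the text of the unrolled loop body for each -march setting
should give strings comparable to the rows of the table.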

Especially the pentium (i586) ones look strange to me: At the beginning
of the loop, the FPU is nearly totally left alone (well, I don't think
the load-"d"-from-stack still occupies it here). And is the pentiumpro
(i686) really capable of collecting 6 fp multiplications in its queue?

Please don't be angry if I'm totally misunderstanding something, but some
of the scheduler effects have confused me quite a bit over the last few days.


Then, a little memory-juggling question:

double bar (int i, double d) {
  return d * (double)i;
}

Compiled using -O6, on -march={i386,i486,i686,k6} I get the (good) result:

bar:    fildl 4(%esp)
        fmull 8(%esp)
        ret

But -march=pentium (the default) gives this:

bar:    movl 4(%esp),%edx
        pushl %edx
        fildl (%esp)
        addl $4,%esp
        fmull 8(%esp)
        ret

Using -O4, it's even worse for pentium, whereas, for example,
"-O4 -march=k6" only produces the slightly worse code that
"-O6 -march=pentium" outputs.

Are the other chip specific optimizers better than the pentium's, or is
this code really faster on pentium?
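
One way to approach the "really faster?" question is simply to time both
variants on an actual Pentium. The following is a rough sketch only:
time_bar is a made-up name, clock() is coarse, and the call goes through
a volatile function pointer (my assumption about how to keep the
compiler from inlining bar), so the numbers include call overhead and
are good for comparing builds, not for absolute cycle counts.

```c
#include <time.h>

double bar(int i, double d)
{
    return d * (double)i;
}

/* Hypothetical harness: call bar() `calls` times and return the
   elapsed CPU time in seconds as reported by clock(). */
static double time_bar(long calls)
{
    /* The volatile pointer forces a real call on every iteration. */
    double (*volatile fn)(int, double) = bar;
    /* The volatile sink keeps the result from being discarded. */
    volatile double sink = 0.0;
    long k;
    clock_t start = clock();

    for (k = 0; k < calls; ++k)
        sink = fn((int)k, 2.0);
    (void)sink;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Building this once with -march=pentium and once with, say, -march=i486,
then running time_bar with a large count on the same machine, would show
whether the pushl/fildl sequence pays off there.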

This is gcc version pgcc-2.92.21 19981109 (gcc2 ss-980609 experimental).
Please send me a Cc: of all possible replies, since I'm not on the list
and very interested in them.

ciao,
johnny