Date: Wed, 24 Feb 1999 09:23:29 -0500
Message-Id: <199902241423.JAA29290@envy.delorie.com>
X-Authentication-Warning: envy.delorie.com: dj set sender to dj AT envy DOT delorie DOT com using -f
From: DJ Delorie
To: pgcc AT delorie DOT com
Subject: list info
Reply-To: pgcc AT delorie DOT com

Note that the anti-spam filter at delorie.com is very aggressive, so
things like redirected mail may not go through. I've updated the
filter to look for my addresses in "resent-to" as well as "to", but in
the future, if you don't see your mail go through, try sending
directly to pgcc AT delorie DOT com. You may also need to rewrite
certain spam-like phrases like "rem0ve from the 1ist" [sic].

There is also an "offensive word" filter currently enabled for pgcc
(that's the default but I can disable it) so if your mail bounces
because of an offensive word, please just remove it and resend.

If all else fails, ask me. I keep copies of all rejected messages.

Thanks,
DJ

Original message follows.

> To: pgcc-list AT desk DOT nl
> Resent-From: johnny AT entity DOT netcologne DOT de
> Resent-Date: Wed, 24 Feb 1999 04:06:04 +0100
> Resent-To: pgcc AT delorie DOT com

From: Johnny Teveßen
To: pgcc-list AT desk DOT nl
Subject: 19981109 scheduler

Hello!

First, I know I'm not using the latest pgcc/egcs, but you might want
to have a look at this using your latest snapshots, too. It's about
how the scheduler schedules unrolled loops of integer/FP instructions.

First the code:

    double foo (int i, double d)
    {
      int j;
      for (j = 20; j; --j) {
        i *= i;
        d *= d;
      }
      return d*(double)i;
    }

Now compile this using -funroll-all-loops. It will result in a loop
that runs twice and has 10 "imull" and 10 "fmul" instructions in it.
What confused me was the way these got mixed. To make a long output
short, I replaced every imull by '.' and every fmul by '*'.

Compiled using gcc -fverbose-asm foo.c -S -o - -funroll-all-loops -O6,
and one of the following:

    Option:        Output:
    -march=i386    .*.*.*.*.*.*.*.*.*.*
    -march=i486    .*.*.*.*.*.*.*.*.*.*
    -march=i586    ....*.*.**.*.**.*.**
    -march=i686    ******.*...*...*...*
    -march=k6      *..*.*.*.*.*.*.*.*.*

Especially the pentium (i586) ones look strange to me: At the
beginning of the loop, the FPU is nearly totally left alone (well, I
don't think the load-"d"-from-stack still occupies it here). And is
the pentiumpro (i686) really capable of collecting 6 fp
multiplications in its queue?

Please don't be angry if I'm totally misunderstanding something, but
some of the scheduler effects have confused me quite a bit over the
last few days.

Then, a little memory-juggling question:

    double bar (int i, double d)
    {
      return d * (double)i;
    }

Compiled using -O6, on -march={i386,i486,i686,k6} I get the (good)
result:

    bar:
            fildl 4(%esp)
            fmull 8(%esp)
            ret

But -march=pentium (the default) gives this:

    bar:
            movl 4(%esp),%edx
            pushl %edx
            fildl (%esp)
            addl $4,%esp
            fmull 8(%esp)
            ret

Using -O4, it's even worse for pentium, whereas, for example,
"-O4 -march=k6" only produces the slightly worse code that
"-O6 -march=pentium" outputs. Are the other chip-specific optimizers
better than the pentium's, or is this code really faster on pentium?

This is gcc version pgcc-2.92.21 19981109 (gcc2 ss-980609
experimental).

Please send me a Cc: of all possible replies, since I'm not on the
list and very interested in them.

ciao,
johnny
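
For anyone who wants to reproduce the '.'/'*' summaries from the message above, here is a minimal sketch of a filter that does the replacement mechanically. It is not the tool that was used for the original post; the names mark_line, mark_text and the STANDALONE macro are made up for illustration. It assumes each instruction sits on its own line with the mnemonic as the first token, and it treats any mnemonic beginning with "imul" as an integer multiply and any beginning with "fmul" as an FP multiply.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Map one line of assembler output to a marker:
 *   '.' if the mnemonic starts with "imul" (integer multiply),
 *   '*' if the mnemonic starts with "fmul" (x87 multiply),
 *   0   for everything else (labels, directives, other instructions).
 * Assumption: the mnemonic is the first token on the line. */
char mark_line(const char *line)
{
    while (*line == ' ' || *line == '\t')
        ++line;                         /* skip indentation only */
    if (strncmp(line, "imul", 4) == 0)
        return '.';
    if (strncmp(line, "fmul", 4) == 0)
        return '*';
    return 0;
}

/* Scan every line of `text` and store the markers in `out` as a
 * NUL-terminated string (at most outsz - 1 markers are kept).
 * Returns the number of markers stored. */
size_t mark_text(const char *text, char *out, size_t outsz)
{
    size_t n = 0;

    assert(text != NULL && out != NULL && outsz > 0);
    while (*text) {
        char m = mark_line(text);
        const char *nl;

        if (m && n + 1 < outsz)
            out[n++] = m;
        nl = strchr(text, '\n');        /* advance to the next line */
        if (nl == NULL)
            break;
        text = nl + 1;
    }
    out[n] = '\0';
    return n;
}

#ifdef STANDALONE
/* Build with -DSTANDALONE to get a stdin-to-stdout filter. */
int main(void)
{
    char line[1024];

    while (fgets(line, sizeof line, stdin)) {
        char m = mark_line(line);
        if (m)
            putchar(m);
    }
    putchar('\n');
    return 0;
}
#endif
```

Built with -DSTANDALONE, it can sit at the end of the pipeline from the message, for example "gcc -fverbose-asm foo.c -S -o - -funroll-all-loops -O6 -march=i586 | ./markmuls" (the binary name is arbitrary). Note that it marks every multiply in the file, not only those inside the unrolled loop, so the final "d*(double)i" shows up as well.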