Mail Archives: djgpp/1997/11/27/18:23:37
Eli Zaretskii wrote:
> There are several -falign-XXX switches. I suggest trying them.
OK, I am guessing you are talking about the -malign-loops=NUM,
etc., switches? (I couldn't find any "-falign..." switches in INFO.)
According to the gcc info-docs, using -m386 or -m486 automatically
switches the default alignment for -malign-functions= and -malign-jumps=
between 2 and 4 respectively. The only switch not affected is
-malign-loops= , so I tried as many combinations of -m386 or -m486
and -malign-loops= and -malign-double as I could think of.
With binutils 2.8.1, using -m386 instead of -m486 made the program very
slightly slower: less than a tenth of a second.
With binutils 2.7, using -m386 actually made it FASTER by about
a tenth of a second.
Changing -malign-loops= to 4 caused no change with either binutils 2.8.1
or 2.7.
Adding -malign-double made it SLOWER by nearly 1.5 seconds with
BOTH versions of binutils.
To allow anybody to duplicate some of my tests, I am posting a small
program that simulates the Mandelbrot math from my large program.
This code just follows the floating point loops and tests without
actually plotting any pixels. This is not optimized in any way. My
actual code uses a circle avoidance and guessing algorithm.
After several seconds of computing, it will print the time elapsed in
seconds and the number of clock ticks, all based on the very
low-resolution "clock()" function.
----------start sample program---------------------
#include <stdio.h>
#include <time.h>

typedef struct ComplexNum
{
    double real;   /* the complex number struct */
    double imag;
} ComplexNum;

/* The Mandelbrot math; iterates z = (z^2) + c up to maxiter times and
   returns the color */
static inline int test_point( ComplexNum c, int maxiter )
{
    int i, color = 0;
    /*double ycenter, ylcenter, xbcenter, xscenter;*/
    double length;
    ComplexNum z, x;

    z.real = 0.0;
    z.imag = 0.0;

    /* uncomment to avoid testing in certain large circles:*/
    /* ylcenter = c.imag;
       ycenter = ylcenter * ylcenter;
       xbcenter = c.real + 0.25;
       xscenter = c.real + 1.0;
       if ((xbcenter * xbcenter + ycenter) <= 0.2304 ||
           (xscenter * xscenter + ycenter) <= 0.0529)
           return (0);
    */
    for (i = 1; i <= maxiter; ++i)
    {
        x.real = (z.real * z.real) - (z.imag * z.imag);
        x.imag = (z.real * z.imag) + (z.imag * z.real);
        z.real = x.real + c.real;
        z.imag = x.imag + c.imag;
        length = z.real * z.real + z.imag * z.imag;
        if (length > 4.0)
        {
            if (i > 254) color = (i % 253) + 1;
            else color = i;
            break;
        }
    }
    return (color);
}

int main(void)
{
    int x, y, color;
    ComplexNum c;
    double start_y = 1.3, start_x = -2.3;
    double scale = 0.0053125;
    double t0, t1;

    t0 = (double)clock() / CLOCKS_PER_SEC;
    for (y = 10; y < 480; y++)
    {
        c.imag = start_y - (scale * y);
        for (x = 0; x < 640; x++)
        {
            c.real = start_x + (scale * x);
            color = test_point(c, 150);   /* result unused; only timing matters */
        }
    }
    t1 = ((double)clock() / CLOCKS_PER_SEC) - t0;
    printf("%10.5f seconds\n", t1);
    printf("%f clocks\n\n", t1 * CLOCKS_PER_SEC);
    return (0);
}
----------------------end----------------------------
Compiling with -m486 -ffast-math, I get:

binutils   time      clocks
2.8.1      5.93407   540
2.7        5.60440   510

Compiling with -m386 -ffast-math, I get:

binutils   time      clocks
2.8.1      5.98901   545
2.7        5.4945    505
I also tested this in plain DOS (instead of Win95), and the
numbers all showed similar ratios but were about a tenth of a second
faster. All of my tests were done on a 100 MHz Pentium with 48 MB RAM
and an Acer BIOS.
A couple of thoughts:
My timer is not very good; to really test this, we should use
the Pentium's built-in instruction counter, but I need to learn
how first.
This is based on floating point operations tested on a Pentium.
Who knows what would happen on a 386 or 486?
I hope I haven't discouraged people from using binutils 2.8.1;
as you can see, these are minuscule speed differences that I am
nitpicking. Nobody should really notice a difference of a second
or less. However, I have been trying to optimize my fractal
program for some time, and I was alarmed when all my work seemed
to disappear.
Finally, someone asked if perhaps I had installed a new version
of gcc without realizing it. Answer: Nope, it's definitely the
same one. I only downloaded two files: djdev201.zip, and bnu281.zip.
The rest I keep on floppies, including gcc.
Happy Thanksgiving,
Alan Doerhoefer
aland AT seanet DOT com