Mail Archives: djgpp/1996/11/27/14:17:19
Errors-To: postmaster AT ns1
From: gminer AT Newbridge DOT COM (Glen Miner)
Newsgroups: comp.os.msdos.djgpp
Date: 27 Nov 1996 13:38:51 GMT
Organization: Newbridge Networks Corporation
Lines: 30
Nntp-Posting-Host: 138.120.136.238
Dj-Gateway: from newsgroup comp.os.msdos.djgpp
Content-Type: text
Content-Length: 1489
I am currently working on a program that grinds away in a recursive
algorithm tens of thousands of times over a small group of local data. It
doesn't do _any_ io, and only calls a select few functions. Due to the
immense number of calculations required, this is painfully slow. I've
optimized the algorithm a fair amount, and plan to improve it more, but
that is only one part of the optimization process.
I am currently using the -O3 compile option; is there any other switches I
should be enabling or disabling? I never need to debug the code, so symbol
information is not an issue.
Can anyone suggest any hard and fast rules for c code optimization under
djgpp? I mean, right now I have a 9x9 array of bytes, and try to index all
of the data in it with pointers. Is this necessary, or is x=MyArray[4][6]
just as good?
Since this code _is_ a port, I have noticed some data conversion too. An
int is no longer 16 bits long. Since I don't really need a 32 bit int in
the core code would it help at all for me to change things to "short
unsigned int"s?
I may end up doing the heavily used functions in assembler, but I'd like
to make sure that the c code is as optimal as it can be first. Any
comments, suggestions or pointers are welcomed.
The following compiles and runs were performed with GCC on DG Aviion. I have
found similar results in DJGPP on similar test in the past though I am not at
home now to try it. Running the following test program compiled various ways
gives results that verify that the GCC optimizer does a better job than you
can:
These compiles leave register allocation to the compiler:
--------------------------------------------------------
gcc -o tst.D tst.c
gcc -o tst.O -O tst.c
gcc -o tst.O2 -O2 tst.c
gcc -o tst.O3 -O3 tst.c
These compiles declare register variables explicitly:
----------------------------------------------------
gcc -o tst.DR tst.c -Dusereg
gcc -o tst.OR -O tst.c -Dusereg
gcc -o tst.O2R -O2 tst.c -Dusereg
gcc -o tst.O3R -O3 tst.c -Dusereg
Versions ranked by Array Indexing results worst to best:
Opt. Manual
Executable Level Regist. Array Indexing Pointer Deref.
------------- ----- ------ ------------------- ------------------
tst.D None No 5.752117 4.543308
tst.DR None Yes 1.621305 1.192638
tst.O -O No 0.801081 0.844428
tst.OR -O Yes 0.658607 0.869502
tst.O2R -O2 Yes 0.523563 0.642418
tst.O3R -O3 Yes 0.549404 0.718574
tst.O2 -O2 No 0.499495 0.676681
tst.O3 -O3 No 0.484081 0.709432
Clearly -O3 optimizations using array indexing and optimizer determined
register allocation is the fastest scheme. This shows several points:
1) Using array indexing is superior to using pointers when any level of
optimization is used.
2) Manual register allocation is only helpful if no optimization or -O
optimization is used!
3) Manual register allocation is most helpful with variables used in the
array indexing than pointers.
############################# tst.c ##################################
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <limits.h>
#include <sys/time.h>
#if defined(usereg)
#define REG register
#else
#define REG
#endif
#if !defined(ITERS)
#define ITERS 100000
#endif
void print_diff( char *str, struct timeval st, struct timeval en );
main( int argc, char **argv)
{
char arry[9][9];
REG long i, j, k;
REG char *p;
struct timeval start, end;
gettimeofday( &start, NULL );
for( i=0; i<ITERS; i++) {
for (j=0;j<9;j++)
for (k=0;k<9;k++) {
arry[j][k] = j+k;
}
}
gettimeofday( &end, NULL );
print_diff( "ARRAY", start, end );
gettimeofday( &start, NULL );
for( i=0; i<ITERS; i++) {
p = &arry[0][0];
for (j=0;j<9;j++)
for (k=0;k<9;k++) {
*p++ = j+k;
}
}
gettimeofday( &end, NULL );
print_diff( "POINTER", start, end );
}
void print_diff( char *str, struct timeval st, struct timeval en )
{
struct timeval df;
df.tv_sec = en.tv_sec - st.tv_sec;
df.tv_usec = en.tv_usec - st.tv_usec;
if (df.tv_usec < 0) {
df.tv_sec--;
df.tv_usec += 1000000;
}
printf( "%s: diff = %d.%06.6d.\n", str, df.tv_sec, df.tv_usec );
}
--
Art S. Kagel, kagel AT quasar DOT bloomberg DOT com
A proverb is no proverb to you 'till life has illustrated it. -- John Keats
- Raw text -