Mail Archives: djgpp/1994/11/21/18:29:20
SHORT STORY:
How many reasons can you think of that would cause a CPU-intensive
program to run twice as fast when compiled with Borland's C as compared
to the same code compiled with DJGPP 1.12maint2? I am specifically asking
for *any* reason you can possibly think of, because I have such a
program and, after testing every cause I could think of, I'm out
of reasons.
LONG STORY:
I have a program which was originally written and compiled under Borland
C++ 3.1. Recently, the friend who wrote it asked me to compile it
under GCC (he wants more memory for a hash table his program uses). My
problem is that, when compiled with GNU C++, the program runs only about
half as fast as the BC++ version on the same machine. I've tried
several things in the hope of identifying the culprit (see below),
but couldn't find anything worth mentioning, so I'm totally confused.
If anybody can suggest new ideas, I would be grateful.
This is a chess-playing program. From what I've seen, it is quite
CPU-bound; most of the time it just computes possible moves and checks
their scores. From time to time it writes short (~10 chars) messages
to the screen (with cprintf()), and once every move it writes a line to
a logfile. Other than that, it doesn't do anything I can think of that
would require a switch to real mode. Does anybody know of reasons other
than file I/O which would cause a mode switch?
The measure of the program's performance is the number of moves it
considers per second; as I said, this is roughly half as large for the
DJGPP-compiled program as for the BC++ one. Matches typically
take at least 5 minutes, so we are *not* talking about losing several
seconds here and there.
I use the -O3 -funroll-loops optimization switches. The -O3 is because
the program defines several inline functions, and I understand only -O3
actually performs the inlining. I've run the profiler and found the
histogram to be fairly flat: the most expensive function takes about
15% of the run time. None of the library functions appears anywhere
near the top of the profile, so the library is not the culprit. The
program is written in C++, but it doesn't use any classes but its own,
so the class library supplied with the compiler cannot be the reason
for this.
There is one thing I cannot accept as an explanation: that GCC produces
code so much slower than BCC's for a program which mostly needs the CPU.
This is based on some experience, not only on ideology. If anybody out
there knows of circumstances where such a lossage is possible, please
speak up now.
I tried different combinations of other optimization-related switches, but
none produced any significant effect. I can't say I've tested all the
switches which might be relevant, so if you have a list of such switches
to test, go ahead and tell me, even if the list is long.
I also thought that the GCC-compiled program might be a little larger, so
that it just fails to fit into the primary (L1) or secondary (L2) cache.
To test this, I compiled the GCC version with all ints #define'd to be
shorts (16-bit, which is what an int is under BCC), but it made no
difference.