From: Michiel de Bondt <michielb AT sci DOT kun DOT nl>
Newsgroups: comp.os.msdos.djgpp
Subject: Re: how to use inline push and pop
Date: Mon, 14 May 2001 15:47:42 +0200
Organization: University of Nijmegen
Lines: 89
Message-ID: <3AFFE1FD.16E0E1E1@sci.kun.nl>
References: <Pine DOT SUN DOT 3 DOT 91 DOT 1010510171303 DOT 7067H-100000 AT is> <3AFBF8AB DOT C42331EB AT sci DOT kun DOT nl> <3277-Fri11May2001195316+0300-eliz AT is DOT elta DOT co DOT il>
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Eli Zaretskii wrote:

> > From: Michiel de Bondt <michielb AT sci DOT kun DOT nl>
> > Newsgroups: comp.os.msdos.djgpp
> > Date: Fri, 11 May 2001 16:35:23 +0200
> > >
> > > There's a popular belief that recursive code is terribly slow, but
> > > experience shows that this is mostly a myth.  Recursive code _might_ be
> > > slow, but in many cases it isn't.  Because recursive code is usually
> > > smaller, it fits better into the CPU caches.  It is also simpler, so you
> > > have less probability for bugs, and it lends itself better to compiler
> > > optimizations.
> > >
> >
> > I have once seen the opposite: faster recursive code..
>
> Yes, that's what I was saying as well.
>
> > What do you mean with profile? Fine-tuning the code within the
> > language itself?
>
> No, I mean use the profiler.  Compile and link the program with the
> "-pg" compiler switch, then run it, and when it exits, run gprof, the
> profiler which is part of the Binutils distribution.  It will show you
> where does your program spends most of its time.  If that place is not
> in the code you are trying to inline, you are wasting your time.

That is a good idea. I only have an Intel table of clock-cycle counts for
the 808x, 286, 386 and 486. I use the 486 as a reference, but that might
not be such a good idea. I can hardly imagine that the compiler's code is
faster: not so much because it uses more instructions, but mainly because
it keeps local variables on the stack. I reduced the number of local
variables on the stack to zero, but there is one small local array that
could be kept on the stack. Further, I made a computed-goto variable
global, since the 486 table says that an indirect near jump through a
register is as fast as one through memory. But the compiler uses a
register for it, the base pointer. I do not understand why the base
pointer is still set up in my routine that does not use the stack, although
of course setting up the base pointer takes hardly any time. In another
routine it is not set up, probably because that routine does not use the
stack even without optimization. But I think that most of the time is
spent on memory accesses. Does the profiler report memory times as well?
Probably memory loads should be postponed as long as possible. Or should
they be spread out?
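
For readers who have not seen a computed goto: the following is only a
minimal sketch of the construct discussed above, using the GNU C
"labels as values" extension (&&label and goto *ptr). The names
next_state and count_steps are hypothetical and not taken from the
original routine; the jump target is kept in a file-scope variable, as
described in the paragraph above.

```c
/* GNU C extension: &&label yields the address of a label, and
   "goto *ptr" jumps to it. Requires gcc or a compatible compiler. */

static void *next_state;   /* hypothetical global jump target */

/* Counts down from n, dispatching through the global jump target. */
int count_steps(int n)
{
    int steps = 0;

    next_state = &&loop;

dispatch:
    goto *next_state;      /* indirect jump through a memory operand */

loop:
    if (n <= 0) {
        next_state = &&done;
    } else {
        n--;
        steps++;
    }
    goto dispatch;

done:
    return steps;
}
```

Whether the global or a local (register) variable is faster on a given
CPU is something to measure rather than read off a cycle table.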

>
> > I discovered that the base pointer can be used as well, with
> > -fno-frame-pointer.  This makes an extra register available and my
> > code can be speeded up in another way.
>
> Yes, this is another optimization switch that you should try.
>
> > I started using many intel inline asms when I discovered that my C
> > instructions were not translated to the one-liners I had in
> > mind. The code gcc generates looks terrible.
> > See e.g. the following examples:
> >
> > C-code:
> > T.Byte += dd[2]
> > (union {long Long; unsigned char Byte; } T;)
> >
> > gcc-output:
> > movb %cl, %al
> > addb _dd+2, %al
> > movb %al,  %cl
> >
> > one-liner:
> > addb _dd+2, %cl
>
> You did compile this with optimizations, yes?  And you do have the
> latest GCC version, right?  Also, did you use the -march=pentium
> option?
>

With -O9, which is equal to -O3. I downloaded djgpp this year (in
February, I guess). I did not use the -march=pentium option, but I will
try it.
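
To make the quoted example easy to reproduce, here is a small
self-contained sketch. The names T and dd follow the fragment quoted
above; the union tag acc, the wrapper function add_low_byte and the
contents of dd are assumptions made only for illustration. Compiling it
with "gcc -O3 -S" (optionally adding -march=pentium) writes the
generated assembly to a .s file for inspection.

```c
/* Union from the quoted example: a long and a byte sharing storage. */
union acc {
    long Long;
    unsigned char Byte;
};

unsigned char dd[4] = { 1, 2, 3, 4 };   /* sample data, assumed */

/* Adds dd[2] to the byte member only; no carry goes into the rest
   of the long. */
long add_low_byte(long value)
{
    union acc T;

    T.Long = value;
    T.Byte += dd[2];       /* the statement under discussion */
    return T.Long;
}
```

The instructions the compiler picks depend on the GCC version and the
switches used, so the output may differ from the listing quoted above.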

>
> You also should look at the time it takes to perform these
> instructions.  Sometimes, the code looks to be of poor quality, but it
> actually runs faster.
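
One rough way to look at the time directly, besides gprof, is to wrap
the code in a loop and time it with the standard clock() function. The
sketch below is only that: time_calls and workload are hypothetical
names, and the resolution of clock() can be coarse, so the repetition
count has to be large enough for the loop to run for a noticeable time.

```c
#include <time.h>

static volatile unsigned char sink;   /* keeps the work from being optimized away */

/* Hypothetical stand-in for the code being measured. */
void workload(void)
{
    sink += 3;
}

/* Returns the CPU time, in seconds, used by reps calls to fn. */
double time_calls(void (*fn)(void), long reps)
{
    clock_t start = clock();
    long i;

    for (i = 0; i < reps; i++)
        fn();

    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls(workload, 10000000L) for each variant of a routine
gives numbers that can be compared with each other, although the call
overhead through the function pointer is included in both.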

Is it wise to cast a long to a pointer? I wonder, since there might be
computers with 64-bit pointers. But you did not say anything about that.
I will make a union of a long and a pointer, for 64-bit computers.
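
A minimal sketch of two ways to do this, assuming a compiler that
provides the C99 header <stdint.h> with the type uintptr_t (an integer
type wide enough to hold a pointer to void). The helper names and the
union tag are hypothetical.

```c
#include <stdint.h>

/* Converts a pointer to an integer that is wide enough to hold it. */
uintptr_t ptr_to_bits(void *p)
{
    return (uintptr_t)p;
}

/* Converts such an integer back to the original pointer. */
void *bits_to_ptr(uintptr_t bits)
{
    return (void *)bits;
}

/* The union approach: storage is as large as the larger member, but
   Long only covers the whole pointer if long is at least as wide. */
union long_or_ptr {
    long Long;
    void *Ptr;
};
```

With the union, a value written through Ptr and read back through Long
may be truncated on systems where long is narrower than a pointer, so
the uintptr_t conversion is the safer of the two where it is available.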

Best regards, Michiel