From: Michiel de Bondt
Newsgroups: comp.os.msdos.djgpp
Subject: Re: how to use inline push and pop
Date: Mon, 14 May 2001 15:47:42 +0200
Organization: University of Nijmegen
Lines: 89
Message-ID: <3AFFE1FD.16E0E1E1@sci.kun.nl>
References: <3AFBF8AB DOT C42331EB AT sci DOT kun DOT nl> <3277-Fri11May2001195316+0300-eliz AT is DOT elta DOT co DOT il>
NNTP-Posting-Host: fanth.sci.kun.nl
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Trace: wnnews.sci.kun.nl 989848062 12378 131.174.132.54 (14 May 2001 13:47:42 GMT)
X-Complaints-To: usenet AT sci DOT kun DOT nl
NNTP-Posting-Date: Mon, 14 May 2001 13:47:42 +0000 (UTC)
X-Mailer: Mozilla 4.75 [en] (X11; U; SunOS 5.7 sun4u)
X-Accept-Language: en
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

Eli Zaretskii wrote:

> > From: Michiel de Bondt
> > Newsgroups: comp.os.msdos.djgpp
> > Date: Fri, 11 May 2001 16:35:23 +0200
> >
> > > There's a popular belief that recursive code is terribly slow, but
> > > experience shows that this is mostly a myth.  Recursive code _might_ be
> > > slow, but in many cases it isn't.  Because recursive code is usually
> > > smaller, it fits better into the CPU caches.  It is also simpler, so you
> > > have less probability for bugs, and it lends itself better to compiler
> > > optimizations.
> > >
> >
> > I have once seen the opposite: faster recursive code..
>
> Yes, that's what I was saying as well.
>
> > What do you mean with profile? Fine-tuning the code within the
> > language itself?
>
> No, I mean use the profiler.  Compile and link the program with the
> "-pg" compiler switch, then run it, and when it exits, run gprof, the
> profiler which is part of the Binutils distribution.  It will show you
> where does your program spends most of its time.  If that place is not
> in the code you are trying to inline, you are wasting your time.

That is a good idea.
I only have an Intel table with clock-cycle counts for the 808x, 286, 386
and 486. I use the 486 as a reference, but that might not be such a good
idea.

I can hardly imagine that the compiler's code is faster: not so much
because it uses more instructions, but mainly because it keeps local
variables on the stack. I reduced the number of local variables on the
stack to zero, but there is one small local array that could be kept on
the stack. Further, I made a computed-goto variable global, since the 486
table says that a near indirect jump through a register is as fast as one
through memory. But the compiler uses a register for it: the base pointer.
I do not understand why the base pointer is still computed in my routine,
which does not use the stack, but of course computing the base pointer
takes hardly any time. In another routine it is not computed, probably
because that routine does not use the stack even when optimization is off.

But I think that most of the time is spent on memory accesses. Does the
profiler report memory access times as well? Probably memory loads should
be postponed as long as possible. Or should they be distributed?

> > I discovered that the base pointer can be used as well, with
> > -fno-frame-pointer. This makes an extra register available and my
> > code can be speeded up in another way.
>
> Yes, this is another optimization switch that you should try.
>
> > I started using many intel inline asms when I discovered that my C
> > instructions were not translated to the one-liners I had in
> > mind. The code gcc generates looks terrible.
> > See e.g. the following examples:
> >
> > C-code:
> >   T.Byte += dd[2]
> >   (union {long Long; unsigned char Byte; } T;)
> >
> > gcc-output:
> >   movb %cl, %al
> >   addb _dd+2, %al
> >   movb %al, %cl
> >
> > one-liner:
> >   addb _dd+2, %cl
>
> You did compile this with optimizations, yes?  And you do have the
> latest GCC version, right?  Also, did you use the -march=pentium
> option?

With -O9, which is equal to -O3. I downloaded djgpp this year (February, I
guess).
I did not use the -march=pentium option, but I will try it.

> You also should look at the time it takes to perform these
> instructions.  Sometimes, the code looks to be of poor quality, but it
> actually runs faster.

Is it wise to cast a long to a pointer? I wonder, since there might be
computers with 64-bit pointers. But you did not say anything about that.
I will make a union of a long and a pointer, for 64-bit computers.

Best regards,
Michiel
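
A minimal C sketch of the union idea mentioned above. The names
long_or_ptr, roundtrip_ptr, roundtrip_long and slot_fits_both are
hypothetical and only serve as illustration. The union is always large
enough for whichever member is bigger, so a value stored through one
member and read back through the same member survives on both 32-bit and
64-bit machines. It does not make a long wide enough to hold a pointer on
a platform where long is narrower than a pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: one slot that holds either a long or a pointer.
   Its size is at least that of the larger member. */
union long_or_ptr {
    long  as_long;
    void *as_ptr;
};

/* Store a pointer in the slot and read it back through the same member. */
static void *roundtrip_ptr(void *p)
{
    union long_or_ptr slot;
    slot.as_ptr = p;
    return slot.as_ptr;
}

/* Store a long in the slot and read it back through the same member. */
static long roundtrip_long(long v)
{
    union long_or_ptr slot;
    slot.as_long = v;
    return slot.as_long;
}

/* Returns 1 if the slot can hold both a long and a pointer without
   truncation, which the language guarantees for any union. */
static int slot_fits_both(void)
{
    return sizeof(union long_or_ptr) >= sizeof(long)
        && sizeof(union long_or_ptr) >= sizeof(void *);
}
```

Reading as_long after writing as_ptr is the risky case: where long is
narrower than a pointer, only part of the pointer would come back, so the
sketch always reads the member that was last written.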