From: Michiel de Bondt Newsgroups: comp.os.msdos.djgpp Subject: Re: how to use inline push and pop Date: Thu, 17 May 2001 17:07:00 +0200 Organization: University of Nijmegen Lines: 60 Message-ID: <3B03E914.28DC92D@sci.kun.nl> References: NNTP-Posting-Host: fanth.sci.kun.nl Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Trace: wnnews.sci.kun.nl 990112020 464 131.174.132.54 (17 May 2001 15:07:00 GMT) X-Complaints-To: usenet AT sci DOT kun DOT nl NNTP-Posting-Date: Thu, 17 May 2001 15:07:00 +0000 (UTC) X-Mailer: Mozilla 4.75 [en] (X11; U; SunOS 5.7 sun4u) X-Accept-Language: en To: djgpp AT delorie DOT com DJ-Gateway: from newsgroup comp.os.msdos.djgpp Reply-To: djgpp AT delorie DOT com Eli Zaretskii wrote: > On 10 May 2001, Hans-Bernhard Broeker wrote: > > > You really should rethink your strategy all the way since you started > > at square one. What you're doing here, actually, is to avoid the usual > > overhead involved in recursive function calling --- but you're doing > > it the wrong way. To avoid having more than absolutely necessary > > pushed to the stack in recursive calls, use the following recipe, > > instead: > > > > 1) Remove as many as possible of the arguments of the recursive function > > 2) Remove as many as possible of the recursive function's local variables > > in scope at the time of the resursion call(s). Block-scope variables > > of blocks without a recursion call in them can stay as they are. > > 3) Replace them by a manually implemented stack of 'status structs', where > > the status struct holds *exactly* those variables that have to be > > preserved across a recursion call. > > 4) Compile the whole stuff with -save-temps -fverbose-asm -O6 and see if > > you can really beat GCC in assembly hacking :-) > > And I would like to point out, again, that all this complicated procedure > and the additional complexity it imposes on the code (meaning more bugs > and more slow-down) might be _totally_ unnecessary! > > There's a popular belief that recursive code is terribly slow, but > experience shows that this is mostly a myth. Recursive code _might_ be > slow, but in many cases it isn't. Because recursive code is usually > smaller, it fits better into the CPU caches. It is also simpler, so you > have less probability for bugs, and it lends itself better to compiler > optimizations. > > In other words, I _strongly_ recommend to write a recursive implementation > in a high-level language such as C or C++, compile it with optimizations, > and _profile_ it, before you decide to go to such lengths. There are good > chances that the code will run fast enough to make any of this completely > unnecessary. When I used progress printfs in my procedure, I clearly won. Probably, since the compiler only reserves registers in sections. With rarely performed printfs, it is a better idea to disreserve registers in the section with the call to printf. But in order to make some memory optimization, I made some other changes as well. But after these changes, there is no difference in speed any more. Maybe, the moves to and from the stack can be paired. I do not optimize to pairability. As you see, I read some information about the Pentium. I also read that the memory access takes very much time. I can not change the number of memory accesses which are all different, but I will try to do about half of the accesses to recent blocks of 32 bytes. I do not know how to align arrays in memory, but I saw in the .s-output that the compiler makes the required alingments itself (.p2align 5). Best regards, Michiel