To: Olly Betts
Cc: djgpp AT sun DOT soe DOT clarkson DOT edu
Subject: Re: Speed tuning programs
Date: Wed, 17 Aug 94 12:42:05 +0300
From: eliz AT is DOT elta DOT co DOT il

> I haven't tried profiling the code recently, so it might be worth doing
> again. However, this would probably just reduce the times for both
> versions.

Not necessarily. The libraries and the code generation of the two compilers (BC and GCC) are quite different, so what is a hot spot in one version need not be one in the other. For example, imagine that some specific library function is much more efficient in one compiler's library, and this very function is used in the innermost loop of your program.

GCC should be much more efficient for long int (i.e. 32-bit) arithmetic, and especially for working with large buffers (arrays), where in BC you use far pointers (compact, large or huge memory models). GCC lets you keep such pointers in register variables, whereas BC must access memory (twice) for each reference through them.

On the other hand, library functions that move buffers, such as strcpy(), memcpy(), memset() and memmove(), are inlined by BC under -O2, which GCC does not do. Also, in BC these functions move 16-bit words, whereas the memcpy() that comes with DJGPP moves bytes. If you have such calls, you're better off using movedata(), which moves 32-bit double-words.

So you see, profiling could indeed tell you something different about each version. You might find that rewriting a single library function as an inline assembly function is all you need.

> There are 32 input files read in with a total size of 54316 bytes. 4
> output files are produced, total size 158695 bytes. Pretty small
> really.

This means your problem is *not* in the I/O, so I would concentrate on the code-efficiency issues above, for which profiling is the way to start.

> There is no software disk cache running, as the machine has a fairly
> good caching disk controller card, so you don't gain anything.

I would try using a software cache anyway. The cache on the controller has the disadvantage of talking to the PC over the relatively slow AT bus, whereas a software cache typically has about 10 times faster access to system RAM. So unless you have many megabytes of cache on the controller *and* a bus-mastering controller on an EISA or PCI bus, the hardware cache will always lose. Granted, this has nothing to do with your run-time problem, but it will certainly improve compilation time.

Eli Zaretskii
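To illustrate the byte-versus-double-word point about memcpy() and movedata(), here is a minimal portable C sketch. It is not DJGPP's memcpy() or movedata(); the names copy_bytes() and copy_dwords() are hypothetical, and copy_dwords() assumes both buffers are arrays of 32-bit words, so they are already aligned and the length is a whole number of words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n bytes one at a time, the way a
   byte-oriented memcpy() moves data. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n--)
        *dst++ = *src++;
}

/* Hypothetical helper: copy n 32-bit double-words. Assumes both
   buffers are uint32_t arrays (aligned, whole number of words),
   so each loop iteration moves 4 bytes instead of 1. */
void copy_dwords(uint32_t *dst, const uint32_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n--)
        *dst++ = *src++;
}
```

Copying 4096 bytes takes 4096 iterations with copy_bytes() but only 1024 with copy_dwords(). Whether that turns into a measurable speedup in a given program is exactly what profiling should confirm; a general-purpose replacement would also need to handle unaligned heads and tails.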