From: "Alex Vinokur"
Newsgroups: microsoft.public.win32.programmer.kernel,comp.os.msdos.djgpp,comp.lang.c++,alt.lang.asm
Subject: Re: Optimization and operator&&
Date: Thu, 6 Jun 2002 17:01:53 +0200
Organization: Scopus
Lines: 221
Message-ID:
References: <3CFCB642 DOT 252CFFF7 AT bigfoot DOT com> <3CFD46D4 DOT 9070503 AT deadgoths DOT com> <3CFDDE6A DOT 11E99D66 AT bigfoot DOT com>
NNTP-Posting-Host: gateway.scopus.net (62.90.123.5)
X-Trace: fu-berlin.de 1023371993 624981 62.90.123.5 (16 [79865])
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

"Bart Crane" wrote in message news:esA$SLMDCHA DOT 1732 AT tkmsftngp02...
> Hello,
>
> The test is not reporting what you expected, because of the optimization.
> The quick reply is to declare your variables as "volatile". This will force
> the compiler to actually access the variables each time through the loops,
> as opposed to optimizing the whole loop out of the code. Analysis follows...
>
> If I were an optimizing compiler, and I saw that a variable was initialized
> to a number, and then another number was added to it a fixed number of times:
>
> > static_uint = 123;
> > start_time = uclock();
> > for (i = 0; i < TOTAL_ITERATIONS; i++)
> > {
> >   static_uint = static_uint + THE_VALUE;
> > }
> > end_time = uclock();
> > SHOW_TIME(static, PLUS);
>
> I would optimize it to:
>
> static_uint = 123 + (TOTAL_ITERATIONS * THE_VALUE);
> start_time = uclock();
> end_time = uclock();
> SHOW_TIME(static, PLUS);
>
> If I were an optimizing compiler, and I saw a loop that was doing nothing
> lots of times:
>
> > start_time = uclock();
> > for (i = 0; i < TOTAL_ITERATIONS; i++)
> > {
> >   // Do nothing
> > }
> > end_time = uclock();
> > SHOW_TIME(Do, nothing);
>
> I would just not generate code for that loop:
>
> start_time = uclock();
> end_time = uclock();
> SHOW_TIME(Do, nothing);
>
> The reported times for the Do-Nothing and the ADD tests are similar, so this
> is probably what is happening. You should examine the assembly code for the
> optimized versions to verify that the TOTAL_ITERATIONS loops are not present.
>
> The compiler apparently is not smart enough to figure out how to optimize
> the AND case. But I would optimize:
>
> > static_uint = 123;
> > start_time = uclock();
> > for (i = 0; i < TOTAL_ITERATIONS; i++)
> > {
> >   static_uint = static_uint && THE_VALUE;
> > }
> > end_time = uclock();
>
> to:
>
> static_uint = 123 && THE_VALUE;
> start_time = uclock();
> end_time = uclock();
> SHOW_TIME(static, AND);
>
> (Note: the assignment to dummy1 in the AND test should be removed, or added
> to the PLUS case.)
>
> The differences between the results reported for static and automatic
> variables would be due to the fact that the automatic variables can be
> referenced via a constant offset (-20) from the pre-loaded register EBP,
> whereas the static variable requires loading its memory address into a
> register before the value is stored.
>
> To force the compiler to actually perform the AND and PLUS operations,
> declare the variables being operated on as "volatile". Then the compiler
> must perform the memory accesses, instead of simply pre-calculating the
> result and skipping the loop.
>
> Bart.
>
> Alex Vinokur wrote in message
> news:3CFDDE6A DOT 11E99D66 AT bigfoot DOT com...
> >
> >
> > David Carson wrote:
> >
> > > Alex Vinokur wrote:
> > > > A program below measures performance (time):
> > > >   * of operator&& and operator+
> > > >   * with automatic and static unsigned int
> > > >   * with optimizations: no optimization, O1, O2, O3
> > > >
> > > > We can see that optimization causes
> > > > an increase in elapsed time for operator&&.
> > > > Any explanation?
> > >
> > > Well, look at the assembly code that gcc is generating for both the
> > > optimised and non-optimised case. It's only a single instruction within
> > > a loop; you can figure out the difference given a few minutes and the
> > > appropriate manual (all of Intel's are available on their website) even
> > > if you're not an assembly guru.
> > >
> > > What CPU are you using? Betcha gcc is producing code which would have
> > > been faster on some different CPU of the Pentium family.
> > >
> > > p.s. && does not mean "OR".
> >
> > Thanks. Of course it is AND.
> >
> > >
> > > Cheers!
> > > David...
> >
> > Here is an updated version of the program.
> > Relevant information has also been added:
> >   CPU parameters,
> >   assembly code,
> >   system description (uclock).
> >
> > The main conclusion is the same:
> > operator&& is faster with no optimization than with any optimization.
> >
> > Comparing the assembly code for the optimized and non-optimised cases
> > has not made me any wiser on this question.
> >
> > [snip]

Hi Bart,

Thank you for your interesting analysis.
I updated my program.
Raw results are in news:misc.test at
http://groups.google.com/groups?th=a24437e8ed2f5987

Elapsed time was also measured with the C/C++ Program Perfometer:

|---------------------------------|
|-> 1. C/C++ Program Perfometer <-|
|---------------------------------|
=====================================
http://alexvn.freeservers.com/s1/perfometer.html
http://groups.google.com/groups?th=9560b3b3760966fe
http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=3512
=====================================

|--------------------|
|-> 2. Environment <-|
|--------------------|
=====================================
Windows 2000
Intel(R) Pentium(R) 4 CPU 1.70GHz
-------------------
gcc version 3.0.4
GNU CPP version 3.0.4 (cpplib) (80386, BSD syntax)
GNU C++ version 3.0.4 (djgpp) compiled by GNU C version 3.0.4.
Configured with: ../configure i586-pc-msdosdjgpp --prefix=/dev/env/DJDIR --disable-nls
Thread model: single
=====================================

Main conclusions:

|========================================================|
| operator&&         | No optimization | Optimization O1 |
|--------------------|-----------------|-----------------|
| ordinary static    |            1203 |            2018 |
| volatile static    |             549 |             369 |
| ordinary automatic |             834 |            1908 |
| volatile automatic |             504 |             274 |
|========================================================|

|========================================================|
| operator+          | No optimization | Optimization O1 |
|--------------------|-----------------|-----------------|
| ordinary static    |             379 |             165 |
| volatile static    |             384 |             329 |
| ordinary automatic |             349 |             179 |
| volatile automatic |             379 |             269 |
|========================================================|

With volatile variables, O1 is faster than no optimization for both
operators. Only the ordinary (non-volatile) operator&& cases get slower
under O1.

Thanks,
Best Regards

====================
Alex Vinokur
http://up.to/alexvn
http://go.to/alexv_math
mailto:alexvn AT bigfoot DOT com
mailto:alexvn AT go DOT to
====================