Mail Archives: djgpp/2002/06/06/10:26:28

From: "Alex Vinokur" <alexvn AT bigfoot DOT com>
Newsgroups: microsoft.public.win32.programmer.kernel,comp.os.msdos.djgpp,comp.lang.c++,alt.lang.asm
Subject: Re: Optimization and operator&&
Date: Thu, 6 Jun 2002 17:01:53 +0200
Organization: Scopus
Lines: 221
Message-ID: <adnpsp$j2al$1@ID-79865.news.dfncis.de>
References: <3CFCB642 DOT 252CFFF7 AT bigfoot DOT com> <3CFD46D4 DOT 9070503 AT deadgoths DOT com> <3CFDDE6A DOT 11E99D66 AT bigfoot DOT com> <esA$SLMDCHA DOT 1732 AT tkmsftngp02>
NNTP-Posting-Host: gateway.scopus.net (62.90.123.5)
X-Trace: fu-berlin.de 1023371993 624981 62.90.123.5 (16 [79865])
X-Priority: 3
X-MSMail-Priority: Normal
X-Newsreader: Microsoft Outlook Express 6.00.2600.0000
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

"Bart Crane" <bcrane AT iready DOT com> wrote in message news:esA$SLMDCHA DOT 1732 AT tkmsftngp02...
> Hello,
>
> The test is not reporting what you expected, because of the optimization.
> The quick reply is to declare your variables as "volatile".  This will force
> the compiler to actually access the variables each time through the loops,
> as opposed to optimizing the whole loop out of the code.  Analysis follows...
>
> If I were an optimizing compiler, and I saw that a variable was initialized
> to a number, and then another number was added to it a fixed number of times:
>
> >     static_uint = 123;
> >     start_time = uclock();
> >     for (i = 0; i < TOTAL_ITERATIONS; i++)
> >     {
> >       static_uint = static_uint + THE_VALUE;
> >     }
> >     end_time = uclock();
> >     SHOW_TIME(static, PLUS);
>
> I would optimize it to:
>
>     static_uint = 123 + (TOTAL_ITERATIONS * THE_VALUE);
>     start_time = uclock();
>     end_time = uclock();
>     SHOW_TIME(static, PLUS);
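
A minimal sketch of why this substitution is legal for unsigned arithmetic. The constants and function names below are stand-ins I chose for illustration, not the ones from the benchmark. Because unsigned addition wraps modulo 2^N, the loop and the closed form agree even when the sum overflows.

```c
/* Hypothetical stand-ins for the benchmark's constants. */
#define TOTAL_ITERATIONS 1000000u
#define THE_VALUE 7u

/* The addition loop, written the way the benchmark writes it. */
unsigned int add_by_loop(unsigned int start)
{
    unsigned int x = start;
    unsigned int i;
    for (i = 0; i < TOTAL_ITERATIONS; i++)
    {
        x = x + THE_VALUE;
    }
    return x;
}

/* The single expression an optimizer is allowed to substitute. */
unsigned int add_closed_form(unsigned int start)
{
    return start + TOTAL_ITERATIONS * THE_VALUE;
}
```

Calling both functions with the same start value (123 in the benchmark) returns the same result, so a compiler that can prove this has no reason to keep the loop.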
>
> If I were an optimizing compiler, and I saw a loop that was doing nothing
> lots of times:
>
> >     start_time = uclock();
> >     for (i = 0; i < TOTAL_ITERATIONS; i++)
> >     {
> >       // Do nothing
> >     }
> >     end_time = uclock();
> >     SHOW_TIME(Do, nothing);
>
> I would just not generate code for that loop:
>
>     start_time = uclock();
>     end_time = uclock();
>     SHOW_TIME(Do, nothing);
>
> The reported times for the Do-Nothing and the ADD test are similar, so this
> is what is probably happening.  You should examine the assembly code for the
> optimized versions to verify that the TOTAL_ITERATIONS loops are not present.
>
> The compiler apparently is not smart enough to figure out how to optimize
> the AND case.  But I would optimize:
>
> >     static_uint = 123;
> >     start_time = uclock();
> >     for (i = 0; i < TOTAL_ITERATIONS; i++)
> >     {
> >       static_uint = static_uint && THE_VALUE;
> >     }
> >     end_time = uclock();
>
> to:
>
>     static_uint = 123 && THE_VALUE;
>     start_time = uclock();
>     end_time = uclock();
>     SHOW_TIME(static, AND);
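
A minimal sketch of why the AND loop collapses to one expression, assuming the loop runs at least once. After the first pass the variable holds 0 or 1, and applying `&&` with the same operand again cannot change it. The iteration count and function names are stand-ins chosen for illustration.

```c
/* Hypothetical stand-in for the benchmark's iteration count (must be >= 1). */
#define TOTAL_ITERATIONS 1000000u

/* The logical-AND loop, written the way the benchmark writes it. */
unsigned int and_by_loop(unsigned int start, unsigned int value)
{
    unsigned int x = start;
    unsigned int i;
    for (i = 0; i < TOTAL_ITERATIONS; i++)
    {
        x = x && value;   /* result is always 0 or 1 */
    }
    return x;
}

/* The single evaluation the loop is equivalent to. */
unsigned int and_single(unsigned int start, unsigned int value)
{
    return start && value;
}
```

For any pair of inputs the two functions return the same 0 or 1, which is the equivalence the hand-optimized version above relies on.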
>
> (Note: the assignment to dummy1 in the AND test should be removed, or added
> to the PLUS case.)
>
> The differences between the results reported for static and automatic
> variables would be due to the fact that the automatic variables can be
> referenced via constant offsets (-20) of the pre-loaded register, EBP,
> whereas the static variable requires loading its memory address into a
> register before storing the value.
>
> To force the compiler to actually perform the AND and PLUS operations,
> declare the variables being operated on as "volatile".  Then the compiler
> must access memory on every iteration, as opposed to simply pre-calculating
> the result and skipping the loop.
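
A minimal sketch of the volatile suggestion, under a few assumptions: it uses the standard `clock()` instead of DJGPP's `uclock()` so that it compiles elsewhere, and the constants, variable names and function names are stand-ins chosen for illustration. With optimization on, the plain loop may be folded away while the volatile loop has to perform every load and store.

```c
#include <time.h>

/* Hypothetical stand-ins for the benchmark's constants. */
#define TOTAL_ITERATIONS 1000000u
#define THE_VALUE 7u

static unsigned int plain_uint;              /* optimizer may fold the loop */
static volatile unsigned int volatile_uint;  /* every access must happen */

/* Times the addition loop on the ordinary variable; returns clock ticks. */
clock_t time_plain(void)
{
    clock_t start_time, end_time;
    unsigned int i;

    plain_uint = 123;
    start_time = clock();
    for (i = 0; i < TOTAL_ITERATIONS; i++)
    {
        plain_uint = plain_uint + THE_VALUE;
    }
    end_time = clock();
    return end_time - start_time;
}

/* Times the same loop on the volatile variable; returns clock ticks. */
clock_t time_volatile(void)
{
    clock_t start_time, end_time;
    unsigned int i;

    volatile_uint = 123;
    start_time = clock();
    for (i = 0; i < TOTAL_ITERATIONS; i++)
    {
        volatile_uint = volatile_uint + THE_VALUE;
    }
    end_time = clock();
    return end_time - start_time;
}
```

Both loops leave the same value in their variable; only the elapsed ticks should differ between them, and how much they differ depends on the compiler, the flags and the resolution of `clock()`.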
>
> Bart.
>
> Alex Vinokur <alexvn AT bigfoot DOT com> wrote in message
> news:3CFDDE6A DOT 11E99D66 AT bigfoot DOT com...
> >
> >
> > David Carson wrote:
> >
> > > Alex Vinokur wrote:
> > > > A program below measures performance (time) :
> > > >   * of operator&& and operator+
> > > >   * with automatic and static unsigned int
> > > >   * with optimizations : No optimization, O1, O2, O3
> > > >
> > > > > We can see that optimization causes
> > > > >   an increase in elapsed time for operator&&.
> > > > > Any explanation?
> > >
> > > > Well, look at the assembly code that gcc is generating for both the
> > > > optimised and non-optimised case. It's only a single instruction within
> > > > a loop, you can figure out the difference given a few minutes and the
> > > > appropriate manual (all of Intel's are available on their website) even
> > > > if you're not an assembly guru.
> > > >
> > > > What CPU are you using? Betcha gcc is producing code which would have
> > > > been faster on some different CPU of the Pentium family..
> > >
> > > p.s. && does not mean "OR".
> >
> > Thanks. Of course it is AND.
> >
> > >
> > >
> > > Cheers!
> > > David...
> >
> > > Here is an updated version of the program.
> > > Relevant information has also been added:
> >   CPU parameters,
> >   assembly code,
> >   system description (uclock).
> >
> > The main conclusion is the same one :
> >   operator&& is faster with no optimization than with any optimization.
> >
> > > The difference between the assembly code for the
> > >   optimized and non-optimised cases
> > >   doesn't make the cause any clearer to me.
> >
> >

[snip]

Hi Bart,

Thank you for your interesting analysis.

I updated my program. Raw results are in news:misc.test at
http://groups.google.com/groups?th=a24437e8ed2f5987


    Elapsed time has also been measured using the C/C++ Program Perfometer:

 |---------------------------------|
 |-> 1. C/C++ Program Perfometer <-|
 |---------------------------------|
=====================================
  http://alexvn.freeservers.com/s1/perfometer.html
  http://groups.google.com/groups?th=9560b3b3760966fe
  http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=3512
=====================================



 |--------------------|
 |-> 2. Environment <-|
 |--------------------|
=====================================
Windows-2000
  Intel(R) Pentium(R) 4 CPU 1.70GHz
-------------------
gcc version 3.0.4
  GNU CPP version 3.0.4 (cpplib) (80386, BSD syntax)
  GNU C++ version 3.0.4 (djgpp) compiled by GNU C version 3.0.4.
  Configured with: ../configure i586-pc-msdosdjgpp --prefix=/dev/env/DJDIR --disable-nls
  Thread model: single
=====================================


Main conclusions:

   |========================================================|
   | operator&&         | No optimization | Optimization O1 |
   |--------------------|-----------------|-----------------|
   | ordinary static    |            1203 |            2018 |
   | volatile static    |             549 |             369 |
   | ordinary automatic |             834 |            1908 |
   | volatile automatic |             504 |             274 |
   |========================================================|


   |========================================================|
   | operator+          | No optimization | Optimization O1 |
   |--------------------|-----------------|-----------------|
   | ordinary static    |             379 |             165 |
   | volatile static    |             384 |             329 |
   | ordinary automatic |             349 |             179 |
   | volatile automatic |             379 |             269 |
   |========================================================|
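
One way to summarize the operator&& table is the ratio of the O1 time to the unoptimized time for each row. The sketch below does only that arithmetic. The figures are copied from the table above; the function and array names are made up for illustration.

```c
/* Ratio of the O1 time to the unoptimized time.
   A value above 1.0 means the O1 build was slower. */
double o1_ratio(unsigned int no_opt, unsigned int o1)
{
    return (double)o1 / (double)no_opt;
}

/* Rows of the operator&& table, in the order listed:
   ordinary static, volatile static, ordinary automatic, volatile automatic. */
static const unsigned int and_no_opt[4] = { 1203, 549, 834, 504 };
static const unsigned int and_o1[4]     = { 2018, 369, 1908, 274 };

/* Counts how many of the four operator&& cases got slower at O1. */
int and_cases_slower_at_o1(void)
{
    int i;
    int count = 0;
    for (i = 0; i < 4; i++)
    {
        if (o1_ratio(and_no_opt[i], and_o1[i]) > 1.0)
        {
            count++;
        }
    }
    return count;
}
```

By this measure the two ordinary (non-volatile) operator&& cases slow down at O1, while both volatile cases speed up.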


Thanks,
Best Regards

====================
  Alex Vinokur
    http://up.to/alexvn
    http://go.to/alexv_math
    mailto:alexvn AT bigfoot DOT com
    mailto:alexvn AT go DOT to
  ====================




