Mail Archives: pgcc/1999/08/12/22:25:27

Message-ID: <19990812174541.47519@atrey.karlin.mff.cuni.cz>
Date: Thu, 12 Aug 1999 17:45:41 +0200
From: Jan Hubicka <hubicka AT atrey DOT karlin DOT mff DOT cuni DOT cz>
To: pgcc AT delorie DOT com
Subject: Re: optimizing for k6
References: <3 DOT 0 DOT 32 DOT 19990808144013 DOT 0119ad00 AT pop DOT xs4all DOT nl>
Mime-Version: 1.0
X-Mailer: Mutt 0.84
In-Reply-To: <3.0.32.19990808144013.0119ad00@pop.xs4all.nl>; from Vincent Diepeveen on Sun, Aug 08, 1999 at 02:40:16PM +0100
Reply-To: pgcc AT delorie DOT com
X-Mailing-List: pgcc AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com

> There is a very easy way of optimizing for K6,
> just rewrite everything in 8 bits and you're 2 times faster.
> 
Why? 8-bit arithmetic is issued to the X pipe only, so it ought to be 2 times
slower... and many insns have longer decoding latencies in their 8-bit versions.
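
To make the "2 times slower" arithmetic concrete, here is a toy issue model in
Python. It is only a sketch under stated assumptions: two integer pipes (X and
Y), one op per pipe per cycle, in-order issue, fully independent ops, and no
decoder limits. The Y-capable list is the one quoted from chapter 3 of the
manual further down in this thread; the rule that 8-bit forms never go to Y is
the claim made above, not something the model verifies. The names `Y_CAPABLE`,
`can_use_y` and `cycles` are made up for this illustration.

```python
from collections import deque

# Ops listed for the Integer Y unit in chapter 3 of the manual,
# as quoted later in this thread.
Y_CAPABLE = {"add", "and", "cmp", "or", "sub", "xor"}


def can_use_y(op, bits):
    """Toy rule (assumption): Y takes only the listed ops, never 8-bit forms."""
    return op in Y_CAPABLE and bits != 8


def cycles(ops):
    """Count issue cycles for a stream of independent (op, bits) pairs.

    Simplifications, not K6 facts: in-order issue, one op per pipe per
    cycle, X accepts anything, no dependencies, no decode bottleneck.
    """
    pending = deque(ops)
    n = 0
    while pending:
        n += 1
        first = pending.popleft()
        if not pending:
            break
        second = pending[0]
        # Two ops share a cycle only if at least one of them may run in Y;
        # the other one then takes X.
        if can_use_y(*first) or can_use_y(*second):
            pending.popleft()
    return n
```

With eight independent adds, `cycles([("add", 32)] * 8)` gives 4 while
`cycles([("add", 8)] * 8)` gives 8, which is the factor of two mentioned
above. Real code has dependencies and decode limits, so treat this as an upper
bound on the effect, not a measurement.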

Honza
> Greetings,
> Vincent
> 
> At 11:49 AM 8/7/99 +0200, you wrote:
> >Henrik Berglund SdU wrote:
> >> 
> >> ftp://ftp.sinica.edu.tw/pub/doc/cpu/www.amd.com/K6/k6docs/pdf/21828a.pdf
> >> 
> >> -----------------------------------------------------------------------------
> >> Henrik DOT Berglund AT mds DOT mdh DOT se
> >> http://www.mds.mdh.se/~adb94hbd/
> >
> >This is a long-known document; it helps somewhat with optimizing. But the
> >information is just too incomplete to get really good optimizations.
> >
> >There are also a lot of mistakes in that document. I had a little discussion
> >with AMD technical support, but they did not help :-(
> >AMD Technical Support wrote:
> >> 
> >> >Return-Path: <w DOT formann AT neuss DOT netsurf DOT de>
> >> >Sender: wolfi AT neuss DOT netsurf DOT de
> >> >Date: Fri, 12 Mar 1999 19:10:15 +0100
> >> >From: Wolfgang Formann <w DOT formann AT neuss DOT netsurf DOT de>
> >> >To: AMD Technical Support <blikefet AT pedigree DOT amd DOT com>
> >> >Subject: Re: Some question to your literature, maybe a typo?
> >> >References: <3 DOT 0 DOT 32 DOT 19990303153034 DOT 0074931c AT pedigree DOT amd DOT com>
> >> >
> >> 
> >> Hi,
> >> 
> >> it is the last update of the document. I think you must try it.
> >> 
> >> Kind regards
> >> 
> >> Bernard
> >> 
> >> >AMD Technical Support wrote:
> >> >>
> >> >> >Return-Path: <euro DOT lit AT amd DOT com>
> >> >> >X-Sender: support2 AT pedigree
> >> >> >Date: Thu, 25 Feb 1999 06:39:16 +0100
> >> >> >To: blikefet AT pedigree DOT amd DOT com
> >> >> >From: Wolfgang Formann <w DOT formann AT neuss DOT netsurf DOT de> (by way of CPA <euro DOT lit AT amd DOT com>)
> >> >> >Subject: Some question to your literature, maybe a typo?
> >> >> >
> >> >> >I just downloaded the document http://www.amd.com/K6/k6docs/pdf/21828a.pdf.
> >> >> >The table in Chapter 4, pages 37 to 40, says that all the shift operations
> >> >> >like SHIFT mreg16/32,imm8; SHIFT mreg16/32, 1; SHIFT mreg16/32, CL; where
> >> >> >SHIFT can be replaced by SAR, SHL/SAL and SHR, are executed as RISC86(tm)
> >> >> >Opcode alu. This RISC86(tm) operation is explained on page 24 as
> >> >> >`alu - either of the integer execution units`.
> >> >> >
> >> >> >Whereas in chapter 3 on page 12, this document lists some (all?) operations
> >> >> >which can be performed in the Integer Y execution unit. In the list of
> >> >> >operations '(ADD, AND, CMP, OR, SUB and XOR)' there is none of the SHIFT's
> >> >> >mentioned.
> >> >> >
> >> >> >By trying it out (I think) I found that chapter 3 is right and the table
> >> >> >in chapter 4 has typos.
> >> >> >
> >> >> >My question: Is there any updated version of this document available or
> >> >> >do I have to try out all the other opcodes not listed in chapter 3, but
> >> >> >marked as 'alu' in the table in chapter 4 (like mov, movzx)?
> >> >> >
> >> >> >Thank you
> >> >>
> >> >> Hi,
> >> >>
> >> >> the latest version of the document is on our website.
> >> >
> >> >so, it still seems to have different information on the same instruction :-(
> >> >
> >> >Is there any additional information available, not shown on your web page?
> >> >
> >> >Thanks again!
> >> >
> >> >>
> >> >> Kind regards
> >> >> Bernard Likefett
> >> >> AMD Technical Support
> >> >
> >> >
> >> Bernard Likefett
> >> AMD Technical Support
> >> 
> >> Please include all previous emails
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> Advanced Micro Devices _______
> >> AMD House \____ | Advanced
> >> Frimley Business Park /| | | Micro
> >> Frimley, Camberley | |___| | Devices
> >> Surrey |____/ \|
> >> GU16 5SL
> >> United Kingdom
> >> 
> >> EMail id euro DOT tech AT amd DOT com Our Web site is http://www.amd.com
> >> Phone +44 (0)1276 803299 Fax +44 (0)1276 803298
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> >Another thing in that manual is the nice table labeled 'Instruction
> >Dispatch and Execution Timing' starting at page 35. Just a few
> >questions:
> >How many internal cycles do all these vector operations take?
> >What internal execution units are used?
> >
> >Well, there is no answer, so you have to try them out. The only thing
> >you can be sure of, is that you should always use opcodes which can get
> >decoded in parallel, these are the ones marked with 'short' since it
> >seems that the bottleneck of that CPU is the decoder.
> >
> >The next thing is the nice tables in the chapter labeled 'Code Sample
> >Analysis'. Did you really understand them? I tried to optimize some
> >real code and took these tables as input, but I failed :-( My processor
> >seems to behave very differently. I did not find out what was wrong.
> >So it seems to me that a lot of the information in this document is
> >only for marketing purposes; there are too few details and too much
> >wrong information to really help to optimize the code.
> >
> >Wolfgang
> >
> >

-- 
                       OK. Lets make a signature file.
+-------------------------------------------------------------------------+
|        Jan Hubicka (Jan Hubi\v{c}ka in TeX) hubicka AT freesoft DOT cz         |
|         Czech free software foundation: http://www.freesoft.cz          |
|AA project - the new way for computer graphics - http://www.ta.jcu.cz/aa |
|  homepage: http://www.paru.cas.cz/~hubicka/, games koules, Xonix, fast  |
|  fractal zoomer XaoS, index of Czech GNU/Linux/UN*X documentation etc.  | 
+-------------------------------------------------------------------------+
