Mail Archives: pgcc/1999/08/12/02:50:34

Message-Id: <3.0.32.19990808144013.0119ad00@pop.xs4all.nl>
X-Sender: diep AT pop DOT xs4all DOT nl
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Sun, 08 Aug 1999 14:40:16 +0100
To: pgcc AT delorie DOT com
From: Vincent Diepeveen <diep AT xs4all DOT nl>
Subject: Re: optimizing for k6
Mime-Version: 1.0
Reply-To: pgcc AT delorie DOT com

There is a very easy way of optimizing for K6:
just rewrite everything in 8 bits and you're twice as fast.

Greetings,
Vincent

At 11:49 AM 8/7/99 +0200, you wrote:
>Henrik Berglund SdU wrote:
>> 
>> ftp://ftp.sinica.edu.tw/pub/doc/cpu/www.amd.com/K6/k6docs/pdf/21828a.pdf
>> 
>> -----------------------------------------------------------------------------
>> Henrik DOT Berglund AT mds DOT mdh DOT se
>> http://www.mds.mdh.se/~adb94hbd/
>
>This is a long-known document; it does help somewhat with optimizing. But
>the information is just too incomplete to get really good optimizations.
>
>There are also a lot of mistakes in that document. I had a little
>discussion with AMD technical support, but they did not help :-(
>AMD Technical Support wrote:
>> 
>> >Return-Path: <w DOT formann AT neuss DOT netsurf DOT de>
>> >Sender: wolfi AT neuss DOT netsurf DOT de
>> >Date: Fri, 12 Mar 1999 19:10:15 +0100
>> >From: Wolfgang Formann <w DOT formann AT neuss DOT netsurf DOT de>
>> >To: AMD Technical Support <blikefet AT pedigree DOT amd DOT com>
>> >Subject: Re: Some question to your literature, maybe a typo?
>> >References: <3 DOT 0 DOT 32 DOT 19990303153034 DOT 0074931c AT pedigree DOT amd DOT com>
>> >
>> 
>> Hi,
>> 
>> it is the last update of the document. I think you must try it.
>> 
>> Kind regards
>> 
>> Bernard
>> 
>> >AMD Technical Support wrote:
>> >>
>> >> >Return-Path: <euro DOT lit AT amd DOT com>
>> >> >X-Sender: support2 AT pedigree
>> >> >Date: Thu, 25 Feb 1999 06:39:16 +0100
>> >> >To: blikefet AT pedigree DOT amd DOT com
>> >> >From: Wolfgang Formann <w DOT formann AT neuss DOT netsurf DOT de>
>> >> >      (by way of CPA <euro DOT lit AT amd DOT com>)
>> >> >Subject: Some question to your literature, maybe a typo?
>> >> >
>> >> >I just downloaded the document
>> >> >http://www.amd.com/K6/k6docs/pdf/21828a.pdf.
>> >> >The table in Chapter 4, pages 37 to 40, says that all the shift
>> >> >operations like SHIFT mreg16/32,imm8; SHIFT mreg16/32, 1; and
>> >> >SHIFT mreg16/32, CL (where SHIFT can be replaced by SAR, SHL/SAL
>> >> >and SHR) are executed as the RISC86(tm) opcode alu. This RISC86(tm)
>> >> >operation is explained on page 24 as
>> >> >`alu - either of the integer execution units`.
>> >> >
>> >> >Whereas in chapter 3 on page 12, this document lists some (all?)
>> >> >operations which can be performed in the Integer Y execution unit.
>> >> >The list of operations '(ADD, AND, CMP, OR, SUB and XOR)' mentions
>> >> >none of the SHIFTs.
>> >> >
>> >> >By trying it out, I found (I think) that chapter 3 is right and the
>> >> >table in chapter 4 has typos.
>> >> >
>> >> >My question: Is there any updated version of this document available or
>> >> >do I have to try out all the other opcodes not listed in chapter 3, but
>> >> >marked as 'alu' in the table in chapter 4 (like mov, movzx)?
>> >> >
>> >> >Thank you
>> >>
>> >> Hi,
>> >>
>> >> the latest version of the document is on our website.
>> >
>> >so, it still seems to have different information on the same
>> >instruction :-(
>> >
>> >Is there any additional information available, not shown on your web page?
>> >
>> >Thanks again!
>> >
>> >>
>> >> Kind regards
>> >> Bernard Likefett
>> >> AMD Technical Support
>> >
>> >
>> Bernard Likefett
>> AMD Technical Support
>> 
>> Please include all previous emails
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Advanced Micro Devices
>> AMD House
>> Frimley Business Park
>> Frimley, Camberley
>> Surrey
>> GU16 5SL
>> United Kingdom
>> 
>> EMail id euro DOT tech AT amd DOT com
>> Our Web site is http://www.amd.com
>> Phone +44 (0)1276 803299
>> Fax +44 (0)1276 803298
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>Another thing in that manual is the nice table labeled 'Instruction
>Dispatch and Execution Timing' starting at page 35. Just a few
>questions:
>How many internal cycles do all these vector operations take?
>What internal execution units are used?
>
>Well, there is no answer, so you have to try them out. The only thing
>you can be sure of is that you should always use opcodes which can be
>decoded in parallel (the ones marked 'short'), since the bottleneck
>of that CPU seems to be the decoder.
>
>The next thing is the nice tables in the chapter labeled 'Code Sample
>Analysis'. Did you really understand them? I tried to optimize some
>real code and took these tables as input, but I failed :-( My processor
>seems to behave very differently. I did not find out what was wrong.
>So it seems to me that a lot of the information in this document is
>only for marketing purposes; there are too few details and too many
>errors to really help optimize code.
>
>Wolfgang
>
>
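
[Editor's sketch, not part of the original thread] The "try it out" approach
Wolfgang describes for the shift question could look roughly like the C below.
Everything in it is an assumption made for illustration: the kernel names
shift_shift and shift_add, the helper best_time, the start constants, and the
idea that a plain C loop maps onto the SHIFT and ADD opcodes in question. The
compiler is free to simplify these loops, so the generated assembly (for
example from gcc -S) should be checked before trusting any timing, or the loop
bodies ported to inline assembly.

```c
#include <stdint.h>
#include <time.h>

/* Results are stored here so the compiler cannot discard the calls. */
static volatile uint32_t sink;

/* Hypothetical kernel: two independent shifts per iteration. If only one
   integer unit can execute shifts, these two cannot run side by side. */
uint32_t shift_shift(uint32_t n)
{
    uint32_t a = 0x12345678u, b = 0x9abcdef0u;
    for (uint32_t i = 0; i < n; i++) {
        a <<= 1;
        b >>= 1;
    }
    return a ^ b;
}

/* Hypothetical kernel: one shift and one independent add per iteration.
   ADD is in the chapter 3 list for the Integer Y unit, so this pair
   should be able to use both units. */
uint32_t shift_add(uint32_t n)
{
    uint32_t a = 0x12345678u, b = 0x9abcdef0u;
    for (uint32_t i = 0; i < n; i++) {
        a <<= 1;
        b += 1;
    }
    return a ^ b;
}

/* Best-of-reps CPU time in seconds for one call of kernel(n).
   Returns -1.0 if reps is not positive. */
double best_time(uint32_t (*kernel)(uint32_t), uint32_t n, int reps)
{
    double best = -1.0;
    for (int r = 0; r < reps; r++) {
        clock_t t0 = clock();
        sink = kernel(n);
        clock_t t1 = clock();
        double dt = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return best;
}
```

Calling best_time(shift_shift, n, 5) and best_time(shift_add, n, 5) with the
same large n and comparing the two numbers is the whole experiment. A clearly
slower shift_shift would be consistent with chapter 3 (shifts limited to one
unit); similar times would be consistent with the chapter 4 table. Loop
overhead and decoder limits also affect the result, so it is only a hint.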
