Message-Id: <3.0.32.19990808144013.0119ad00@pop.xs4all.nl>
X-Sender: diep AT pop DOT xs4all DOT nl
X-Mailer: Windows Eudora Pro Version 3.0 (32)
Date: Sun, 08 Aug 1999 14:40:16 +0100
To: pgcc AT delorie DOT com
From: Vincent Diepeveen
Subject: Re: optimizing for k6
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Reply-To: pgcc AT delorie DOT com

There is a very easy way of optimizing for K6:
just rewrite everything in 8 bits and you're 2 times
faster.

Greetings,
Vincent

At 11:49 AM 8/7/99 +0200, you wrote:
>Henrik Berglund SdU wrote:
>>
>> ftp://ftp.sinica.edu.tw/pub/doc/cpu/www.amd.com/K6/k6docs/pdf/21828a.pdf
>>
>> -----------------------------------------------------------------------------
>> Henrik DOT Berglund AT mds DOT mdh DOT se
>> http://www.mds.mdh.se/~adb94hbd/
>
>This is a long-known document; it does help some in optimizing. But the
>information is just too incomplete to get really good optimizations.
>
>There are also a lot of mistakes in that document. I had a little
>discussion
>with AMD technical support, but they did not help :-(
>AMD Technical Support wrote:
>>
>> >Return-Path:
>> >Sender: wolfi AT neuss DOT netsurf DOT de
>> >Date: Fri, 12 Mar 1999 19:10:15 +0100
>> >From: Wolfgang Formann
>> >To: AMD Technical Support
>> >Subject: Re: Some question to your literature, maybe a typo?
>> >References: <3 DOT 0 DOT 32 DOT 19990303153034 DOT 0074931c AT pedigree DOT amd DOT com>
>> >
>> >> Hi,
>> >> it is the last update of the document. I think you must try it.
>> >> Kind regards
>> >> Bernard
>>
>> >AMD Technical Support wrote:
>> >>
>> >> >Return-Path:
>> >> >X-Sender: support2 AT pedigree
>> >> >Date: Thu, 25 Feb 1999 06:39:16 +0100
>> >> >To: blikefet AT pedigree DOT amd DOT com
>> >> >From: Wolfgang Formann (by way of CPA )
>> >> >Subject: Some question to your literature, maybe a typo?
>> >> >
>> >> >I just downloaded the document http://www.amd.com/K6/k6docs/pdf/21828a.pdf.
>> >> >The table in Chapter 4, pages 37 to 40, says that all the shift operations
>> >> >like SHIFT mreg16/32,imm8; SHIFT mreg16/32, 1; SHIFT mreg16/32, CL; where
>> >> >SHIFT can be replaced by SAR, SHL/SAL and SHR, are executed as RISC86(tm)
>> >> >Opcode alu. This RISC86(tm) operation is explained on page 24 as
>> >> >`alu - either of the integer execution units`.
>> >> >
>> >> >Whereas in chapter 3 on page 12, this document lists some (all?) operations
>> >> >which can be performed in the Integer Y execution unit. In the list of
>> >> >operations '(ADD, AND, CMP, OR, SUB and XOR)' none of the SHIFTs is
>> >> >mentioned.
>> >> >
>> >> >By trying it out (I think) I found that chapter 3 is right and the table
>> >> >in chapter 4 has typos.
>> >> >
>> >> >My question: Is there any updated version of this document available or
>> >> >do I have to try out all the other opcodes not listed in chapter 3, but
>> >> >marked as 'alu' in the table in chapter 4 (like mov, movzx)?
>> >> >
>> >> >Thank you
>> >>
>> >> Hi,
>> >>
>> >> the latest version of the document is on our website.
>> >
>> >so, it still seems to have different information on the same instruction :-(
>> >
>> >Is there any additional information available, not shown on your web page?
>> >
>> >Thanks again!
>> >
>> >>
>> >> Kind regards
>> >> Bernard Likefett
>> >> AMD Technical Support
>> >
>> >
>> Bernard Likefett
>> AMD Technical Support
>>
>> Please include all previous emails
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> Advanced Micro Devices              _______
>> AMD House                           \____ |     Advanced
>> Frimley Business Park               /|  | |     Micro
>> Frimley, Camberley                 | |___| |    Devices
>> Surrey                             |____/ \|
>> GU16 5SL
>> United Kingdom
>>
>> EMail id euro DOT tech AT amd DOT com    Our Web site is http://www.amd.com
>> Phone +44 (0)1276 803299                 Fax +44 (0)1276 803298
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
>Another thing in that manual is the nice table labeled 'Instruction
>Dispatch and Execution Timing' starting at page 35. Just a few
>questions:
>How many internal cycles do all these vector operations take?
>What internal execution units are used?
>
>Well, there is no answer, so you have to try them out. The only thing
>you can be sure of is that you should always use opcodes which can get
>decoded in parallel; these are the ones marked with 'short', since it
>seems that the bottleneck of that CPU is the decoder.
>
>The next thing is the nice tables in the chapter labeled 'Code Sample
>Analysis'. Did you really understand them? I tried to optimize some
>real code and took these tables as input, but I failed :-( My processor
>seems to behave very differently. I did not find out what was wrong.
>So it seems to me that a lot of the information in this document is
>only for marketing purposes; there are too few details and too much
>wrong information to really help to optimize the code.
>
>Wolfgang
>
>
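
The two readings of the manual discussed above (shifts as 'alu', i.e. either integer unit, versus shifts missing from the Integer Y list) predict different throughput for a run of independent shifts, which is why "trying it out" can tell them apart. A toy sketch of that reasoning follows. It is not a K6 simulator: the function name `cycles`, the one-cycle latency, the in-order pairing rule, and the assumption that the X unit accepts every op are all illustrative assumptions, and the decoder (the suspected real bottleneck) is ignored entirely.

```python
# Toy two-unit issue model. Everything here is a simplifying assumption
# made for illustration; it does not describe the real K6 scheduler.

# Ops the quoted chapter 3 list names as executable in the Integer Y unit.
Y_UNIT_OPS = {"ADD", "AND", "CMP", "OR", "SUB", "XOR"}
# Shift mnemonics named in the quoted question.
SHIFT_OPS = {"SAR", "SHL", "SAL", "SHR"}


def cycles(ops, shifts_on_y):
    """Count cycles to issue independent one-cycle ops, in order, to units X and Y.

    shifts_on_y=True  models the chapter 4 table ('alu' = either unit).
    shifts_on_y=False models the chapter 3 list (no shifts in Integer Y).
    The X unit is assumed to accept every op.
    """
    def y_ok(op):
        return op in Y_UNIT_OPS or (shifts_on_y and op in SHIFT_OPS)

    i = 0
    n = 0
    while i < len(ops):
        n += 1                # start a new cycle; the first op always issues
        first = ops[i]
        i += 1
        if i < len(ops):
            second = ops[i]
            # Two ops share a cycle only if one of them may use Y
            # while the other takes X.
            if y_ok(first) or y_ok(second):
                i += 1
    return n


only_shifts = ["SHL"] * 8
mixed = ["SHL", "ADD"] * 4

print(cycles(only_shifts, shifts_on_y=True))
print(cycles(only_shifts, shifts_on_y=False))
print(cycles(mixed, shifts_on_y=False))
```

Under these assumptions a block of independent shifts takes twice as many cycles if chapter 3 is right, while interleaving shifts with ADD-class ops hides the difference. A real measurement would also have to control for decoding and data dependencies.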