From: Tom St Denis
Newsgroups: comp.os.msdos.djgpp
Subject: Re: MISSION: Making world´s fastest pixel drawing possible
Date: Thu, 18 Jan 2001 12:41:38 GMT
Organization: Deja.com
Lines: 46
Message-ID: <946oa2$vkp$1@nnrp1.deja.com>
References: <942h1e$5qq$1 AT tron DOT sci DOT fi>
NNTP-Posting-Host: 24.156.37.224
X-Article-Creation-Date: Thu Jan 18 12:41:38 2001 GMT
X-Http-User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 98) Opera 5.01 [en]
X-Http-Proxy: 1.1 x66.deja.com:80 (Squid/1.1.22) for client 24.156.37.224
X-MyDeja-Info: XMYDJUIDtomstdenis
To: djgpp AT delorie DOT com
DJ-Gateway: from newsgroup comp.os.msdos.djgpp
Reply-To: djgpp AT delorie DOT com

In article , Kovacs Viktor Peter wrote:
> > Contents:
> > - Fastest possible way to draw pixel´s (well, maybe the near pointer hack
> > mentioned in DJGPP FAQ is faster :)). Drawing developed as a macro (becourse
> > jumping to function and back again takes more time than drawing the damn
> > pixel !). Screen coordinates pre-calculated in the global table.
> >
> > Feedback needed about the following subjects:
> > 1) Is it necessary to save ES segment register (the macro currently does not
> > do it, but it seems to work fine, no crashes, etc...) ?
> > 2) Is the assembly command "les %0,%%edi" faster than the currently used
> > "movw %0,%%es;movl %1,%%edi" pair ?
>
> Have you heard about the far ptr hack in djgpp? It even allows loading
> the segment outside a tight loop...
> (It was designed to work with systems that don't allow segment limits
> to be set to 4GB.)
>
> About your questions:
> -yes, you should save ES, because someone (memcpy) might use it...
> -les: On the P4 it is slower, on other systems it is about the same
> -precalculated tables: when you use integer math only, it is faster to
> calculate it on the fly, because you can spare a few TLB and cache
> entries with it (the overall speed will be faster)
> -tables are to be used instead of float math...

In general, precalculated tables are only good if you have to perform
integer division. In my Plush3D library, when I calculated alpha blending
via a huge table lookup instead of integer arithmetic, I sped it up by
about 35% despite the 256 KB table I had to use.

I.e., if the operation takes more than 5 or so cycles to calculate, use a
table, assuming that is possible. Athlons like tables :-)

Tom
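
To make the table-versus-arithmetic trade-off above concrete, here is a minimal sketch in C of 8-bit alpha blending done both ways. It is not the Plush3D code: the names blend_mul, init_blend_table, blend_direct and blend_lut are made up for illustration, and the table here is a hypothetical 256 x 256 byte layout (64 KB), not the 256 KB table mentioned in the post, whose layout is not described. The table simply stores the result of each integer division by 255 so the inner loop does two lookups and an add instead of two multiplies and two divides.

```c
#include <stdint.h>

/* Hypothetical lookup table: blend_mul[a][v] == a * v / 255 (truncating).
   256 * 256 bytes = 64 KB. This layout is an assumption for illustration. */
static uint8_t blend_mul[256][256];

/* Fill the table once at startup. */
void init_blend_table(void)
{
    int a, v;
    for (a = 0; a < 256; a++)
        for (v = 0; v < 256; v++)
            blend_mul[a][v] = (uint8_t)(a * v / 255);
}

/* Direct version: two multiplies and two integer divisions per channel. */
uint8_t blend_direct(uint8_t alpha, uint8_t src, uint8_t dst)
{
    return (uint8_t)(alpha * src / 255 + (255 - alpha) * dst / 255);
}

/* Table version: two lookups and one add per channel.
   Produces the same values as blend_direct by construction. */
uint8_t blend_lut(uint8_t alpha, uint8_t src, uint8_t dst)
{
    return (uint8_t)(blend_mul[alpha][src] + blend_mul[255 - alpha][dst]);
}
```

Call init_blend_table() once before the first blend_lut() call. Whether the table version actually wins depends on the CPU and on how much of the table stays in cache, so it is worth timing both on the target machine rather than assuming either result.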