Mail Archives: djgpp/1997/05/06/11:52:05

Message-ID: <336F5041.2D3C@silesia.top.pl>
Date: Tue, 06 May 1997 17:37:37 +0200
From: Michal <wapex AT silesia DOT top DOT pl>
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: Re: Alignment
References: <199705052359 DOT JAA19327 AT solwarra DOT gbrmpa DOT gov DOT au>

Leath Muller wrote:

> No, generally all operations be they single or double are extended to
> extended (80 bit) precision where all the math is performed, and then moved
> back to their original precision be it single or double - the main reason
> for this is speed (AFAIK). If you want increased speed and don't require
> the increased precision, you can move the FPU into a lower precision mode.
> 
I tried my inner loop in single precision and saw no speed difference.
In your last posting you said that the FPU is switched into single
precision mode; now you say extended. I know it is switched into
extended, but I didn't know about any extra time needed for that. I
don't understand what you were trying to say about single precision.
> 
> Ouch - run out of registers by any chance?  :)
> 
I use self-modifying code, but I still have to use 2 memory references
for updating u and v. The instructions pair, so they take 2 clocks.
Without doing lighting at the same time I could use 2 more registers (I
use them to store the light value and its delta) and keep delta u and
delta v there, but all I would gain is 1 clock per pixel, and I would
have to execute a second loop, which would of course be slower.
> 
> Aaah, you're using a z-buffer - an oversight on my behalf... I have yet to
> see an extremely fast z-buffer in software and I don't use a z-buffer; I
> use an AET. (well did, I did this stuff for someone else and now I use O-GL)
> 
Neither do I. 1/z is the divide I need to get the right z value every 8
pixels. I use it to get the right u, v and light value, then I calculate
new deltas. If you calculated them in a first loop, you would need a
buffer to store 1/z or those deltas (only for the light value, but many
of them).
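
To make the "one divide every 8 pixels" scheme concrete, here is a
minimal C sketch. It is not my actual inner loop (that one is assembly);
the Span struct, span_uv() and SUBSPAN are names invented for this
illustration, and it assumes u/z, v/z and 1/z step linearly along the
scanline. The light value would be handled the same way as u and v.

```c
#include <assert.h>
#include <stddef.h>

#define SUBSPAN 8  /* pixels between exact perspective divides */

/* Hypothetical span description: u/z, v/z and 1/z are linear in screen x. */
typedef struct {
    float uoz, voz, ooz;     /* u/z, v/z, 1/z at the left end of the span */
    float duoz, dvoz, dooz;  /* per-pixel steps of those three values */
} Span;

/* Writes len (u, v) pairs. u and v are exact at every SUBSPAN-th pixel
   and linearly interpolated in between. */
void span_uv(const Span *s, int len, float *out_u, float *out_v)
{
    assert(s != NULL && out_u != NULL && out_v != NULL && len >= 0);

    float uoz = s->uoz, voz = s->voz, ooz = s->ooz;
    float z = 1.0f / ooz;              /* exact divide at the span start */
    float u0 = uoz * z, v0 = voz * z;

    for (int x = 0; x < len; ) {
        int n = (len - x < SUBSPAN) ? len - x : SUBSPAN;

        /* Step the linear quantities to the end of this sub-span. */
        uoz += s->duoz * n;
        voz += s->dvoz * n;
        ooz += s->dooz * n;

        z = 1.0f / ooz;                /* the one 1/z divide per sub-span */
        float u1 = uoz * z, v1 = voz * z;

        /* New deltas. For n == 8 a real loop would multiply by 1/8 or
           shift fixed-point values instead of dividing again. */
        float du = (u1 - u0) / n, dv = (v1 - v0) / n;

        for (int i = 0; i < n; i++) {
            out_u[x + i] = u0 + du * i;
            out_v[x + i] = v0 + dv * i;
        }
        u0 = u1;
        v0 = v1;
        x += n;
    }
}
```

Note that the divide and the delta setup run once per sub-span no matter
how short it is, so a scanline of 3 pixels pays the same setup as one of
8 pixels.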
> 
> 
> At 16 cycles/pixel, I had no cache misses (effectively) - I had a few points
> where a wasted cycle in the V pipe occurred, so I used a dummy movl to load
> the cache line, effectively eliminating the cache load later on... I also
> coded for 32bit modes on the LFB which meant I was using 4 bytes per pixel,
> or 8 pixels per cache line. I looked at using hi-color modes, but they are
> actually slower; it's very fast to do a true-colour lightsource lookup using
> the segmented architecture of the x86 registers. I had the light source
> level in the ah register, the actual texel colour in the al, added the start
> of the light source lookup table to eax and moved the corresponding byte.
> Did you follow that? I kinda nearly got lost myself... :)
> 
I use a similar lookup table myself, but for 8-bit color. One thing I
don't understand: why use 24-bit color when you get only 65536 colors
(your lookup table has to be 65536*3 bytes, and you get 3 bytes of color
from one byte of light value and one byte of texel color)? What do you
mean by the start of the lookup table, the offset? It's better to use a
global table with a constant offset for that, so you don't have to add
the offset, or to put the offset right into the mov instructions like
I'm doing with the texture offset.
I think (I don't know) that loading a cache line takes longer than one
clock, so your inner loop waits there for the cache to fill.
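
A minimal C sketch of the 8-bit variant of such a table, with the light
level in the high byte of the index and the texel in the low byte (the
ah/al trick). The names light_table, build_light_table(), shade() and
shade_span() are invented for this example, and the table contents
assume a hypothetical grey-ramp palette where the palette index equals
the brightness; a real palette needs a nearest-colour search when the
table is built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Global table: its address is a link-time constant, so the lookup does
   not need a separate add of a table base held in a register. */
static uint8_t light_table[256 * 256];

/* Assumption: grey-ramp palette, index == brightness. */
void build_light_table(void)
{
    for (unsigned light = 0; light < 256; light++)
        for (unsigned texel = 0; texel < 256; texel++)
            light_table[(light << 8) | texel] =
                (uint8_t)(texel * light / 255);
}

/* Light in the high byte, texel in the low byte, like ah:al. */
uint8_t shade(uint8_t light, uint8_t texel)
{
    return light_table[((unsigned)light << 8) | texel];
}

/* Shades n texels with one constant light level. */
void shade_span(uint8_t *dst, const uint8_t *texels, uint8_t light, size_t n)
{
    assert(dst != NULL && texels != NULL);
    const uint8_t *row = &light_table[(unsigned)light << 8];
    for (size_t i = 0; i < n; i++)
        dst[i] = row[texels[i]];
}
```

The 8-bit table is 64 KB; a 24-bit output table indexed the same way
would be three times that, which is the 65536*3 bytes mentioned above.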
> 
> 
> Inner speed _is_ everything. 90% of your texture mapping code will be spent
> in your inner most loop... Also, because of the way I used an AET (look in
> Abrash's Zen of Graphics Programming - brilliant resource) I could perform
> all my texture mapping in one pass, move to MMX mode and then post-process
> all the pixels performing the lighting, etc. Extremely effective.
> 
The idea seems very nice.
By saying that inner loop speed is not everything I meant rendering a
whole frame, not drawing one polygon. In a frame with 600 triangles
inner loop speed doesn't really matter. For example, my affine drawing
ran at 9.9 million pixels per second; with self-modifying code I now
have 12.6 million pixels per second and I didn't even notice the
difference in my engine running in 640x480. With perspective-correct
mapping I lose a lot of time calculating deltas for the first 8 pixels
(I need one division and need to calculate deltas); it has to be done
even for very short scanlines (maybe you know how to overcome this
without losing quality?), so the speed falls dramatically when the
number of scanlines increases. With 600 triangles, each 32 pixels wide
and long (32*32/2 pixels), I get only half the fill rate of 2 triangles
with the same total size as those 600 together.
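
A rough way to read those numbers, as a sketch under stated assumptions
rather than a measurement: 600 triangles of 32*32/2 = 512 pixels are
307200 pixels, exactly one 640x480 screen, spread over 600*32 = 19200
scanlines. Assume the 2 large triangles split that screen diagonally
(about 960 scanlines in total) and that time is simply
pixels + scanlines * setup, measured in units of one pixel. The function
name and the model are invented for this illustration.

```c
#include <assert.h>

/* Solves  pixels + lines_small * s = slowdown * (pixels + lines_big * s)
   for s, the per-scanline setup cost expressed in "pixels' worth" of
   time. Assumes a linear cost model; triangle setup is ignored. */
double setup_cost_in_pixels(double pixels, double lines_small,
                            double lines_big, double slowdown)
{
    double denom = lines_small - slowdown * lines_big;
    assert(denom > 0.0);  /* otherwise the model has no positive solution */
    return pixels * (slowdown - 1.0) / denom;
}
```

Calling setup_cost_in_pixels(307200.0, 19200.0, 960.0, 2.0) gives a
little under 18, so under this model the setup for each scanline costs
about as much as drawing 18 pixels, more than the 16-pixel average width
of a scanline in the small triangles.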
I don't have access to Abrash's Zen of Graphics Programming, but I've
heard about it many times.
