Mail Archives: djgpp/1997/05/09/08:25:29

Message-ID: <33731766.92F@silesia.top.pl>
Date: Fri, 09 May 1997 14:24:06 +0200
From: Michal <wapex AT silesia DOT top DOT pl>
MIME-Version: 1.0
To: djgpp AT delorie DOT com
Subject: Re: Alignment
References: <199705080655 DOT QAA17483 AT solwarra DOT gbrmpa DOT gov DOT au>

Leath Muller wrote:
> 
> *EWWWW* Self-modifying code on a PPro is an absolute no-no... your code will
> probably die in a major way on a PPro...
> 
I modify the code once per triangle. It rules; my affine mapper is 9
clocks (it can be done in 8, but I would have to lose some features, and
with a large number of triangles it would be slower). It uses 64x64
textures, can repeat them within a triangle, uses a lookup table for
lighting, lets the texture have any offset, and uses 6:26 fixed point
for texture u and v and 8:24 fixed point for lighting.
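
A minimal C sketch of why 6:26 fixed point pairs well with a 64x64 texture that repeats: the 6 integer bits cover exactly 0..63, so unsigned overflow wraps the coordinate for free. This is an illustration, not the actual assembly inner loop; the names fix6_26, to_fix6_26, texel_index and draw_span are hypothetical, and u is taken as the texture row and v as the column, following the convention used later in this message.

```c
#include <assert.h>
#include <stdint.h>

#define TEX_SIZE 64            /* 64x64 texture, one byte per texel */

/* 6:26 fixed point: top 6 bits = texel coordinate (0..63),
   low 26 bits = fraction. */
typedef uint32_t fix6_26;

static fix6_26 to_fix6_26(double texels)
{
    /* Scale by 2^26. Going through int64_t keeps the conversion
       well defined; the final cast wraps modulo 2^32. */
    return (fix6_26)(int64_t)(texels * 67108864.0);
}

static unsigned texel_index(fix6_26 u, fix6_26 v)
{
    /* Assumption: u selects the row (y in texture), v the column. */
    return (u >> 26) * TEX_SIZE + (v >> 26);
}

/* Affine span: step u and v by constant deltas for each pixel. */
static void draw_span(uint8_t *dst, int count, const uint8_t *tex,
                      fix6_26 u, fix6_26 v, fix6_26 du, fix6_26 dv)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        dst[i] = tex[texel_index(u, v)];
        u += du;               /* unsigned overflow = texture repeat */
        v += dv;
    }
}
```

Stepping v from column 62 by one texel per pixel visits columns 62, 63, 0, 1 with no masking instruction in the loop.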
> 
> What is your code based on? Heckers?
> 
What does -Heckers- mean?
> 
> You lost me a bit there... what are you talking about a constant offset?
> I use one table which has lighting information based on the source value. In
> true colour, each R, G and B light component can be an 8 bit value. Each
> source texel RGB component value is an 8 bit value. Combine the two and you get a
> 16bit value which is the result of the lit texel in the right colour... and
> it's automatically calculated with the segmented registers... in other words,
> I only need one 65536 (16 bit) lookup table because I can use the same table
> for all components...
> 
You have to look up that table 3 times per pixel. Is your lighting
perspective-correct? I still don't understand why you do the lighting in
a second loop. It means running a second loop and writing to the screen
a second time. The extra registers would not have made up for that (at
least in my case).
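
To make the "3 times per pixel" point concrete, here is a rough C sketch of a single 65536-entry table indexed by (light << 8) | component and shared by R, G and B. The table contents (simple multiplicative modulation, light * c / 255) and the names light_table, build_light_table and light_texel are assumptions for illustration only; the quoted description does not say what the table actually stores.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8;

static uint8_t light_table[256 * 256];   /* index: (light << 8) | component */
static int light_table_ready = 0;

void build_light_table(void)
{
    for (unsigned light = 0; light < 256; light++)
        for (unsigned c = 0; c < 256; c++)
            /* Assumed formula: scale the component by light/255. */
            light_table[(light << 8) | c] = (uint8_t)(light * c / 255);
    light_table_ready = 1;
}

/* One table, but three lookups for every true-colour pixel. */
rgb8 light_texel(rgb8 texel, rgb8 light)
{
    rgb8 out;
    assert(light_table_ready);
    out.r = light_table[((unsigned)light.r << 8) | texel.r];
    out.g = light_table[((unsigned)light.g << 8) | texel.g];
    out.b = light_table[((unsigned)light.b << 8) | texel.b];
    return out;
}
```

In 8-bit colour the same idea needs only one lookup per pixel, which is part of why the two approaches compare so differently.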
> It _can_ be more than one clock on slower Pentium class machines (generally
> less than 133's) which is what is still the major market share at the moment.
> 
Are you preloading a cache line once per pixel? If not, it doesn't make
any sense: when your u delta (y in texture) is larger than 1/8 you're
going to skip lines and have cache misses anyway. If your v delta (x in
texture) is larger than 4 (skipping 3 texels every pixel) you end up
with cache misses too (dv > 4 => dv*8 > 32, more than one 32-byte cache
line). The same goes for the lighting lookup table.
This is for 8-bit color.
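
The arithmetic above can be checked with a small C sketch that walks an 8-pixel span and counts how often a texel fetch lands on a different 32-byte line than the previous fetch. It assumes a 64x64 texture at one byte per texel, starting on a cache line boundary, with non-negative coordinates and deltas; count_line_changes is a hypothetical helper, not part of any real renderer.

```c
#include <assert.h>

#define TEX_SIZE   64   /* 64x64 texture, 1 byte per texel (8-bit colour) */
#define LINE_BYTES 32   /* Pentium L1 cache line size in bytes */

/* Counts fetches that hit a different cache line than the previous
   fetch (the first fetch always counts). u = row, v = column. */
int count_line_changes(double u, double v, double du, double dv, int pixels)
{
    int changes = 0;
    long last_line = -1;

    assert(u >= 0 && v >= 0 && du >= 0 && dv >= 0);
    for (int i = 0; i < pixels; i++) {
        int row = (int)u % TEX_SIZE;
        int col = (int)v % TEX_SIZE;
        long line = ((long)row * TEX_SIZE + col) / LINE_BYTES;

        if (line != last_line) {
            changes++;
            last_line = line;
        }
        u += du;
        v += dv;
    }
    return changes;
}
```

With dv = 1 and du = 0 the whole span stays in one line; with dv = 5, or with du = 0.25, the span already touches a second line, so a single preload per span does not cover it.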
> 
> Question: Are you saying you calculate the deltas every time you render
> a scan line? Anyway, 
I calculate deltas for the first 8 pixels of every scanline. This takes
2 fdivs: the first to get the starting u, v and light values, the second
to get them 8 pixels later. I even thought of interpolating the deltas
and the starting u, v and light values linearly every 8 scanlines, but
it would make a real mess of my procedure. This per-scanline setup is
the real problem with my procedure; in a real frame render it causes a
dramatic slowdown.
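
A sketch in C of the two-divide setup for one 8-pixel span, assuming the usual scheme of interpolating u/z, v/z, light/z and 1/z linearly across the screen. One divide and three multiplies recover u, v and light at each end of the span, which matches the count of 2 fdivs and 6 fmuls. The struct and function names are hypothetical, and plain doubles stand in for the FPU code.

```c
#include <assert.h>

typedef struct { double uz, vz, lz, iz; } persp;  /* u/z, v/z, light/z, 1/z */
typedef struct { double u, v, l; } attrs;         /* true u, v, light */

/* One divide, three multiplies. */
attrs persp_eval(persp p)
{
    assert(p.iz > 0.0);
    double z = 1.0 / p.iz;
    return (attrs){ p.uz * z, p.vz * z, p.lz * z };
}

/* Per-pixel affine deltas for the 8 pixels starting at 'start'.
   'step' is the per-pixel screen-space change of each term. */
void span_deltas(persp start, persp step, attrs *first, attrs *delta)
{
    persp end = {
        start.uz + 8.0 * step.uz,
        start.vz + 8.0 * step.vz,
        start.lz + 8.0 * step.lz,
        start.iz + 8.0 * step.iz
    };
    attrs a = persp_eval(start);   /* first fdiv */
    attrs b = persp_eval(end);     /* second fdiv */

    *first = a;
    delta->u = (b.u - a.u) * 0.125;
    delta->v = (b.v - a.v) * 0.125;
    delta->l = (b.l - a.l) * 0.125;
}
```

The 8 pixels in between are then drawn with the affine inner loop, using first and delta converted to fixed point.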

> if you're doing 8 pixel scanlines (complete) just map
> the entire line affinely (if I understand what you're saying correctly... :)
I'm doing that (of course), but I still need deltas for those 8 pixels.
> The way I did it, I simply started the first fdiv which took 19 cycles,
> did some integer stuff, did the second fdiv (another 19) with more integer
> stuff in parallel, and mapped them affinely getting about 6 cycles per pixel.
> Can you elaborate more on this?
> 
I need 2 fdivs, 6 fmuls, and other stuff even if the scanline is only 2
pixels wide. Of course I could write special-case code for that, but
then (for example) 4-pixel-long scanlines would look incorrect, and most
of my scanlines are longer than 8 pixels.
I can't do integer stuff in parallel with the first fdiv (or the second)
because I need its result to go any further.
