Message-Id:
Date: Sun, 10 Aug 97 02:03 MET DST
To: janko DOT heilgeist AT gmx DOT net, djgpp AT delorie DOT com
References: <01bca43a$76d41440$39a340c2 AT pjheilg DOT goslar DOT netsurf DOT de>
Subject: Re: Compiled faster rle sprites in Allegro?!
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8BIT
From: Georg DOT Kolling AT t-online DOT de (Georg Kolling)
Precedence: bulk

Janko Heilgeist wrote:
> Hi,
> we tried to optimize our tile based game engine by using compiled sprites
> instead of rle sprites.
>
> Result: 30000 tiles need 21 seconds with compiled sprites (20 seconds with
> rle sprites)
>
> We used the sprite routines draw_rle_sprite and draw_compiled_sprite of
> Allegro 2.2
>
> Why?????????????????????

Easy answer: rle means run length encoding, so if you have 40 pixels of the
same color in a row, they're represented as two bytes, a counter and a color
byte. In this case, the 40 pixels can be written in one chunk, which is much
faster than writing every single pixel, as compiled sprites do.
Usually, compiled sprites are faster, but if you use only a few colors, rle
sprites could be faster.

BTW, I don't believe that the pentium processor cache makes rle sprites
faster than compiled sprites, as someone suggested in reply to this question.
Why? There are three reasons:

1. A pentium has an 8 KB internal code cache that continually reads code
   from RAM. Usually, 4 KB of this code cache have already been executed and
   4 KB are still to be executed, as long as there are no jumps. Since
   compiled sprites do not need jumps, the cache can't read 'wrong' code.
2. Since there are no jumps, branch prediction can't guess wrong. When you
   use rle sprites, there are jumps, so branch prediction can waste some
   cycles.
3. A mov instruction with an immediate byte value and a memory address needs
   4 bytes, i.e. a 32x32 sprite (1024 pixels * 4 bytes) fits into 4 KB of
   cache.

30000 tiles in 20 seconds? Either you use huge tiles or a slow computer, I
guess...
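
To make the "counter and a color byte" idea from the message above concrete, here is a minimal sketch of a decoder for such a run-length stream. The function name rle_decode and the (count, color) pair layout are assumptions made for illustration only; this is not Allegro's actual RLE_SPRITE format or its draw_rle_sprite code. It only shows why one run can be written with a single block fill instead of one store per pixel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical format: a sequence of (count, color) byte pairs.
 * This is an illustration of the scheme described in the message,
 * NOT the layout Allegro uses internally.
 * Returns the number of pixels written to dst. */
size_t rle_decode(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap)
{
    size_t i;
    size_t out = 0;

    for (i = 0; i + 1 < src_len; i += 2) {
        size_t count = src[i];            /* run length, 0..255 */
        unsigned char color = src[i + 1]; /* color of the whole run */

        /* Clip the run so we never write past the destination. */
        if (count > dst_cap - out)
            count = dst_cap - out;

        /* The whole run is written as one block fill. */
        memset(dst + out, color, count);
        out += count;
    }
    return out;
}
```

With this layout, a row of 40 pixels of color 7 is stored as the two bytes {40, 7} and costs one memset call to draw, whereas a compiled sprite would execute one store instruction per pixel.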
nce bit (for virtual memory)... If anyone knows it: please share your
knowledge with me...

The next byte is a bit ;-) difficult: bits 0-3 are bits 16-19 of the segment
size. But now the segment size is 20 bits, which means at most 1048576 bytes
(1 MB). Bit 7 lets you get bigger segments: set it to 1, and the segment
size will represent the number of 4-KB 'pages' in the segment, so the
maximal size is 4 KB * 1048576 = 4 GB.

And finally, the last byte represents bits 24-31 of the base address.

Very weird, but that's for 286 compatibility...

Hope this helps...
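
The byte layout described above can be sketched in C as follows. The function names are made up for illustration. The positions of bytes 0-5 (size bits 0-15 in bytes 0-1, base bits 0-23 in bytes 2-4, access byte in byte 5) are not shown in this fragment and are taken from the standard x86 descriptor layout. Note one detail this sketch assumes beyond the text: the 20-bit field holds the limit, i.e. the size minus one, so the size is the field value plus one.

```c
#include <stdint.h>

/* d points at the 8 bytes of a segment descriptor.
 * Function names are hypothetical; layout of bytes 0-5 is assumed
 * from the standard x86 descriptor format. */

/* Base address: bits 0-23 in bytes 2-4, bits 24-31 in the last byte. */
uint32_t desc_base(const uint8_t d[8])
{
    return (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
           ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}

/* 20-bit size field: bits 0-15 in bytes 0-1, bits 16-19 in the
 * low nibble of byte 6. */
uint32_t desc_limit_field(const uint8_t d[8])
{
    return (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
           ((uint32_t)(d[6] & 0x0F) << 16);
}

/* Bit 7 of byte 6: 1 means the size is counted in 4-KB pages. */
int desc_page_granular(const uint8_t d[8])
{
    return (d[6] >> 7) & 1;
}

/* Segment size in bytes; needs 64 bits because 4 GB does not fit
 * into a 32-bit value. */
uint64_t desc_size_bytes(const uint8_t d[8])
{
    uint64_t units = (uint64_t)desc_limit_field(d) + 1;
    return desc_page_granular(d) ? units * 4096u : units;
}
```

For a descriptor whose size field is 0xFFFFF with bit 7 set, desc_size_bytes gives 1048576 * 4096 bytes, which is the 4 GB maximum mentioned above.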