Message-ID: <329E319E.2A82@gbrmpa.gov.au>
Date: Fri, 29 Nov 1996 08:43:11 +0800
From: Leath Muller
Reply-To: leathm AT gbrmpa DOT gov DOT au
Organization: Great Barrier Reef Marine Park Authority
MIME-Version: 1.0
To: Glen Miner
CC: djgpp AT delorie DOT com
Subject: Re: Optimization
References: <57hg9b$or5 AT kannews DOT ca DOT newbridge DOT com> <329C95AD DOT C3E AT silo DOT csci DOT unt DOT edu> <57k531$5bu AT kannews DOT ca DOT newbridge DOT com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

> > There shouldnt be any problem running 16 bit data in 32 bit form.
> > I don't understand what is your problem here??? How did you get
> > the feeling that your program would actually get slower if you use
> > 32 bit data instead of 16 bit data.
>
> Well, my logic is this: You have to move 2x as much data around; this
> means your L1 cache fills up 2x as fast. This is not good.

Hmmm... I would think that if about 20 people told me one thing, and I
was the only person arguing the other, then I would probably be the one
who is wrong... ;)

Anyway, _regardless_ of the size of the data you address, the Pentium
always loads a full 32-byte line into its cache. Whether you access a
byte, a short int or an int, the cache line fill is always 32 bytes of
memory. So with 16-bit values you fit 16 pieces of data into one 32-byte
cache line, whereas with 32-bit values you only fit 8.

Now, here's the difference: if you're accessing 32-bit data and it's in
the cache, there is no extra cost; if you access 16-bit data from 32-bit
code, every access carries an operand-size prefix, which costs an extra
cycle. So reading those 16 pieces of 16-bit data costs 16 extra cycles
in prefixes alone, while reading 16 pieces of 32-bit data costs nothing
extra per access -- a movl from memory that hits the cache takes 1 cycle.
The 32-bit version does cover 64 bytes of data, so you pay for one extra
cache line load, but that only adds a cycle (or maybe 2 -- still a lot
better than 16!!!).

Now, did you follow this ok???

Leathal.
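
Here is a minimal C sketch of the two layouts being compared -- the array
names, sizes, and loops are hypothetical, added only to make the
16-versus-8-elements-per-cache-line point concrete. Whether the 16-bit
loop really pays the operand-size prefix depends on the instructions the
compiler picks (it may use movswl/movzwl, which carry no 0x66 prefix), so
inspect the output of gcc -O2 -S rather than treating this as a benchmark:

/* Hypothetical sketch: 16 short values span one 32-byte Pentium cache
 * line, while 16 int values span two.  The comments note where the
 * operand-size prefix discussed above may appear. */
#include <stdio.h>

#define N 16                 /* 16 elements, as in the example above */

static short data16[N];      /* 16 x 2 bytes = 32 bytes -> 1 cache line  */
static int   data32[N];      /* 16 x 4 bytes = 64 bytes -> 2 cache lines */

int sum16(void)
{
    int total = 0, i;
    for (i = 0; i < N; i++)
        total += data16[i];  /* 16-bit loads; may pay the size prefix,
                                depending on the generated instructions */
    return total;
}

int sum32(void)
{
    int total = 0, i;
    for (i = 0; i < N; i++)
        total += data32[i];  /* plain 32-bit movl loads, no prefix */
    return total;
}

int main(void)
{
    int i;
    for (i = 0; i < N; i++) {
        data16[i] = (short) i;
        data32[i] = i;
    }
    printf("sum16 = %d, sum32 = %d\n", sum16(), sum32());
    return 0;
}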