Message-Id: <3.0.32.19980718015819.00994ca0@xs4all.nl>
Date: Sat, 18 Jul 1998 02:00:15 +0100
To: Marc Lehmann
From: Vincent Diepeveen
Subject: Re: PGCC's optimizations (continued)
Cc: beastium-list AT Desk DOT nl

At 05:45 PM 7/18/98 +0200, you wrote:
>On Sat, Jul 18, 1998 at 07:12:54AM -0400, nuke AT bayside DOT net wrote:
>> >
>> > Porting from (p)gcc to msvc/watcom is hard because msvc/watcom are horrible
>> > in casting. You must cast everything.
>
>This is quite interesting... are you sure that you compile these programs in
>C mode as opposed to Microsoft's combined c-c++ thing? In C, there are many
>clearly defined type conversions which need an explicit cast in C++.

All C code, and initially you don't notice the casting problems.
In fact it usually works OK. But I have masses of pointer
structures and speed tricks. I'll do anything to get faster, as long as
the speedup is lossless. Most chess program engines are therefore in
assembler. Mine isn't. Yet I have learnt a lot about C programming,
and about how good or bad compilers are.

It's really tough to get something to work faster and faster.
So I finally tried to fit within the Level 2 cache, and converted some
rather large data structures from int to unsigned char.

This sped things up considerably, especially on PII/PRO, but
other parts of the code became slower.

That's the part where compilers mess up.

I have arrays like

    unsigned char movetables[1..a][1..b],*sq;
    int u;

And I then use them in a while loop:

    sq = movetables[something];
    // now the problem is that this sq must be converted to an int
    // before I use it.
    while( (u = (int)*sq) != c ) {
        // now I'm using u in all kinds of lookups
        sq += bla bla;
    }

Now the main problem is this conversion from *sq to u.
ALL compilers suck in this respect. They all lack some simple
optimization rules for loops and casting.

They all do, for example, XOR eax,eax and then move the variable
into the register:

    mov al,...
    ...

and then in the loop we use

    op ,eax

==> partial register stall.

In the 486 age the only flaw was the extra XORs: you need just
1 XOR, and it belongs OUTSIDE the loop, not inside it.

Now on Pentium Pro/PII we have a major problem, because this also causes
a partial register stall in most cases (there are some exceptions),
so I get punished for using 8-bit data structures, and there is no way
to prevent it in C.

The only way to prevent it is to use assembler and also rewrite the
variables to 8 bits. The compilers do only 32-bit data manipulation,
so they would convert 8-bit code to 32 bits anyway, which causes even
more stalls. That also explains why my program is faster on the PRO than
on the K6, while most other chess programs are way faster on the K6
(8-bit code also decodes way faster on the K6).

Another main problem is handling the if-then-else in the while loop.

If I do:

    int board[64];

    while( loop over all pieces ) {
        ..
        if( board[sq] == 15 ) {
            ..
            .. code1
            ..
        }
        else if( board[sq] == 2 ) {
            ..
            .. code2
            ..
        }
        ..
    }

So the only 2 references to board[sq] are in an if-then-else
construction. This means that you can load it once and do

    mov eax,[edx]
    cmp eax,15
    jne label
    ..
    .. code1
    ..
    jmp below
    label:
    cmp eax,2
    jne below
    ...
    ... code2
    ...
    below:

Interesting cases, aren't they?

These are quite simple. I have bunches of those cases left...
...and also quite some more difficult ones if you like.

If all those things were in 1 compiler, it would speed
my chess program up considerably.
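
For reference, here is a minimal sketch of the two patterns above as compilable C. The names sum_until, classify and lookup are hypothetical stand-ins, the byte list is assumed to end in a sentinel value, and the pointer simply steps by one where the real loop steps by an amount not shown here. In both functions the value is copied once into a full-width local, so a compiler is free to use a single zero-extending load (movzx on x86, which writes the whole register) in the first case and a single load of board[sq] in the second. Whether a given compiler actually emits that is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical stand-in for the move-table walk: add up lookup[] entries
   for every byte in a list terminated by `sentinel`. */
int sum_until(const unsigned char *sq, unsigned int sentinel, const int *lookup)
{
    int total = 0;
    unsigned int u;              /* full-width copy of the current byte */

    /* The byte is widened exactly once per iteration, at the assignment. */
    while ((u = *sq) != sentinel) {
        total += lookup[u];      /* u is used as a full-width index */
        sq++;                    /* assumption: step by one element */
    }
    return total;
}

/* Hypothetical stand-in for the if/else chain: board[sq] is read once
   into a local, and both comparisons reuse that local. */
int classify(const int *board, int sq)
{
    const int piece = board[sq]; /* single load */

    if (piece == 15) {
        return 1;                /* placeholder for code1 */
    } else if (piece == 2) {
        return 2;                /* placeholder for code2 */
    }
    return 0;                    /* neither case matched */
}
```

Compiling this with assembly output enabled (for example the -S flag in gcc) shows directly whether the byte load and the board[sq] load come out the way described above.
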
>> >On Thu, 16 Jul 1998, Vincent Diepeveen wrote:
>> Nuke, 1 big note: i'm not interested in HOW long it takes to compile stuff,
>> nor did i compare executable size. I'm only interested in how fast
>> my program is.

>I hope he meant that also!

I'm still wondering whether he's in the right mailing list.

Vincent Diepeveen