Sender: nuetzel AT delorie DOT com
Message-ID: <3946ED8D.7EF4134E@myokay.net>
Date: Wed, 14 Jun 2000 04:27:25 +0200
From: Dieter Nützel
Organization: DN
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.4.0-test1-ac7-cl31 i686)
X-Accept-Language: de-DE, de, en-US, en-GB, en
MIME-Version: 1.0
To: pgcc AT delorie DOT com
CC: Marc Lehmann
Subject: Athlon -- has someone the doku handy?
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Reply-To: pgcc AT delorie DOT com

Hello,

there is one flag name missing for the Athlon stepping 2 ("16") and two for the Athlon stepping 1 ("16" and "24") in the current Linux kernel (2.4.0-test1-ac18). Alan Cox and I would be very pleased if one of you has the AMD Athlon Programming Reference handy. It takes some time to get the CD from AMD...

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 1
model name      : AMD-K7(tm) Processor
stepping        : 2
cpu MHz         : 548.952604
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov 16 mmxext mmx 3dnowext 3dnow
bogomips        : 1094.45

What is "16"?

Now the Athlon 800 (nice thing :-)

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 2
model name      : AMD Athlon(tm) Processor
stepping        : 1
cpu MHz         : 798.470512
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov 16 pse36 mmxext mmx 24 3dnowext 3dnow
bogomips        : 1592.52

What are "16" and "24"?

Thanks,
Dieter

BTW, I found the best combination of optimization flags for the Athlon. GCC 2.96 (CVS) didn't come close, neither with the same flags nor with -mcpu=athlon and/or -march=athlon!!! :-( Why?

In particular, plain -O, -mcpu=k6 and -mpreferred-stack-boundary=2 (2 !!!) are needed. -fomit-frame-pointer is worse!!! Avoid it if you can...
These flags give the best result in an MFLOPS test (dgemm from Quant-X, an Alpha FPU test, source available):

-O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4 -fschedule-insns2 -fexpensive-optimizations

K7-550:

gcc -O -funroll-loops -DMAIN -o dgemm dgemm.c
SunWave1>./dgemm-O
m:1000 n:1000 k:1000
Ail_max 24, Blj_max 12, A_row_block 85
Shimizu's DGEMM : 147.493 MFLOPS(13.560 sec)
Shimizu's DGEMM : 147.493 MFLOPS(13.560 sec)
Shimizu's DGEMM : 147.601 MFLOPS(13.550 sec)

gcc -O -mcpu=k6 -mpreferred-stack-boundary=2 -malign-functions=4 -fschedule-insns2 -fexpensive-optimizations -DMAIN -o dgemm dgemm.c
SunWave1>./dgemm-k6
m:1000 n:1000 k:1000
Ail_max 24, Blj_max 12, A_row_block 85
Shimizu's DGEMM : 213.447 MFLOPS( 9.370 sec)
Shimizu's DGEMM : 213.220 MFLOPS( 9.380 sec)
Shimizu's DGEMM : 213.220 MFLOPS( 9.380 sec)

The K7-800 got ~222 and ~288 MFLOPS, respectively.

Any questions?

-- 
Dieter Nützel
Graduate Student, Computer Science

University of Hamburg
Department of Computer Science
Cognitive Systems Group
Vogt-Kölln-Straße 30
D-22527 Hamburg, Germany

email: nuetzel AT kogs DOT informatik DOT uni-hamburg DOT de
@home: dieter DOT nuetzel AT myokay DOT net