From: kagel AT quasar DOT bloomberg DOT com
Date: Thu, 20 Feb 1997 11:16:09 -0500
Message-Id: <9702201616.AA04348@quasar.bloomberg.com>
To: eliz AT is DOT elta DOT co DOT il
Cc: jbennett AT ti DOT com, djgpp AT delorie DOT com
In-Reply-To: (message from Eli Zaretskii on Wed, 19 Feb 1997 10:11:53 +0200 (IST))
Subject: Re: Netlib code [was Re: flops...]
Reply-To: kagel AT dg1 DOT bloomberg DOT com

   Date: Wed, 19 Feb 1997 10:11:53 +0200 (IST)
   From: Eli Zaretskii

   On Tue, 18 Feb 1997 kagel AT quasar DOT bloomberg DOT com wrote:

   > say the least.  The problem is not with the performance of the Fortran
   > code but with the memory bandwidth overhead associated with converting
   > the C row-major matrices to the Fortran column-major order prior to
   >
   > What conversion?  The FORTRAN is not converting your arrays.  FORTRAN and C
   > share a common calling convention (ignoring the facts that FORTRAN passes
   > string lengths and always passes pointers).  They just disagree on which
   > dimension to increment first.  You are not inverting the arrays, are you?
   > Just declare the C arrays with the indices reversed and everything will be
   > fine.

   I don't know whether this is or isn't the problem which causes the
   slow-down, but note that accessing a large array columnwise might hurt
   performance due to CPU cache thrashing and virtual memory thrashing (if
   the array is large enough to exceed the physical RAM).

No, no.  You misunderstand what is happening here.  FORTRAN is still accessing
the actual data in the same order that "C" is.  It is just that the sense of
the indices is inverted in the source code.
In other words, since FORTRAN insists that the first (row) index be incremented
first, FORTRAN compiler writers, knowing the hardware as they must, make the
FORTRAN column the same physical dimension as the "C" row, so that the memory
thrashing you mention does not happen.  This means that if, for example, in
FORTRAN you have:

      INTEGER*4 big_array(100,20)

then in "C" you can declare the same memory as:

      int big_array[20][100];

and these are identical memory image definitions.  Here, check this out:

t.c:

#include <stdio.h>

/* FORTRAN subroutines from tf.f; g77 and most Unix f77 compilers
   append an underscore to external names. */
void test_(int t1[10][5]);
void prt_(int t1[10][5]);

int main(void)
{
    int t1[10][5], i, j;    /* same memory as FORTRAN INTEGER*4 t1(5,10) */

    test_(t1);              /* FORTRAN fills the array in place */
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 5; j++) {
            printf("t1[%d][%d]=%d ", i, j, t1[i][j]);
        }
        printf("\n");
    }
    printf("\n\n");
    prt_(t1);               /* FORTRAN prints the very same memory */
    return 0;
}

tf.f:

      subroutine test( t1 )
      integer*4 t1(5,10), i, j, k
      k = 0
      do 100 i=1,10
         do 90 j=1,5
            k = k + 1
            t1(j,i) = k
 90      continue
 100  continue
      end

      subroutine prt( t1 )
      integer*4 t1(5,10), i, j
      do 100 i=1,10
         do 90 j=1,5
            write( *, 200) j, i, t1(j,i)
 90      continue
 100  continue
 200  format( "T1(",i2,",",i2,")=",i4 )
      end

It all works, it is all efficient, and it involves no copying of data!
This prints:

t1[0][0]=1 t1[0][1]=2 t1[0][2]=3 t1[0][3]=4 t1[0][4]=5
t1[1][0]=6 t1[1][1]=7 t1[1][2]=8 t1[1][3]=9 t1[1][4]=10
t1[2][0]=11 t1[2][1]=12 t1[2][2]=13 t1[2][3]=14 t1[2][4]=15
t1[3][0]=16 t1[3][1]=17 t1[3][2]=18 t1[3][3]=19 t1[3][4]=20
t1[4][0]=21 t1[4][1]=22 t1[4][2]=23 t1[4][3]=24 t1[4][4]=25
t1[5][0]=26 t1[5][1]=27 t1[5][2]=28 t1[5][3]=29 t1[5][4]=30
t1[6][0]=31 t1[6][1]=32 t1[6][2]=33 t1[6][3]=34 t1[6][4]=35
t1[7][0]=36 t1[7][1]=37 t1[7][2]=38 t1[7][3]=39 t1[7][4]=40
t1[8][0]=41 t1[8][1]=42 t1[8][2]=43 t1[8][3]=44 t1[8][4]=45
t1[9][0]=46 t1[9][1]=47 t1[9][2]=48 t1[9][3]=49 t1[9][4]=50


T1( 1, 1)=   1
T1( 2, 1)=   2
T1( 3, 1)=   3
T1( 4, 1)=   4
T1( 5, 1)=   5
T1( 1, 2)=   6
T1( 2, 2)=   7
T1( 3, 2)=   8
T1( 4, 2)=   9
T1( 5, 2)=  10
T1( 1, 3)=  11
T1( 2, 3)=  12
T1( 3, 3)=  13
T1( 4, 3)=  14
T1( 5, 3)=  15
T1( 1, 4)=  16
T1( 2, 4)=  17
T1( 3, 4)=  18
T1( 4, 4)=  19
T1( 5, 4)=  20
T1( 1, 5)=  21
T1( 2, 5)=  22
T1( 3, 5)=  23
T1( 4, 5)=  24
T1( 5, 5)=  25
T1( 1, 6)=  26
T1( 2, 6)=  27
T1( 3, 6)=  28
T1( 4, 6)=  29
T1( 5, 6)=  30
T1( 1, 7)=  31
T1( 2, 7)=  32
T1( 3, 7)=  33
T1( 4, 7)=  34
T1( 5, 7)=  35
T1( 1, 8)=  36
T1( 2, 8)=  37
T1( 3, 8)=  38
T1( 4, 8)=  39
T1( 5, 8)=  40
T1( 1, 9)=  41
T1( 2, 9)=  42
T1( 3, 9)=  43
T1( 4, 9)=  44
T1( 5, 9)=  45
T1( 1,10)=  46
T1( 2,10)=  47
T1( 3,10)=  48
T1( 4,10)=  49
T1( 5,10)=  50

-- 
Art S. Kagel, kagel AT quasar DOT bloomberg DOT com

A proverb is no proverb to you 'till life has illustrated it.  -- John Keats