Mail Archives: djgpp/1994/08/15/09:44:51
Thanks to everyone who has made suggestions. I've concatenated replies
to Eli Zaretskii and Mike Castle. Mike first:
>If the amount of CPU time grows faster than the amount of I/O
>time, I suspect that the DJGPP version will eventually out
>perform the Borland version.
The processing as it reads the data and builds the network in is O(n),
where n is the size of the input data. It then munges the network to
make it smaller, using various reduction rules, again O(n) [probably],
and then builds a matrix and solves it, which is O(m^2), where m is the
size of the reduced network. However, it only needs to build a matrix
if the network is too complicated to solve using the reduction rules, so
m depends more on how intertwined the network is than it depends on
n.
The network reduction and "un-reduction" and the matrix
building/solution are quite FP intensive, but the machine has a
coprocessor, so this shouldn't be affected by the compiler.
>Have you tried profiling (with either version) to find bottle
>necks? How centralizing is your IO? Do you read everything in
>at once, then process, or read/process/write/read/process/write?
I haven't tried profiling the code recently, so it might be worth doing
again. However, this would probably just reduce the times for both
versions.
The data is read in and processed a little (parsed, with some simple
trig), then added to the network (malloc a block and add some links).
Then the network reduction is done, (possibly) a matrix is built and
solved, and the network unreduced. Then the output files are written
out.
>If you could work the IO into larger chunks so there are fewer
>RM/PM switches, that might help the djgpp version.
When does the RM/PM switch occur? Presumably it's only when DOS
actually writes to the hardware, so just using setvbuf to increase the
file buffer size would reduce the amount of RM/PM switching. I tried
using setvbuf with a buffer size of 4096 and then 8192 on all input and
output files and got no speed change at all. The line I used was:
if (fh) setvbuf( fh, NULL, _IOFBF, SIZE );
>Another thing to consider is the start up. For DJGPP programs,
>you have a slightly larger overhead than Borland programs would
>(initializing PM, loading the 32-bit code, jumping to it, and so
>on). Did you time with an external program, or an internal
>function?
The timing is done internally, so the initialisation time is not
included. I use time(), which seems to have approx. 1/20 second
resolution here, and which has the advantage of being ANSI and hence
portable to most platforms.
Eli Zaretskii wrote:
> 1) Try using -O3, which adds additional optimisations. GCC
>info file has a list of optimization options which you might also
>consider (I think, loop-unrolling isn't included even in -O3, so
>you might try it, too).
"cc1 -version -O2" and "cc1 -version -O3" give me the same output,
so I think there's no difference. -O3 -funroll-loops makes no
difference to the output timings anyway.
> 2) The running time is quite short (5-6 seconds), so the
>overhead of loading go32 and protected mode set-up before your
>program gets control can be significant in relative figures. Are
>these running times characteristic of your typical application?
>If so, what difference does 1 second make? If not, try running for
>longer time, and see if you get better performance *ratio*.
As I said above, the timing is internal and so doesn't include the
start-up time. The running time is typical for a reasonably sized data
set on a fast machine (486DX2 66MHz). With a slower machine and/or a
larger data set, the difference will be more noticeable, and I'd like it
to be usable on most machines. I do have some much larger data sets
somewhere - I'll dig them out and do some more tests.
> 3) You didn't mention if you use some software disk cache
>like SmartDrive. Do you? If you do, you can forget about the claims
>that ``DJGPP is a bit weak if you're doing lots of I/O stuff'' (but see
>4 below), because the cache all but eliminates I/O performance problems.
There is no software disk cache running: the machine has a fairly
good caching disk controller card, so a software cache gains nothing.
(I'm told this by someone who tried it on a similar machine.)
> 4) On a similar machine, 7 seconds is what it takes a DJGPP-compiled
>program to read a 4-megabyte file (with SmartDrive installed), so if you
>read multi-megabyte files like this, then yes, you might have I/O
>bottleneck. I think, this is somewhat due to the fact that DJGPP
>read()'s files in chunks of 4KB, which might be sub-optimal when reading
>LARGE files. Otherwise, your problem is *not* I/O.
There are 32 input files read in with a total size of 54316 bytes. 4
output files are produced, total size 158695 bytes. Pretty small
really.
> 5) If your program was originally born for Borland C, then it
>might have some algorithmic compromises which are necessary when writing
>16-bit programs. You might consider going through your code and searching
>for such things, which may boost performance with GCC.
It was originally written for Norcroft C running under RISC OS (32 bit
environment), and then ported to Borland C. I can't think of anything
explicitly 16-bit, except a translation table of shorts: each entry is
a bit mask that determines what the corresponding input character can
be interpreted as.
Olly