To: djgpp AT sun DOT soe DOT clarkson DOT edu, eliz AT is DOT elta DOT co DOT il
Subject: Re: DJGPP Speed
Date: Tue, 29 Mar 1994 23:38:37 -0800
From: Darryl Okahata

> This summarizes some testing I did for the DJ GCC 1.11maint4 as far as
> the compilation speed goes.
[ ... ]
> Conclusion 2: With a fairly good disk cache, the I/O is probably NOT the
> decisive factor (at least for the above-mentioned machines).

Here's another data point:

System: 60MHz Pentium w/512K L2 cache & 16MB RAM, Adaptec 2842 VLB SCSI
        controller, all files on a Fujitsu 2624FA disk in SCSI-2 mode
        (this is an "average-performing" disk by today's standards:
        11ms access, 1700K/sec transfer), DOS 6.2, 2MB smartdrv,
        QEMM 7.03, QDPMI *disabled*

File:   makeinfo.c from the TeXinfo 3.1 distribution
        (DJGPP: gcc -E makeinfo.c | wc == 8766 lines)

Times for "gcc -c -O2 -DREMOVE_OUTPUT_EXTENSIONS makeinfo.c":

    21.03 sec (~415 lines/sec)  Temp files on 2MB ramdisk.  All other
                                files on HD.
    25.27 sec (~345 lines/sec)  Temp files on HD, smartdrv write cache
                                enabled.
    32.07 sec (~270 lines/sec)  Temp files on HD, smartdrv write cache
                                disabled.

[ Yeah, I know that I didn't specify -DREMOVE_OUTPUT_EXTENSIONS when I
  counted the number of lines.  This only affects about 9 or so lines,
  and so it really doesn't matter.  Besides, I'm lazy.  ;-) ]

What this really shows is how important a ramdisk is for tmp files.
With MSDOS/Windows, there is no such thing as a good disk cache.  Why?
Because accessing the cache requires expensive real to protected mode
switches, and vice-versa.  On my PC, it takes about 21us to go from real
to protected, and about 19us to go from protected to real, for a total
of around 40us.  At 60MHz, this is 2400 cycles (average of 1200
cycles/switch).  This is a *LOT* of wasted cycles -- cycles which I
believe aren't wasted with other operating systems (does anyone know how
expensive call gate transitions are?).

It gets uglier with 32-bit protected mode code (you know, the stuff that
DJGPP generates ;-).
To do a disk read, you have to do the following expensive mode switches:

1. One switch to go from DJGPP code (protected mode) to the (real mode)
   disk I/O code in GO32.

2. Two or more switches (real to protected and back again) to check the
   disk cache to see whether the data is in the cache.  (Hmm.  Maybe
   this table is stored in real-mode RAM?  If so, you wouldn't need
   these switches if you know the data is not in the cache.)

3. If the data is not in the cache, we read in data from the disk, and
   do two more switches (at least) to put the just-read data into the
   cache.

4. Finally, we do one more switch to get back to the DJGPP code.

At a minimum, this is 4-6 switches (4 on a cache hit, 6 on a miss).  At
an average of 20us/switch, that is 80-120us, or roughly 4800-7200 cycles
at 60MHz, which wastes a LOT of CPU cycles.  Making things worse is that
the GO32 transfer buffer size is 4K; if we want to read a chunk of data
larger than this, we get to iterate in chunks of 4-6 switches.  Yippee.
1/2 ;-)

I'm not complaining about DJGPP, mind you, but I'd like to point out
where some of the real inefficiencies are.  With real operating systems,
you don't have this spaghetti of time-consuming mode switches to read
data.

-- 
Darryl Okahata
Internet: darrylo AT sr DOT hp DOT com

DISCLAIMER: this message is the author's personal opinion and does not
constitute the support, opinion or policy of Hewlett-Packard or of the
little green men that have been following him all day.