Mail Archives: djgpp-workers/1999/07/08/16:52:30
Eli Zaretskii wrote:
> > Finally, I did one more observation: The error disappears when
> > I turn off "Internal Cache" in BIOS setup, but then the compilation
> > will take 5 minutes instead of 5 seconds...
> Last time I saw strange problems that disappeared when the cache was
> turned off, it was a case of a bad motherboard: the system clock was
> driving the memory chips too fast, and some of the SIMMs were
> sometimes not keeping up. Interestingly enough, the problem was
> detected by GCC (crashes during compilation), although I think it was
> on plain DOS, not in Windows.
Yes, I wouldn't rule out hardware problems completely yet, but my PC
(PP200 (SY013), ASUS P/I-P6RP4, 128 MB EDO, Orion 450KX PCIset,
16 KB L1 cache, 256 KB L2 cache, PC from 1996) usually works very
well, especially in plain DOS. I've also tried changing to some old
parity SIMMs and "shifting" the EDOs one step down, but the error didn't
go away.
However, I made some interesting observations when I started to examine
the contents of the physical RAM. First I wrote a small program that maps
all physical memory and then searches for a specified 32-bit pattern.
The addresses below are all physical addresses. Not all occurrences
of the data are shown here, only the most important ones.
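
A minimal sketch of the search step only, not the actual program. It
assumes the memory has already been mapped into a buffer the program
can read; the function name find_pattern32 and its parameters are
hypothetical, and only 4-byte-aligned positions are checked.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: scan `len` bytes at `base` for a 32-bit pattern
   at 4-byte-aligned offsets. Stores up to `max_hits` byte offsets in
   `hits` and returns the total number of matches found. */
size_t find_pattern32(const void *base, size_t len, uint32_t pattern,
                      size_t *hits, size_t max_hits)
{
    const unsigned char *p = (const unsigned char *)base;
    size_t count = 0;
    size_t off;

    for (off = 0; off + sizeof(uint32_t) <= len; off += sizeof(uint32_t)) {
        uint32_t word;
        /* memcpy avoids alignment and aliasing problems */
        memcpy(&word, p + off, sizeof word);
        if (word == pattern) {
            if (count < max_hits)
                hits[count] = off;   /* remember where the match was */
            count++;
        }
    }
    return count;
}
```

Adding the physical base address of the mapped region to each returned
offset gives addresses of the kind listed below.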
1) Virtual Memory is enabled:
a) After a successful run:
address 0x127e004: 0x243d5450
address 0x138a004: 0x00292fec (right data)
b) After crash (triggered by 3-program previously run):
address 0x127e004: 0x243d5450
address 0x1388e10: 0x243d5450
address 0x138b004: 0x243d5450 (wrong data)
2) Virtual Memory is enabled,
but _CRT0_FLAG_LOCK_MEMORY is used in CC1:
a) After a successful run:
address 0x1342004: 0x243d5450
address 0x138b004: 0x243d5450 (right data)
address 0x14a2004: 0x00292fec (right data)
b) Crash could not be triggered!
Here is my new theory:
For a successful run, what we would expect at physical address
0x138a004 (or 0x138b004) is a valid lp->limit value (0x00292fec),
which is the case in 1a) and 2a).
In case 1b), however, someone has accidentally written 0x243d5450
("PT=$") to address 0x138b004. This happens inside morecore()
in malloc.c (see previous mail). The lp->limit value (0x00292fec)
has disappeared, so we will get a crash.
Now look at case 2a)! Here we will find both values (0x243d5450
and 0x00292fec) but at _different_ addresses, and all goes well!
What I think is: There is nothing wrong with malloc.c or morecore()
or sbrk() or the whole CC1 for that matter. But Windows 3.11 DPMI
somehow mixes up pages in some cases when Virtual Memory is enabled.
Probably what we see is a "double-use" of a physical page at
0x0138b004, which could give all sorts of really strange results.
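
To compare the hits page by page, each physical address can be split
into a page frame number and an offset. This is only a sketch, assuming
the usual 4 KB page size of the i386; the helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                         /* 4 KB pages (assumed) */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

/* Page frame number: the address with the low 12 bits dropped. */
uint32_t page_frame(uint32_t phys)  { return phys >> PAGE_SHIFT; }

/* Offset of the address inside its page. */
uint32_t page_offset(uint32_t phys) { return phys & PAGE_MASK; }

/* Nonzero if both addresses lie in the same physical page. */
int same_page(uint32_t a, uint32_t b)
{
    return page_frame(a) == page_frame(b);
}
```

With this split, 0x138a004 and 0x138b004 are at the same offset (0x004)
in two adjacent pages, 0x138a and 0x138b.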
This can certainly be investigated further; these are just my first
impressions. Meanwhile, I'm going to use _CRT0_FLAG_LOCK_MEMORY
in CC1 as a first work-around, and see if I get any crashes.
So far, I've done a hundred compilations with different input files
and it seems to work. Also, there is probably nothing wrong
with high addresses (>2 GB), as I first thought.
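
For reference, a minimal sketch of how a DJGPP program can request
locked memory through the startup flags. It is not the actual change
made to CC1. The #ifdef guard and the helper memory_lock_requested()
are my own additions for illustration, so that the fragment also
compiles with other compilers.

```c
#ifdef __DJGPP__
#include <crt0.h>

/* Ask the DJGPP startup code to lock all memory used by the program,
   so that it is never paged out. */
int _crt0_startup_flags = _CRT0_FLAG_LOCK_MEMORY;
#endif

/* Hypothetical helper, for illustration only: reports whether this
   build requested locked memory. */
int memory_lock_requested(void)
{
#ifdef __DJGPP__
    return (_crt0_startup_flags & _CRT0_FLAG_LOCK_MEMORY) != 0;
#else
    return 0;   /* not a DJGPP build; the flag does not exist here */
#endif
}
```

The definition of _crt0_startup_flags has to be linked into the program
itself, since the startup code reads it before main() runs.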
--
Erik