Note from DJ: This is a copy of a presentation Eli Zaretskii gave during a trip to Japan.
This file describes the DJGPP project, its goals, current status, and future perspectives.
DJGPP, an acronym for DJ's GNU Programming Platform[1], is a project which brings the GNU development tools to MS-DOS and MS-Windows systems. Its originator and principal maintainer is DJ Delorie; that's where the "DJ" in DJGPP comes from.
DJGPP is about Free Software. The ported GNU packages are, of course, free; however, the library and utilities developed specifically for DJGPP are also distributed under the GNU license. Since DJGPP supports platforms which have such a huge installed base, and since it is highly popular among MS-DOS/MS-Windows users, the project is a very important member of the Free Software movement. Significantly, a large proportion of DJGPP users are young programmers at the very beginning of their careers. Teaching those young people about the importance of free software and free sharing of ideas is in itself a worthy goal. DJGPP is in a unique position to perform this important community service because it usually is the first serious compiler used by young programmers.
But DJGPP is also about fun. It is fun to port industry-strength applications to MS-DOS and have them running seamlessly on top of a 16-bit "toy operating system". It is fun to see how these applications change the way your system looks and feels, to a point that you can almost think it is a Unix box. It is fun to have all the source code, down to the darkest corners of the library internals, free for reading and hacking. It is fun to be able to find and fix bugs no matter whether they occurred in the application code, in the library, or in the compiler. And it is fun to discuss all these matters with other users and developers all over the world, and to join forces to make the free software better and more powerful. More about this later.
This article presents an overview of the DJGPP project. Section 1 briefly tells the history of the project development. Section 2 explains how protected-mode DJGPP programs manage to run on top of MS-DOS even though MS-DOS and protected mode are incompatible. Section 3 describes several important features that DJGPP brings to MS-DOS and MS-Windows. Internationalization (a.k.a. I18N) aspects specific to DJGPP are discussed in Section 4. Finally, Section 5 summarizes the achievements of 10 years of DJGPP development, and attempts to predict its future.
"In the beginning was the Word...", says the Bible.
Like every other human endeavor, DJGPP also started with a Word. And as happens with almost everything else in the free software world, that word belonged to Richard Stallman. Here's how DJ Delorie himself describes the genesis of DJGPP[2]:
DJGPP was born around 1989 [...], when Richard Stallman spoke at a meeting of the Northern New England Unix Users Group (NNEUUG) at Data General, where I then worked. I asked if the FSF ever planned on porting gcc to MS-DOS [...], and he said it couldn't be done because gcc was too big and MS-DOS was a 16-bit operating system. Challenge in hand, I began.
Consequently, we should consider Richard Stallman a progenitor of DJGPP, or at least its godfather. Had it not been for his scepticism, it's possible that DJGPP would not have existed....
The first version of GCC ported by DJ was 1.35. It was compiled on a 386 machine running ISC Unix, linked with a hacked libc.a taken from that machine which had DOS-compatible replacements for system calls such as open, read, stat, etc., and converted to a DOS executable format with a custom program written by DJ: a first version of DJGPP, originally called djgcc, was born. It required Phar Lap's DOS Extender to run protected-mode code on top of real-mode DOS. See DJGPP Programs and MS-DOS, for more about the DOS extender's role.
To compile itself, gcc needs lots of memory, which PCs didn't have at that time. Since the DOS extender used to run gcc didn't support virtual memory, DJ wrote his own DOS extender called go32. GCC version 1.37 was the first version built on a DOS platform using go32.
Next came the library. The first version was based on the BSD library whose sources were freed at that time, and augmented with many custom DOS-specific functions that interfaced with the OS. The header files were based on those distributed with gcc.
The name was changed from DJGCC to DJGPP when C++ support was added. Initially, the name stood for DJ's G++, with the + characters replaced by p's because DOS doesn't allow + in file names. However, since the C++ compiler is integral to the gcc distribution, DJGPP now probably stands for something like DJ's GNU Programming Platform[3].
DJGPP version 1.05 was the first one available commercially, and it was a big success. Version 1.11 supported all DOS configurations, had somewhat limited support for running on MS-Windows (e.g., graphics and floating-point emulation didn't work), and appeared on the GNU Compiler Binaries CD-ROM.
This was in 1992, and around that time I myself began using DJGPP. I had just bought a brand-new 486-DX/33 and got an email account, and DJGPP v1.11m5 was the first version I downloaded and installed. Exposed to Unix-style tools by the excellent book Software Tools by B.W. Kernighan and P.J. Plauger, I had for years been porting GNU programs to MS-DOS using 16-bit proprietary compilers. I became tired of dealing with missing headers, like <unistd.h>, missing functions, like popen and alloca, and missing functionality, like long command lines in a Makefile. DJGPP solved all these and many other problems, and I became instantly hooked.
By the end of 1994 DJGPP became so popular and the traffic on its mailing list became so intensive, that a FAQ list was sorely needed; the first version of the FAQ was released in February 1995. Today, in its 7th edition, the DJGPP FAQ list includes answers to 200 questions, its Texinfo source totals 540K bytes, and its printed version is more than 200 pages long.
DJGPP v1.x could not bootstrap itself: it required Borland's compiler to build the go32 extender. Cygnus, a big user of DJGPP for their DOS-based products, requested a self-bootstrapping version, so DJGPP v2 was born. Version 2 moves some parts of go32 into the C library, other parts into a stub loader produced by a special-purpose assembler capable of producing 16-bit code, and it relies on DPMI services to run on top of DOS; more about this in the next section.
Meanwhile, in response to the growing interest and user base, a news group dedicated to DJGPP, <comp.os.msdos.djgpp>, was created in June 1995. Nowadays, the traffic on the news group averages about 70 messages per day.
Version 2.0 of DJGPP was shipped in February 1996, after more than two years of development and testing. The v2 library is Posix-compliant, the only library that offers Posix compliance on MS-DOS, and one of the two available for MS-Windows. It also introduced transparent and automatic support of long file names on Windows 9X.
Version 2.01 was released in October 1996. The GNU Software for MS-Windows and MS-DOS CD-ROM, based on DJGPP v2.01 ports of many GNU packages, was released in the last quarter of 1998, and its first edition out-sold all other GNU CD-ROMs.
The latest version 2.02 of DJGPP was released in December 1998.
GCC generates 32-bit code, so DJGPP programs are 32-bit programs. GCC also doesn't know anything about the segmented architecture of the x86 processors, so its code effectively requires the data, stack, and code segments to remain constant during the program execution. However, real-mode segments of x86 CPUs are only 64KB long. Therefore, to be able to compile large programs, like GCC itself, DJGPP must run in protected mode. This section describes the tricks pulled by DJGPP to make this possible.
Switching the CPU into protected mode is easy, but you cannot call DOS and BIOS services while the CPU is in protected mode. Why? Because DOS and BIOS code was written for execution in real mode, and so it constantly violates the rules of protected-mode programming. For example, DOS code loads many different values into segment registers, to overcome the 64KB limitation of a real-mode segment. But in protected mode, a segment register can only be loaded with a value that corresponds to one of the existing selectors; any other value causes a General Protection Fault (GPF for short).
So, if a program switches the CPU into protected mode and then calls DOS, e.g. to print a message, it will immediately crash the system. You can't write even the simplest Hello World program without hitting this brick wall!
It gets worse. DOS and BIOS code needs to be run even if the application program doesn't call any of their services. For example, 18 times a second there's a timer tick, a hardware interrupt issued by the timer chip that's supposed to advance the system clock. But the handler for the timer tick interrupt is part of BIOS, and it employs real-mode code.
So even if a program does nothing to call any real-mode code, some asynchronous system events will do that anyway, and the machine will still crash very promptly. Can the conflict between DOS/BIOS and the protected mode be solved? Yes; read on.
The solution to this conflict, if you don't want to write a protected-mode operating system which replaces DOS and BIOS completely[4], is to add a layer of software between your program and DOS/BIOS code that would switch the CPU from protected to real mode and back, as appropriate. This software layer is called a DOS extender.
With a DOS extender, when a protected-mode program calls a real-mode service, the extender traps the call, switches the CPU to real mode, reissues the call, waits for the service to do its thing, then switches the CPU back into protected mode, and returns to the application code that called the real-mode service. Hardware interrupts, such as the timer tick and the keyboard interrupt, are also trapped by the extender, and also cause a switch to real mode and back.
You might think that these mode switches would considerably slow down the application. However, in practice, most programs don't call the OS services too often, and even when they do, the peripheral devices accessed by most of these services, such as the hard disk, are so much slower than modern CPUs, that the overhead of the mode switch is hardly ever noticed.
go32 Extender
In DJGPP v1.x, go32 was such a DOS extender. It was loaded automatically by every program during its startup. In addition to the usual functions performed by DOS extenders, it also handled some unique DJGPP-related tasks:
Since DJGPP executables use COFF format, which DOS doesn't understand, go32 was responsible for reading the COFF header and setting up the code, data, and other segments as recorded in the header.
This is required to overcome deficiencies in stock DOS shells which prevent even the simple task of compiling GCC without extensive hacking of its Makefiles. See Features Provided by DJGPP, for more details about this.
FP emulation needs special handling in protected mode. DJGPP supplies an FP emulator which go32 would load and set up.
To facilitate graphics programs, go32 allowed loading a driver suitable for the installed video hardware, and worked with the VGA bank switching features to create an illusion of a linear video memory.
Using an extender had an important advantage of being able to run on any DOS configuration, since go32 had special code to adapt itself to all known methods of switching into protected mode and managing extended memory. But it did have a significant drawback as well: the extender was loaded into conventional memory and each instance used about 130KB of that memory. Since most DOS systems had about 500 to 600 KBytes of free conventional memory, this means you couldn't have more than 3-4 nested levels of DJGPP programs. This was a grave limitation: for example, you couldn't build programs whose Makefiles required more than 2 recursive levels of make invocation (because GCC and the compiler passes it invokes require 2 additional levels of program nesting). DJGPP v2 solves this problem, as described below.
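The nesting limit follows from simple arithmetic, which the short C sketch below spells out. The function names are made up for this illustration, and the 130KB figure is the approximate per-instance cost quoted above, not an exact measurement.

```c
/* Conventional memory taken by each loaded copy of the extender,
   using the approximate figure quoted in the text. */
#define EXTENDER_KB 130

/* How many nested DJGPP programs fit into free_kb of conventional memory. */
int max_nested_levels(int free_kb)
{
    return free_kb / EXTENDER_KB;
}

/* Levels left for recursive make invocations once GCC and the compiler
   pass it runs have claimed their two levels of nesting. */
int make_levels_left(int free_kb)
{
    int levels = max_nested_levels(free_kb) - 2;
    return levels > 0 ? levels : 0;
}
```

With 500KB free this gives 3 nested programs and a single level of make; with 600KB it gives 4 nested programs and 2 levels of make, which matches the limits mentioned above.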
DJGPP v2.x gets rid of the extender, and instead requires DPMI services to run. DPMI, an acronym for DOS Protected-Mode Interface, is a special API that allows protected-mode programs to run on top of DOS. It defines several functions that a protected-mode program (called a DPMI client) can use to perform such tasks as entering protected mode, allocating memory and segment descriptors, calling real-mode services, hooking interrupts, etc. Many modern operating systems for Intel CPUs include the DPMI services; all versions of MS-Windows, OS/2, and the Linux DOS emulator are notable examples. There are also several proprietary DPMI servers for DOS, usually bundled with DOS memory managers such as QEMM and 386MAX; and FreeDOS includes a DPMI server as part of the default setup. For those systems which don't have a DPMI server, DJGPP v2.x comes with a free server called CWSDPMI; not surprisingly, CWSDPMI reuses a lot of code from go32. The DJGPP startup code checks for DPMI services, and if they aren't available, automatically looks for and loads cwsdpmi.exe, the CWSDPMI server.
The DPMI server (a.k.a. the DPMI host) solves most of the problems of running a protected-mode program on top of real-mode DOS. The rest of the functionality, which in v1.x was the responsibility of go32, is handled in v2.x by the DJGPP startup code and low-level library functions. Let me now briefly describe these two aspects of DJGPP operation.
The DJGPP v2.x startup code includes two parts: the stub loader and the library startup module. The former is a single assembly-language module which is compiled by a special-purpose assembler, called djasm, that is capable of producing 16-bit DOS executables. This stub loader is prepended to every DJGPP program during linking, and is the only part that DOS understands; all the rest--the COFF executable--is just some weird data, as far as DOS is concerned.
The second part of the startup is in the library. It consists of several modules written partly in C and partly in assembly. Here's where the COFF image entry point is, and that is where the stub passes control after it loads the program and sets it up.
Here's a short description of what the stub does:
This buffer is required for passing data to and from real-mode services. Its role is described in Library Interface with DOS and BIOS, below.
DPMI services would already be available if either (1) a resident DPMI server, such as the one built into MS-Windows, is installed; or (2) if this is a nested DJGPP program, and its parent already loaded CWSDPMI.
If DPMI services are not available, the stub loads cwsdpmi.exe[5]. It looks for cwsdpmi.exe in the same directory where this program's executable is kept, and inside directories listed by the PATH environment variable.
This is required to know how much memory needs to be allocated for the various sections of the DJGPP program.
Note that the rest of the stub runs in protected mode.
This is done by calling the DPMI functions to allocate segment descriptors and memory for code and data, and set their base address, limit, and privileges.
The code, data, and BSS sections are read into the memory allocated above, by calling DOS via the DPMI service which allows calling real-mode functions from protected-mode programs.
This entry point is inside the library startup module, described next.
Here's what the library startup code does:
This causes a frequent programming error known as the NULL pointer dereference to trigger an exception, and the offending program gets the SIGSEGV signal. The DPMI function required for this is not part of the basic DPMI 0.9 spec, and is unsupported by Windows and many other proprietary DPMI servers; but CWSDPMI does support it.
sbrk memory-allocation mechanism.
This might sound simple, but is actually quite complicated, due to some peculiarities of DPMI memory allocation. For example, it requires special 16-bit code that runs in real mode to be loaded into a buffer of conventional memory.
The stack size of DJGPP programs is 512KB by default, but it can be changed both by the application and using the stubedit program.
Many DOS programs need to access conventional memory, either to pass data to and from DOS/BIOS functions, or to access memory-mapped devices such as the video memory of the graphics adapter. Since the conventional memory is by default not mapped into the program's data segment, a special selector, known as _dos_ds, is provided for these purposes.
This requires hooking some hardware interrupts, e.g. to generate SIGINT when Ctrl-C is pressed, or to generate SIGPROF on a timer tick.
environ[] array.
This includes getting long command lines from the parent DJGPP program and Unix-style expansion of file-name wildcards. See Long command lines, and also see Unix-style file-name globbing, for more details.
main function.
Since DJGPP programs use DOS and BIOS for system calls, many library functions need to actually issue various real-mode DOS/BIOS calls. I already described above how this is done in principle: by calling a special DPMI service provided for that.
However, many real-mode services require some data to be passed. For example, when you write the contents of a buffer to a file, the corresponding DOS function requires a pointer to the buffer to be put into the DS:DX pair of registers. Moreover, the buffer whose pointer is passed to DOS must reside in the first Megabyte of the address space, because real-mode addresses use only 20 bits. In contrast, protected-mode programs use the full 32 bits for addressing, and all the data is always above the 1MB mark[6]. Now, how do we pass such addresses to DOS?
This is where the so-called transfer buffer comes to our help. As we saw, this buffer is allocated in conventional memory during the program startup. The buffer is 16KB long by default, but its size can be changed to any value between 2KB and 64KB using the stubedit program. Every library function that needs to pass data to, or retrieve data from, DOS/BIOS, needs to move that data between the transfer buffer and the protected-mode memory. For example, to write a buffer to a file, the contents of that buffer are copied to the transfer buffer, and the real-mode segment:offset-style address of the transfer buffer is passed to DOS; to read data from a file, the address of the transfer buffer is passed to DOS, and the data put there by DOS is then copied from the transfer buffer to the buffer in protected-mode memory whose address was passed by the calling application.
The startup code stores the real-mode address of the transfer buffer and its size in global variables, which are used by the library functions to move data to and from the transfer buffer. The library also provides special functions to move the data between protected-mode memory and the transfer buffer as fast as possible, and thus to make this overhead smaller.
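To make the segment:offset notation concrete, here is a small C sketch of the arithmetic that relates a linear address in the first Megabyte to a real-mode segment:offset pair. The type and function names are invented for this example and are not the names used by the DJGPP library.

```c
#include <stdint.h>

/* A real-mode address: 16-bit segment plus 16-bit offset. */
struct rm_addr {
    uint16_t segment;
    uint16_t offset;
};

/* Split a linear address below 1MB into a normalized segment:offset
   pair (the offset is kept in the range 0..15). */
struct rm_addr linear_to_rm(uint32_t linear)
{
    struct rm_addr a;
    a.segment = (uint16_t)(linear >> 4);   /* one segment unit is 16 bytes */
    a.offset  = (uint16_t)(linear & 0x0F); /* remainder within the paragraph */
    return a;
}

/* The CPU forms the linear address as segment * 16 + offset. */
uint32_t rm_to_linear(struct rm_addr a)
{
    return ((uint32_t)a.segment << 4) + a.offset;
}
```

For instance, the linear address 0x12345 corresponds to 1234:0005, and the text-mode video memory at B800:0000 lies at linear address 0xB8000.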
As long as the application calls relatively high-level library functions, such as open, read, write, stat etc., all of the special processing just described is done automatically and transparently by the library; the application doesn't need to know anything about the transfer buffer and data copying that goes on under the hood.
Library functions also provide other specialized processing in some cases. For example, DOS cannot read or write more than 64K bytes in one call, so the library breaks large requests into smaller chunks, each one the size of the transfer buffer, and feeds them to DOS one by one. As another example, consider memory-allocation functions such as malloc. Instead of allocating blocks off the conventional memory by calling DOS, like real-mode programs do, DJGPP issues DPMI calls to allocate extended memory and provide demand-paged virtual memory, so that all of the available memory and swap space can be used by the application via standard function calls.
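The chunking of large write requests can be sketched in portable C as follows. This is only a simulation under stated assumptions: fake_dos_write stands in for the real-mode DOS call, the arrays stand in for conventional memory and the file, and none of the names come from the DJGPP library.

```c
#include <stddef.h>
#include <string.h>

#define TB_SIZE 16384            /* default transfer buffer size per the text */

static unsigned char transfer_buffer[TB_SIZE];
static unsigned char fake_file[200000];  /* pretend disk file */
static size_t fake_file_len = 0;
static int dos_calls = 0;                /* counts simulated DOS calls */

/* Stand-in for the DOS write service: it can only see the transfer buffer. */
static size_t fake_dos_write(size_t n)
{
    if (n > sizeof fake_file - fake_file_len)
        return 0;                        /* "disk full" */
    memcpy(fake_file + fake_file_len, transfer_buffer, n);
    fake_file_len += n;
    dos_calls++;
    return n;
}

/* Write count bytes by feeding them through the transfer buffer,
   one chunk of at most TB_SIZE bytes per simulated DOS call. */
size_t chunked_write(const void *buf, size_t count)
{
    const unsigned char *p = buf;
    size_t done = 0;

    while (done < count) {
        size_t chunk = count - done;
        size_t written;

        if (chunk > TB_SIZE)
            chunk = TB_SIZE;
        memcpy(transfer_buffer, p + done, chunk);  /* copy to "low" memory */
        written = fake_dos_write(chunk);
        if (written == 0)
            break;                       /* stop on a short write */
        done += written;
    }
    return done;
}
```

Writing 40000 bytes this way takes three simulated calls: two full 16KB chunks and one final chunk of 7232 bytes.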
This section describes some advanced features provided by DJGPP. Most of these features are built into the C library, but some are provided by the basic development utilities which are part of the DJGPP development environment. Since DJGPP is a Posix-compliant environment, many of these features are motivated by Unix compatibility.
The DJGPP header files and library functions are highly compatible with other popular environments. In addition to full ANSI and Posix compliance, DJGPP also offers compatibility with many PC and Unix libraries. For example, DJGPP provides library functions that are usually absent from other DOS- and Windows-based libraries, like popen, glob, statfs, getmntent, getpwnam, select, and ftw. Other functions, although they exist in DOS/Windows libraries, are incompatible with Posix in subtle ways. For example, the ANSI-standard function rename typically fails in DOS/Windows implementations if the target file already exists (because the underlying OS call fails). DJGPP makes a point of sticking to Posix or Unix behavior in such cases, even if it means more processing (like removing the target file in the case of rename).
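The extra processing in the rename case can be illustrated with a few lines of standard C. This is a sketch of the general idea only, not the code of the DJGPP library; rename_over is a name made up for this example, and unlike a true Posix rename the fallback path is not atomic.

```c
#include <stdio.h>

/* Rename `from` to `to`, replacing `to` if it already exists.
   Returns 0 on success, -1 on failure. */
int rename_over(const char *from, const char *to)
{
    /* Make sure the source exists first, so that a failure caused by a
       missing source never leads to deleting the target. */
    FILE *probe = fopen(from, "rb");
    if (probe == NULL)
        return -1;
    fclose(probe);

    if (rename(from, to) == 0)
        return 0;               /* the OS call already did the job */

    /* The OS call failed, presumably because the target exists:
       remove the target and try once more. */
    if (remove(to) != 0)
        return -1;
    return rename(from, to);
}
```

On a Posix system the first rename call already replaces the target, so the fallback never runs there; on a system whose rename refuses to overwrite, the second attempt does the work.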
A case in point is the library functions stat and fstat. Unix programs make extensive use of the inode number and the mode bits returned by these functions. For example, GNU diff examines the inode numbers of the files it is about to compare, and if they are equal, exits immediately on the assumption that both file names point to the same file. However, DOS and Windows don't support inodes, and most other DOS/Windows implementations return zero in the st_ino member of struct stat, which of course breaks diff. Also, the mode bits returned by fstat are usually incorrect. In contrast, the DJGPP implementation of these functions goes out of its way to provide compatible implementations for these functions, and in particular returns meaningful inode numbers[7], even though it takes quite a lot of code (for example, the compiled code of stat totals about 17KB, together with other library functions it calls).
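The kind of test that breaks when every file reports a zero inode looks roughly like this. The sketch below is not taken from the diff sources; same_file is a name chosen for this example, and comparing the device number together with the inode number is the usual Posix practice.

```c
#include <sys/stat.h>

/* Return 1 if both names refer to the same file, 0 if they don't,
   and -1 if either file cannot be examined. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    /* Two names denote one file when device and inode both match. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

If a library returned zero in st_ino for every file, this function would claim that any two files on the same drive are identical, which is why meaningful inode numbers matter for ported programs.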
When DOS invokes programs, it limits the length of the command line to 126 characters (excluding the program's name). This is a ridiculously small limit; it doesn't even allow compiling GCC, since many commands in GCC Makefiles are much longer.
Therefore, DJGPP provides a mechanism to pass long command lines to child programs. The actual command is stored in the transfer buffer, and a pointer to that buffer is passed to the child program instead of the command line itself. The startup code of the child program then retrieves the actual command-line arguments and puts them into the argv[] array passed to main.
DJGPP also supports the so-called response file method of passing long command lines, whereby the command line is stored on a disk file, and the name of that file is passed as @response-file.
For example:
ar cq libmylib.a @files-list
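What a program has to do with such an @response-file argument can be sketched as follows. This is a simplified illustration, not the DJGPP startup code: read_response_file is a hypothetical helper, it splits the file on whitespace only, and it ignores quoting and tokens longer than 511 characters.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read whitespace-separated arguments from the file named by path.
   Returns a malloc'ed, NULL-terminated array and stores the number of
   arguments in *count; returns NULL if the file cannot be read or
   contains no arguments. */
char **read_response_file(const char *path, int *count)
{
    FILE *fp = fopen(path, "r");
    char **args = NULL;
    char word[512];
    int n = 0;

    *count = 0;
    if (fp == NULL)
        return NULL;
    while (fscanf(fp, "%511s", word) == 1) {
        /* Room for the new argument plus the terminating NULL. */
        char **grown = realloc(args, (n + 2) * sizeof *args);
        if (grown == NULL)
            break;
        args = grown;
        args[n] = malloc(strlen(word) + 1);
        if (args[n] == NULL)
            break;
        strcpy(args[n], word);
        n++;
    }
    fclose(fp);
    if (args != NULL)
        args[n] = NULL;
    *count = n;
    return args;
}
```

A caller that sees an argument beginning with @ would pass the rest of that argument to this function and splice the returned strings into its own argument list.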
All Unix programs assume that any file-name wildcards on their command line were already expanded by the shell, to yield normal file names. But DOS shells don't provide this functionality, so the wildcards would wind up verbatim in the argv[] array. To avoid the need to have special code in every ported program that expands the wildcards, the DJGPP startup code expands the wildcards automatically. The expansion follows the Unix conventions, so * expands to all file names, unlike the DOS conventions where it excludes file names with extensions.
The globbing code supports Unix-style quoting with the ' and " characters (most other DOS/Windows compilers and shells only support "). Escaping special characters with \ is limited to the quote characters themselves, since \ serves as a directory separator in DOS/Windows file names.
DJGPP also provides a special extension: the ... wildcard expands recursively to all the subdirectories. Thus, the following command would search all files in all the subdirectories, recursively:
grep foo .../*
(This was hard to achieve even on Unix, until the recent release of the GNU Grep package introduced the --recursive option.)
system function.
Traditionally, the system library function calls the shell to process its argument. However, the stock DOS shell COMMAND.COM is too dumb to be useful in many cases. For example, it doesn't support long command lines, even though DJGPP programs do; it doesn't understand forward slashes in file names; and it doesn't return the exit code of the child program to the parent.
Therefore, the DJGPP version of system usually doesn't call COMMAND.COM at all. Instead, it internally emulates its functionality, including redirection and pipes, and invokes the programs directly. This makes it possible to provide the following important features:
See Command line, but here it means that shell commands can have arbitrary length, even though the shell itself doesn't support that!
File names which are targets of redirection can be given in the Unix /foo/bar style. Unix devices, such as /dev/null, are also supported (see Unix devices).
The emulation code supports the foo ; bar feature of several commands separated by a semi-colon.
The emulation of the shell command cd allows Unix-style forward slashes in its argument, and also changes the drive if the argument includes the drive letter.
If the environment variable SHELL points to a name like sh or bash, system invokes the shell to do everything, since the internal shell emulation is not sophisticated enough to cover Unix shell functionality.
Shell scripts can be invoked even if the SHELL environment variable doesn't point to a Unix-style shell, provided that the interpreter whose name appears on the first script line after the #! signature can be found somewhere along the PATH.
COMMAND.COM is only invoked by system to run batch files or commands internal to the shell. However, system always looks for external programs first, so if you have e.g. a port of the GNU echo program installed, system will call it even though COMMAND.COM has an internal (and very much inferior) command by that name.
These features come in especially handy in the DJGPP port of GNU make. Where the original Unix code of make invokes the shell, the DJGPP port simply calls system to execute the commands in rules, and automatically gets support for long command lines and Unix-style shells required to run many Makefiles of Unix origin.
The above extended functionality also means that whenever a Unix program calls system, in most cases the same call will work without any changes when compiled with DJGPP. The result is not only ease of porting, but also a lower probability of leaving subtle bugs in the ported program due to an overlooked fragment which assumes a Unix shell.
All DJGPP library functions pass file names to DOS via a single low-level function. This makes it possible to remap some special file names to their DOS equivalents. For example, Unix-standard device names /dev/null and /dev/tty are converted to their DOS equivalents NUL and CON, respectively. File names which begin with /dev/x/, where x is a drive letter, are converted to the DOS x:/ form; this is required for running some Unix shell scripts which take apart the PATH variable where colons separate directories. The implementation of the chroot functionality, which isn't supported directly by DOS and Windows, also uses this file-name conversion.
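A simplified version of such a remapping could look like the C function below. It covers only the three conversions mentioned above; the name remap_unix_name and its interface are assumptions made for this illustration and do not reproduce the library's internal function.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Map a few Unix-style names to their DOS equivalents.  Returns either a
   constant string, the rewritten name stored in out, or path itself if
   no conversion applies. */
const char *remap_unix_name(const char *path, char *out, size_t outsz)
{
    if (strcmp(path, "/dev/null") == 0)
        return "NUL";
    if (strcmp(path, "/dev/tty") == 0)
        return "CON";
    /* /dev/x/rest, with x a drive letter, becomes x:/rest */
    if (strncmp(path, "/dev/", 5) == 0
        && isalpha((unsigned char)path[5]) && path[6] == '/') {
        snprintf(out, outsz, "%c:/%s", path[5], path + 7);
        return out;
    }
    return path;
}
```

With this function, /dev/c/djgpp/bin turns into c:/djgpp/bin, so a script can keep drive-qualified directories in a colon-separated PATH without ambiguity.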
This feature is built into the low-level file-oriented library functions. It allows the application to install a handler for certain filesystem calls, like open, read, fstat, dup, close, etc. If installed, such a handler is called just before the appropriate primitive is invoked to pass the call to DOS. If the handler returns a non-zero value, it is assumed to have handled the call, and the usual primitive call is bypassed. Otherwise, the library proceeds with calling DOS as usual.
This facility provides an easy way of handling special files and devices which DOS and Windows don't support directly. For example, a program can install a handler for special file names like /dev/ptyp0 and emulate these non-existent devices via an async communications library.
Another way of putting filesystem extensions to a good use is when there's a need to emulate functionality that DOS file I/O doesn't support, even though the associated devices do exist. For example, suppose you need to port code which sends special commands to the terminal device via termcap functions. DOS supports a terminal device, but doesn't support termcap. However, it is possible to achieve the same effects if direct screen writes are used instead of file I/O. By installing a filesystem extension handler for the standard output handle, you could redirect all terminal I/O to direct screen writes and implement all the necessary termcap functionality, without any changes to the program's source code. This is how the DJGPP port of GNU ls supports the --color option without forcing users to install a special terminal driver that interprets ANSI escape sequences.
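The dispatch logic behind such handlers can be modelled in a few lines of portable C. Everything in this sketch is hypothetical: the names, the handler signature, and the handle values are invented to show the control flow described above and do not reproduce the DJGPP library's actual interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type: returns non-zero if it handled the call,
   in which case *result holds the value to return to the caller. */
typedef int (*fs_handler)(const char *name, int *result);

static fs_handler open_handler = NULL;

void install_open_handler(fs_handler h)
{
    open_handler = h;
}

/* Stand-in for the primitive that would pass the call to DOS. */
static int dos_open(const char *name)
{
    (void)name;
    return 5;                   /* pretend DOS returned handle 5 */
}

/* Library-level open: give the handler the first chance at the call. */
int lib_open(const char *name)
{
    int result;

    if (open_handler != NULL && open_handler(name, &result) != 0)
        return result;          /* handled; the DOS call is bypassed */
    return dos_open(name);      /* otherwise proceed as usual */
}

/* Example handler: claims only the emulated pseudo-terminal device. */
int pty_handler(const char *name, int *result)
{
    if (strcmp(name, "/dev/ptyp0") != 0)
        return 0;               /* not ours; let DOS have it */
    *result = 100;              /* made-up handle for the emulated device */
    return 1;
}
```

After install_open_handler(pty_handler), opening /dev/ptyp0 is served by the handler, while every other name still reaches the simulated DOS primitive.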
DOS system calls are limited to file names in the so-called 8+3 format: maximum 8 characters for the basename and maximum 3 characters for the extension. Therefore, it is impossible to access the long file names, offered by Windows 9X and Windows NT, via the DOS system calls. However, Windows 9X provides a special API (a bunch of special functions of software interrupt 21h) that allows DOS programs to access long file names. This API is widely known as the LFN API, where LFN is an acronym for Long File Names. For each file-oriented DOS system call, the LFN API includes a replacement that supports long file names. For example, there are functions to open files, list the files in a directory, create a directory, etc. using long names. The LFN API also adds several functions to access extended functionality supported by the Windows filesystems. For example, it is possible to get and set three time stamps for each file, like on Unix, instead of the single time stamp supported by DOS.
The DJGPP library features transparent and automatic support for long file names on Windows 9X[8]. The DJGPP startup code queries the system for the availability of the LFN API, and if it's available, all low-level file-oriented primitives are automatically switched to using the special LFN-aware functions. This run-time detection of the LFN support means that the same executable will run on DOS and on Windows, and will automatically support long file names when it runs on Windows 9X.
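The 8+3 restriction itself is easy to express in code. The following C function is a minimal sketch that checks only the length limits and the single-dot rule; fits_8_3 is a name invented here, and the check deliberately ignores the set of characters DOS forbids in file names.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if name fits the DOS 8+3 length limits, 0 otherwise. */
int fits_8_3(const char *name)
{
    const char *dot = strchr(name, '.');
    size_t base_len = dot ? (size_t)(dot - name) : strlen(name);
    size_t ext_len = dot ? strlen(dot + 1) : 0;

    if (dot != NULL && strchr(dot + 1, '.') != NULL)
        return 0;               /* more than one dot */
    return base_len >= 1 && base_len <= 8 && ext_len <= 3;
}
```

Names like makefile.in pass this test, whereas configure.in (9-character basename) and emacs-20.3.tar.gz (several dots) do not, which is why such files need either renaming or the LFN API.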
DOS doesn't support hard and symbolic links. However, DJGPP emulates them to some extent. The link library function simulates hard links by copying. The symlink library function simulates a symbolic link for executable programs only, by creating a 2KB stub which is set up to run the COFF image from the target of the link. Thus, ln -s grep fgrep does what you'd expect.
Emacs is special because when it dumps itself during the build process, static and global variables are frozen in the dumped image with the last value they had at the time the program was dumped. DJGPP has a special facility in the library through which library functions can detect that the program was dumped and restarted. All library functions that need static variables, use this facility to reinitialize them. This allows Emacs to be built with DJGPP without the need to analyze whether each library function called by Emacs is dump-safe.
In addition to relying on GNU development tool-chain, DJGPP introduces several utilities written specifically for the project. These utilities are meant to assist the developer in solving specific tasks common for the DJGPP environment. Some of these utilities are listed below:
djtar is a program that unpacks archives (but cannot create them). It was originally written to unpack files created by tar, because DOS and Windows lack standard programs for that. Since the original release, djtar functionality was significantly extended, and now it can unpack .tar.gz and .zip files as well. It also can unpack archives from floppy disks written as raw /dev/rfd0a devices on Unix systems, and it uncompresses and untars .tar.gz files on the fly, by feeding the untar code with output of the unzip code. The latter feature is very important when unpacking large distributions, such as emacs-XX.YY.tar.gz, because pipes are implemented as temporary disk files on DOS/Windows, and so on-the-fly decompression avoids creating huge temporary disk files.
The ability to unzip .zip archives makes djtar the only free program which does that, since it turns out that InfoZip's UnZip license does not comply with FSF's definition of free software (according to Richard Stallman).
In addition, djtar offers several features designed to prevent problems due to DOS/Windows file-name restrictions, see DOS file names handling, below.
These two programs come in handy when you need to carry a large file (usually, a compressed archive of a large distribution) on floppies. djsplit splits a file into smaller chunks whose size is user-defined, and djmerge splices the chunks back together.
These programs are close cousins of dos2unix and unix2dos, respectively, but they have several clever tricks up their sleeve. First, they take file names from the command-line arguments and rewrite each file, instead of reading stdin and writing stdout; thus, they can convert many files in a single run. And second, they preserve the time stamps of the converted files, to keep utilities like make happy. With these programs, I can convert the entire directory tree of C source files to the DOS CR-LF format with a single command:
utod .../*.[ch]
This uses the DJGPP wildcard expansion and the special ... wildcard mentioned above.
This is a replacement for the well-known move-if-changed shell script. It is very handy in Makefiles which should run on systems that don't have Bash installed. Since it understands Unix-style forward slashes (like all DJGPP programs do), it is also widely used in Makefiles for copying files, instead of the shell's internal COPY command, since make doesn't live well with backslashes in file names.
As its name implies, redir redirects standard handles. It was originally written to allow redirection of stderr, which the stock DOS shell COMMAND.COM cannot do. You need this redirection, e.g., when GCC spits out a long list of error messages which scroll off the screen. redir can also append redirected handles (a-la >>) and redirect stderr to the same place as stdout or vice versa, like what >& does.
In addition, redir reports the exit status of the program it runs, and prints the elapsed time used by the child. These features are provided because, unlike on Unix, there are no standard utilities to do that.
DJGPP debugging support doesn't include Unix-style core files which allow post-mortem debugging of a crashed program. To compensate for this deficiency, when a program crashes, a special library module prints the values stored in the CPU registers and the traceback of the function calls that led to the crash, as stored in the call frames pushed onto the stack.
However, the stack traceback, as printed, is hard to interpret, because it only includes numeric addresses of the functions. The symify program solves this problem. It reads the traceback directly from the video memory, and uses the debug info recorded in the program's executable file to convert the addresses into file names and line numbers of the source files. It then adds the file names and line numbers information near the corresponding addresses, thus making the traceback easy to comprehend.
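Among the utilities listed above, the conversion to the DOS CR-LF format that utod performs is simple enough to sketch. The function below is a hypothetical illustration and not the actual utod source: it converts a memory buffer rather than rewriting files, and it leaves out the time-stamp preservation described above.

```c
#include <assert.h>
#include <stddef.h>

/* Copy LEN bytes from IN to OUT, turning every bare LF into CR-LF.
   An LF that already follows a CR is left alone, so converting a
   file twice is harmless.  Returns the number of bytes written, or
   (size_t)-1 if OUT (of size OUTSIZE) is too small.  */
size_t lf_to_crlf(const char *in, size_t len, char *out, size_t outsize)
{
    size_t i, o = 0;

    assert(in != NULL && out != NULL);

    for (i = 0; i < len; i++) {
        int bare_lf = in[i] == '\n' && (i == 0 || in[i - 1] != '\r');
        size_t need = bare_lf ? 2 : 1;

        if (o + need > outsize)
            return (size_t)-1;
        if (bare_lf)
            out[o++] = '\r';
        out[o++] = in[i];
    }
    return o;
}
```

In the worst case every input byte is an LF, so an output buffer twice the size of the input is always sufficient.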
Besides the library functions and DJGPP-specific programs, a lot of special code went into the utilities ported or written for DJGPP, so that these utilities could work together smoothly and have the effect a user would expect. Some of these extensions are listed below:
PATH format.
Unix uses : to separate directory names in the value of environment variables such as PATH. Many shell scripts rely on this feature to look for programs along the PATH. For example, the GNU-standard configure scripts do that to find gcc, ranlib and other programs, as part of the auto-configuration process.
However, DOS and Windows use ; to separate directories in PATH (because absolute file names include a drive letter, like in d:/foo/bar). This breaks shell scripts which search along the PATH.
To allow these scripts to run without changes, the DJGPP port of Bash introduces a special variable PATH_SEPARATOR. If this variable is set to :, Bash converts the value of PATH to pseudo-Unix form. For example, if the original value of PATH is like this:
PATH=c:\djgpp\bin;d:\gnu\emacs\bin
then setting PATH_SEPARATOR=: converts it to this:
PATH=/dev/c/djgpp/bin:/dev/d/gnu/emacs/bin
This lets Unix shell scripts run unaltered. However, to prevent the external commands from breaking (because they don't know anything about PATH_SEPARATOR), Bash converts the value of PATH back to its usual DOS style in the environment it passes to child programs.
The DJGPP library supports the special /dev/x/ file names by converting them to the usual DOS x:/ format, before it issues DOS calls, so all DJGPP-compiled utilities can be safely run by a script when PATH_SEPARATOR is set to :.
test -x foo looks for foo.exe, foo.com, foo.bat, etc. This is important e.g. in GNU configure scripts which look for programs along the PATH.
install foo /bin/foo actually installs foo.exe in the target directory. Similarly, gcc -o foo creates both foo and foo.exe; the first keeps make happy when a Unix Makefile is in use (since the target names are usually extension-less on Unix), while the second can be run from the DOS command prompt, since the stock DOS shell refuses to run a program without one of the executable extensions (.exe, .com or .bat) it knows about. Both of these features are intended for using Unix Makefiles without changes.
/bin/sh cause the shell to be looked for along the PATH as well, so that users won't need to have a /bin directory.
lpr, write to the local printer device instead, if lpr could not be located. Emacs and dvips are two examples of programs that offer this feature.
tar and cpio programs, and the djtar utility supplied with the DJGPP development kit are examples of such programs. They replace characters which aren't allowed in file names, like + on MS-DOS or " on MS-Windows, and rename files whose names are reserved on DOS/Windows by character devices (and therefore writing to them could have unexpected results).
Another potential problem in unpacking file archives is that several different file names can map to the same name after truncation to the DOS 8+3 limits (see 8+3 file names) or as a result of the automatic renaming I just described. For this reason, djtar refuses to overwrite existing files, and requires the user to type in another name under which the file will be extracted. If the user presses <RET>, the file is skipped.
This interactive, one-by-one renaming might be tedious and error-prone when there are a lot of files to rename. A case in point is the test suite in the GNU Textutils distribution with a lot of names like n+4b2l10f-0FF, njml17f-lmlmlo, etc. For these cases, djtar has a command-line option which can be used to submit a file with a mapping between original and DOS names; djtar will automatically rename every file mentioned there and will leave all other file names intact. An example of putting this feature to use can be seen in the latest versions of Textutils (look for the file djgpp/fnchange.lst and the instructions to use it in djgpp/README).
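The two PATH-related conversions described in this list, from DOS style to the pseudo-Unix form and from /dev/x/ names back to x:/, can be sketched in C as shown below. This is a minimal sketch and not the code in the Bash port or in the DJGPP library; the function names are hypothetical, and the first function assumes that every PATH entry is an absolute file name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert a DOS-style PATH value ("c:\djgpp\bin;d:\gnu") to the
   pseudo-Unix form ("/dev/c/djgpp/bin:/dev/d/gnu").
   Assumes every entry is absolute.  Returns 0 on success, -1 if
   OUT (of size OUTSIZE) is too small.  */
int dos_path_to_unix(const char *dos, char *out, size_t outsize)
{
    size_t o = 0;
    int at_start = 1;           /* at the beginning of a PATH entry? */

    assert(dos != NULL && out != NULL);

    while (*dos) {
        /* One step emits at most 6 characters ("/dev/c"); keep room
           for them and for the terminating null.  */
        if (o + 7 > outsize)
            return -1;
        if (at_start && isalpha((unsigned char)dos[0]) && dos[1] == ':') {
            memcpy(out + o, "/dev/", 5);
            o += 5;
            out[o++] = (char)tolower((unsigned char)dos[0]);
            dos += 2;
            at_start = 0;
            continue;
        }
        if (*dos == ';') {
            out[o++] = ':';     /* entry separator */
            at_start = 1;
        } else {
            out[o++] = (*dos == '\\') ? '/' : *dos;
            at_start = 0;
        }
        dos++;
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return 0;
}

/* Convert a "/dev/x/..." file name to "x:/...".  Other names,
   including device names such as /dev/null, are copied unchanged.
   Returns 0 on success, -1 if OUT is too small.  */
int dev_name_to_dos(const char *name, char *out, size_t outsize)
{
    size_t len;

    assert(name != NULL && out != NULL);

    if (strncmp(name, "/dev/", 5) == 0
        && isalpha((unsigned char)name[5])
        && (name[6] == '/' || name[6] == '\0')) {
        len = strlen(name + 6);
        if (len + 3 > outsize)
            return -1;
        out[0] = name[5];
        out[1] = ':';
        memcpy(out + 2, name + 6, len + 1);
        return 0;
    }
    len = strlen(name);
    if (len + 1 > outsize)
        return -1;
    memcpy(out, name, len + 1);
    return 0;
}
```

Requiring a single letter followed by a slash (or the end of the string) after /dev/ keeps ordinary device names from being mistaken for drive letters.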
The features mentioned above are mostly small niceties. But can you imagine the amount of hacking needed to get Unix Makefiles and shell scripts to work on DOS and Windows machines, if these tidbits didn't exist?
Modern development environments support internationalization by providing facilities to read, write, and display text in languages other than English and character sets other than US-ASCII. For example, most GNU packages support the gettext library and proprietary facilities similar to it, which allow the messages printed by programs to be in any of the supported native languages.
DJGPP, being a DOS/Windows-based environment which uses lots of software ported from Unix, faces several unique challenges on its way to internationalization. This section briefly outlines the problems and their possible solutions.
First, some background on international aspects of the operating systems supported by DJGPP.
The international features of MS-DOS rely on so-called DOS codepages. A codepage is a particular mapping between 128 non-ASCII characters and their 8-bit codes in the range [128..255] (the lower 128 codes in every codepage are always occupied by the usual 7-bit ASCII characters). IBM defined several codepages, each one identified with a unique number, to support certain character sets, and these codepages are included with each version of DOS. Every codepage roughly corresponds to one of the ISO-8859 character sets, but the mapping of the high 128 characters is different. For example, codepage 850 corresponds to ISO-8859-1 (a.k.a. Latin-1) character set, codepage 862 corresponds to the ISO-8859-8 (Hebrew) set, etc.
In the default text-mode operation, the DOS terminal is a character terminal which can display a single set of 256 glyphs at a time. This set is determined by the current DOS codepage. The default set of glyphs which corresponds to the native locale is usually burnt into the video hardware; to install a different codepage, you need to edit the system configuration files and reboot. This loads the glyphs of the character set supported by the new codepage into memory, and also updates other devices; for example, it downloads the corresponding font into the local printer.
Windows defines additional codepages, many of them similar or identical to the ISO-8859 character sets for the same locale (e.g., codepage 1252 is nearly identical to the Latin-1 set). However, Windows doesn't allow DOS programs to use these new codepages, and it still requires a system reboot to replace the single supported DOS codepage. So DJGPP programs can still support only one codepage at a time, even when they run on Windows.
Therefore, to use i18n facilities such as the GNU gettext package, DJGPP programs need an additional layer of recoding characters, because the DOS codepage for a given locale maps characters differently from the corresponding ISO-8859 character set. One solution to this problem is to convert the existing *.po files supplied with GNU packages to corresponding DOS codepages. Such conversion can be performed automatically by the GNU recode utility, which supports many of the existing codepages.
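To show what such recoding amounts to, here is a small table-driven sketch in C that converts ISO-8859-1 text to codepage 850 in place. It is not how GNU recode is implemented; the function name is hypothetical, and the table holds only a handful of the 128 pairs a complete mapping would need, so every other non-ASCII character is replaced with a question mark.

```c
#include <assert.h>
#include <stddef.h>

/* A few ISO-8859-1 -> codepage 850 pairs.  A complete table would
   have an entry for each of the 128 non-ASCII codes.  */
static const struct {
    unsigned char latin1;
    unsigned char cp850;
} recode_pairs[] = {
    { 0xE4, 0x84 },     /* a with diaeresis */
    { 0xE7, 0x87 },     /* c with cedilla */
    { 0xE9, 0x82 },     /* e with acute accent */
    { 0xF1, 0xA4 },     /* n with tilde */
    { 0xF6, 0x94 },     /* o with diaeresis */
    { 0xFC, 0x81 },     /* u with diaeresis */
};

/* Recode LEN bytes of TEXT in place.  ASCII characters are left
   alone, since both encodings agree on the lower 128 codes.
   Returns the number of characters that had no entry in the
   (partial) table and were replaced with '?'.  */
size_t recode_latin1_to_cp850(unsigned char *text, size_t len)
{
    const size_t npairs = sizeof recode_pairs / sizeof recode_pairs[0];
    size_t i, j, unmapped = 0;

    assert(text != NULL);

    for (i = 0; i < len; i++) {
        if (text[i] < 128)
            continue;
        for (j = 0; j < npairs; j++)
            if (recode_pairs[j].latin1 == text[i])
                break;
        if (j < npairs) {
            text[i] = recode_pairs[j].cp850;
        } else {
            text[i] = '?';
            unmapped++;
        }
    }
    return unmapped;
}
```

Because the lower 128 codes are identical in both encodings, only the message text that contains non-ASCII characters is affected by the conversion.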
The DJGPP version of Emacs 20.4 employs a similar technique to display the character set supported by the current DOS codepage. However, unlike gettext, Emacs performs the conversion from the ISO charset to the codepage and back in real time, by defining a special coding system, which is driven by a table that maps the ISO charset into the DOS codepage. The same coding system is also used to read and write files produced by other DOS-based software. This solution avoids introducing new character sets into Emacs, which would be extremely undesirable, as Emacs already has too many partially-overlapping character sets.
Conversion of a single character set might be the way to make a program speak your native language, but what about programs that need to display more than a single character set at a time, like Emacs 20? Well, one solution is to simulate the glyphs that cannot be displayed with similar glyphs from other character sets. Thus, some Cyrillic characters can be simulated by glyphs of similar-looking ASCII characters. Where no single glyph can reasonably stand for a non-ASCII character, it could be simulated with a string of several characters. For example, the Latin-1 character ç (a small c with a cedilla) could be displayed as the string {c,}, where the braces serve as a visual indication that this is a single character. Emacs makes this solution based on glyph remapping possible by providing a facility known as a display table, whereby each character can be mapped either to a code of a single glyph, or to a string. If a character is mapped to a string, the Emacs redisplay code knows that this string stands for a single character, and so commands which e.g. move point and count columns still work correctly. This is how the DJGPP version of Emacs 20.4 manages to display character sets beyond the one supported by the current codepage.
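The display-table idea can be sketched in C as a 256-entry array of optional replacement strings. This is only an illustration of the concept, not Emacs code (Emacs display tables are Lisp objects with more capabilities); the function name and the array layout are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Render LEN characters of TEXT into OUT through a display table.
   TABLE has 256 entries; a null entry means "display the character
   as itself", any other entry is the string shown in its place.
   Returns the number of screen cells written, or -1 if OUT (of
   size OUTSIZE) is too small.  The text itself is not modified,
   so it still holds LEN characters however many cells they
   occupy on the screen.  */
long render_with_table(const unsigned char *text, size_t len,
                       const char *const *table,
                       char *out, size_t outsize)
{
    size_t i, o = 0;

    assert(text != NULL && table != NULL && out != NULL);

    for (i = 0; i < len; i++) {
        const char *repl = table[text[i]];
        size_t n = repl ? strlen(repl) : 1;

        if (o + n + 1 > outsize)        /* keep room for the null */
            return -1;
        if (repl)
            memcpy(out + o, repl, n);
        else
            out[o] = (char)text[i];
        o += n;
    }
    if (o >= outsize)
        return -1;
    out[o] = '\0';
    return (long)o;
}
```

With the entry for code 0xE7 set to {c,}, a six-character word containing ç is drawn in nine screen cells, while cursor motion can still treat it as six characters because the mapping lives in the table and not in the text.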
Solutions are also required for printing multi-lingual text from Emacs. Currently, the only solution available is via the ps-print package, which requires a printer with PostScript support or a PostScript interpreter such as Ghostscript. Other printing commands, like lpr-buffer, currently support only one character set: the one which corresponds to the installed DOS codepage.
In sum, as far as i18n is concerned, DJGPP is certainly more limited than modern GUI environments such as X Windows, but current solutions are quite adequate for most needs of a typical user.
The DJGPP project has existed for 10 years. This might not seem like a long time, but it is. Consider this: in 1989, when DJ Delorie began porting GCC, MS-DOS v4.00 had just been released and was the hottest issue in the trade press, MS-Windows was not yet heard of outside Microsoft, Linux was still several years away, the latest version of GCC was 1.35, and Emacs was in version 18.5x. We might also reflect on what each one of us was doing around that year, to get a feeling for how much water has gone under the bridge since then....
So what has DJGPP achieved during this time? This section offers a retrospective summary, and then attempts to outline future developments.
I think the most important achievement is that DJGPP brought the free software to the large community of DOS/Windows users. We may not like the reasons why these systems are so widespread, and we might resent the quality of the software which they run, but the fact remains that there is a huge installed base of such systems. DJGPP brings many users of these systems in touch with free software. It teaches them the value of free access to the sources and free exchange of knowledge and ideas about software internals. It also shows them how this freedom helps to make their software much better than proprietary tools, haunted by software patents, undocumented behavior, and non-disclosure agreements, ever could. Thanks to DJGPP, many young programmers have learned these lessons at the very beginning of their careers, and these are lessons they will not forget easily.
On a more practical note, consider the large body of free software successfully ported to DOS/Windows as part of DJGPP over the years. Besides GCC and Binutils, more than 50 GNU and free software packages were ported, including Emacs, Bash, GDB, Make, Gawk, Perl, TeX, Ghostscript, RCS, CVS, Tar, and many others. The document you are reading now was written in Texinfo using Emacs 20.3, spell-checked with Ispell, converted into Info and HTML with makeinfo, typeset with TeX, previewed as a PDF file produced with dvipdfm, and printed with dvips, all of them DJGPP ports. The GNU Software for MS-Windows and MS-DOS CD-ROM, first released by the FSF in the last quarter of 1998, holds 400MB of GNU software ported to DJGPP; people who bought that CD sometimes write to me that using the software makes them forget what OS they booted in the morning. All of these ports are in active maintenance, and new versions are ported as the GNU maintainers release them. Many GNU packages already include DJGPP support as part of the official distribution, and work is under way to add such support to other packages.
This abundance of free, high-quality, actively-maintained software which runs on platforms found in each household and in every office really makes a difference. It certainly makes the GNU project and its goals known and popular among users who might never have heard about GNU were it not for DJGPP. To me, it is no surprise that the GNU DOS/Windows CD-ROM instantly became such a big hit and sold more disks than all other GNU CD-ROMs together (200 copies sold during the first 2 months, which brought FSF about $9600). Thus, DJGPP not only makes GNU popular, it also helps to raise funds for the GNU project. Ironically, a project which began because the FSF thought it was impossible ended up supporting the FSF. History has come full circle.
I know I promised to try to predict the future of DJGPP. But now that we have come all this way and reached the end of this document, I must confess: I lied. I don't want to set foot on the slippery path of predicting the future, first because I'm not good at it, but mostly because DJGPP defies all predictions. DJGPP produces DOS executables, so it doesn't support native Windows programming (although DJGPP programs still make very good console applications when they run on Windows). Microsoft declared DOS dead and actively tries to retire all DOS-based software by deliberately preventing DOS programs running on Windows from accessing some lucrative and useful Windows services. In theory, this should have killed DJGPP. Nevertheless, many people not only use DJGPP, they even choose to run it not on Windows, but in plain DOS. All the hype about Windows being "the way of the future" notwithstanding, users prefer the stability and reliability of the DOS-based DJGPP environment to a fancy GUI.
One thing I can be positive about: we will certainly see DJGPP ports of more free software. Several packages, like egcs, inetutils, recode, and UCB Logo are being ported as we speak.
As for the core of DJGPP, its development depends on too many factors unbeknownst to me. One obvious direction is to add support for creating native Windows programs. But this is a large project which requires several dedicated volunteers to work on it for several months. It is not clear whether such a team could be assembled, given that many potential candidates either switch to Linux or use one of the existing free Windows development environments, like Mingw32 and Cygwin.
So the truth is, I don't know what the future of DJGPP will look like. Instead, let me tell what I hope the free software movement will learn from the DJGPP experience. I hope we could learn that free software projects should not ignore popular platforms just because we don't like their operating system. By supporting enthusiasts that are ready to bring free software to those platforms, we could do much better: we could expose a much larger audience to our projects, and we can raise money for continuing our projects by selling software ported to those platforms and support services for them.
PATH separator, Unix-style: Features
chroot support: Features
ioctl, emulation: Features
make, support for Unix features: Features
system function, extended functionality: Features
This is not what DJGPP originally stood for, see The History of DJGPP.
See the DJGPP history page on DJ's Web server, for more details.
There is no official interpretation of the acronym DJGPP. A contest for the best name was held more than a year ago; the results can be found by searching the DJGPP mail archives.
This is exactly what Linux, Hurd, and latest versions of MS-Windows do. Interestingly enough, the original reason for DJ Delorie's interest in porting GCC was that he wanted to use it to write a 32-bit OS for PCs.
The name of the default DPMI server program is recorded in the stub and can be changed by editing the stub with a special program called stubedit.
Theoretically, memory below 1MB could be used by DJGPP programs. However, since this memory is usually at a premium, all DPMI servers leave it alone; CWSDPMI uses it only if there's not enough memory above 1MB.
My personal involvement with the DJGPP library development began when I wrote the first version of stat and fstat which returned meaningful inode numbers and also corrected some other frequent blunders in DOS versions of these functions.
Windows NT does not include this API, therefore DJGPP programs cannot access long file names on NT systems. However, a beta version of a free LFN driver for NT is available.
Copyright © 1999 | Updated Jul 1999 |