The DJGPP Project

Note from DJ: This is a copy of a presentation Eli Zaretskii gave during a trip to Japan.

This file describes the DJGPP project, its goals, current status, and future perspectives.



Introduction

DJGPP, an acronym for DJ's GNU Programming Platform[1], is a project which brings the GNU development tools to MS-DOS and MS-Windows systems. Its originator and principal maintainer is DJ Delorie; that's where the "DJ" in DJGPP comes from.

DJGPP is about Free Software. The ported GNU packages are, of course, free; however, the library and utilities developed specifically for DJGPP are also distributed under the GNU license. Since DJGPP supports platforms which have such a huge installed base, and since it is highly popular among MS-DOS/MS-Windows users, the project is a very important member of the Free Software movement. Significantly, a large proportion of DJGPP users are young programmers at the very beginning of their careers. Teaching those young people about the importance of free software and free sharing of ideas is in itself a worthy goal. DJGPP is in a unique position to perform this important community service because it usually is the first serious compiler used by young programmers.

But DJGPP is also about fun. It is fun to port industry-strength applications to MS-DOS and have them running seamlessly on top of a 16-bit "toy operating system". It is fun to see how these applications change the way your system looks and feels, to a point that you can almost think it is a Unix box. It is fun to have all the source code, down to the darkest corners of the library internals, free for reading and hacking. It is fun to be able to find and fix bugs no matter whether they occurred in the application code, in the library, or in the compiler. And it is fun to discuss all these matters with other users and developers all over the world, and to join forces to make the free software better and more powerful. More about this later.

This article presents an overview of the DJGPP project. Section 1 briefly tells the history of the project development. Section 2 explains how protected-mode DJGPP programs manage to run on top of MS-DOS even though MS-DOS and protected mode are incompatible. Section 3 describes several important features that DJGPP brings to MS-DOS and MS-Windows. Internationalization (a.k.a. I18N) aspects specific to DJGPP are discussed in Section 4. Finally, Section 5 summarizes the achievements of 10 years of DJGPP development, and attempts to predict its future.



The History of DJGPP

"In the beginning was the Word...", says the Bible.

Like every other human endeavor, DJGPP also started with a Word. And as happens with almost everything else in the free software world, that word belonged to Richard Stallman. Here's how DJ Delorie himself describes the genesis of DJGPP[2]:

DJGPP was born around 1989 [...], when Richard Stallman spoke at a meeting of the Northern New England Unix Users Group (NNEUUG) at Data General, where I then worked. I asked if the FSF ever planned on porting gcc to MS-DOS [...], and he said it couldn't be done because gcc was too big and MS-DOS was a 16-bit operating system. Challenge in hand, I began.

Consequently, we should consider Richard Stallman a progenitor of DJGPP, or at least its godfather. Had it not been for his scepticism, it's possible that DJGPP would not have existed....

The first version of GCC ported by DJ was 1.35. It was compiled on a 386 machine running ISC Unix and linked with a hacked libc.a taken from that machine, in which system calls such as open, read, and stat had DOS-compatible replacements. The result was converted to a DOS executable format with a custom program written by DJ. Thus the first version of DJGPP, originally called djgcc, was born. It required Phar Lap's DOS Extender to run protected-mode code on top of real-mode DOS. See DJGPP Programs and MS-DOS for more about the DOS extender's role.

To compile itself, gcc needs lots of memory, which PCs didn't have at that time. Since the DOS extender used to run gcc didn't support virtual memory, DJ wrote his own DOS extender called go32. GCC version 1.37 was the first version built on a DOS platform using go32.

Next came the library. The first version was based on the BSD library, whose sources were freed at that time, and was augmented with many custom DOS-specific functions that interfaced with the OS. The header files were based on those distributed with gcc.

The name was changed from DJGCC to DJGPP when C++ support was added. Initially, the name stood for DJ's G++, with the + characters replaced by ps because DOS doesn't allow + in file names. However, since the C++ compiler is now an integral part of the gcc distribution, DJGPP probably stands for something like DJ's GNU Programming Platform[3].

DJGPP version 1.05 was the first one available commercially, and it was a big success. Version 1.11 supported all DOS configurations, had somewhat limited support for running on MS-Windows (e.g., graphics and floating-point emulation didn't work), and appeared on the GNU Compiler Binaries CD-ROM.

This was in 1992, and around that time I myself began using DJGPP. I had just bought a brand-new 486-DX/33 and got an email account, and DJGPP v1.11m5 was the first version I downloaded and installed. Exposed to Unix-style tools by the excellent book Software Tools by B.W. Kernighan and P.J. Plauger, I had for years been porting GNU programs to MS-DOS using 16-bit proprietary compilers. I had become tired of dealing with missing headers, like <unistd.h>, missing functions, like popen and alloca, and missing functionality, like long command lines in a Makefile. DJGPP solved all these and many other problems, and I was instantly hooked.

By the end of 1994 DJGPP had become so popular, and the traffic on its mailing list so heavy, that a FAQ list was sorely needed; the first version of the FAQ was released in February 1995. Today, in its 7th edition, the DJGPP FAQ list includes answers to 200 questions, its Texinfo source totals 540K bytes, and its printed version is more than 200 pages long.

DJGPP v1.x could not bootstrap itself: it required Borland's compiler to build the go32 extender. Cygnus, a big user of DJGPP for their DOS-based products, requested a self-bootstrapping version, so DJGPP v2 was born. Version 2 moves some parts of go32 into the C library, other parts into a stub loader produced by a special-purpose assembler capable of producing 16-bit code, and it relies on DPMI services to run on top of DOS; more about this in the next section.

Meanwhile, in response to the growing interest and user base, a news group dedicated to DJGPP, <comp.os.msdos.djgpp>, was created in June 1995. Nowadays, the traffic on the news group averages about 70 messages per day.

Version 2.0 of DJGPP was shipped in February 1996, after more than two years of development and testing. The v2 library is Posix-compliant: it is the only library that offers Posix compliance on MS-DOS, and one of only two available for MS-Windows. Version 2.0 also introduced transparent and automatic support of long file names on Windows 9X.

Version 2.01 was released in October 1996. The GNU Software for MS-Windows and MS-DOS CD-ROM, based on DJGPP v2.01 ports of many GNU packages, was released in the last quarter of 1998, and its first edition out-sold all other GNU CD-ROMs.

The latest version of DJGPP, 2.02, was released in December 1998.



DJGPP Programs and MS-DOS

GCC generates 32-bit code, so DJGPP programs are 32-bit programs. GCC also knows nothing about the segmented architecture of the x86 processors, so its code effectively requires the data, stack, and code segments to stay constant during program execution. However, real-mode segments of x86 CPUs are only 64KB long. Therefore, to be able to compile large programs, like GCC itself, DJGPP must run in protected mode. This section describes the tricks pulled by DJGPP to make this possible.



DOS Cannot Run in Protected Mode

Switching the CPU into protected mode is easy, but you cannot call DOS and BIOS services while the CPU is in protected mode. Why? Because DOS and BIOS code was written for execution in real mode, and so it constantly violates the rules of protected-mode programming. For example, DOS code loads many different values into segment registers, to overcome the 64KB limitation of a real-mode segment. But in protected mode, a segment register can only be loaded with a value that corresponds to one of the existing selectors; any other value causes a General Protection Fault (GPF for short).

So, if a program switches the CPU into protected mode and then calls DOS, e.g. to print a message, it will immediately crash the system. You can't write even the simplest Hello World program without hitting this brick wall!

It gets worse. DOS and BIOS code needs to be run even if the application program doesn't call any of their services. For example, 18 times a second there's a timer tick, a hardware interrupt issued by the timer chip that's supposed to advance the system clock. But the handler for the timer tick interrupt is part of BIOS, and it employs real-mode code.

So even if a program does nothing to call any real-mode code, some asynchronous system events will do that anyway, and the machine will still crash very promptly. Can the conflict between DOS/BIOS and the protected mode be solved? Yes; read on.



DOS Extender Allows DOS and Protected Mode to Co-exist

The solution to this conflict, if you don't want to write a protected-mode operating system which replaces DOS and BIOS completely[4], is to add a layer of software between your program and DOS/BIOS code that would switch the CPU from protected to real mode and back, as appropriate. This software layer is called a DOS extender.

With a DOS extender, when a protected-mode program calls a real-mode service, the extender traps the call, switches the CPU to real mode, reissues the call, waits for the service to do its thing, then switches the CPU back into protected mode, and returns to the application code that called the real-mode service. Hardware interrupts, such as the timer tick and the keyboard interrupt, are also trapped by the extender, and also cause a switch to real mode and back.
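The sequence just described can be sketched as a toy model in portable C. This is only an illustration under stated assumptions: the names `extender_call`, `switch_mode`, and `sample_service` are invented here, no real CPU mode is changed, and a real extender such as go32 does far more work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the CPU mode; not a DJGPP or DPMI type. */
enum cpu_mode { REAL_MODE, PROTECTED_MODE };

struct extender {
    enum cpu_mode mode;      /* mode the simulated CPU is in right now */
    unsigned long switches;  /* number of mode switches performed so far */
};

/* Stand-in for any real-mode DOS/BIOS service. */
typedef int (*real_mode_service)(int arg);

static void switch_mode(struct extender *x, enum cpu_mode target)
{
    if (x->mode != target) {
        x->mode = target;
        x->switches++;
    }
}

/* Trap a call made from protected mode: drop to real mode, reissue the
   call, then return to protected mode with the service's result. */
int extender_call(struct extender *x, real_mode_service service, int arg)
{
    int result;

    assert(x != NULL && service != NULL);
    assert(x->mode == PROTECTED_MODE);  /* the application runs here */

    switch_mode(x, REAL_MODE);
    result = service(arg);              /* real-mode code does its thing */
    switch_mode(x, PROTECTED_MODE);
    return result;
}

/* Hypothetical service used only to exercise the model. */
int sample_service(int arg)
{
    return arg + 1;
}
```

In this model every trapped call costs exactly two mode switches, one in each direction, which is the overhead the next paragraph discusses.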

You might think that these mode switches would considerably slow down the application. However, in practice, most programs don't call the OS services too often, and even when they do, the peripheral devices accessed by most of these services, such as the hard disk, are so much slower than modern CPUs, that the overhead of the mode switch is hardly ever noticed.



DJGPP v1.x Setup with the go32 Extender

In DJGPP v1.x, go32 was such a DOS extender. It was loaded automatically by every program during its startup. In addition to the usual functions performed by DOS extenders, it also handled some unique DJGPP-related tasks:

Using an extender had an important advantage of being able to run on any DOS configuration, since go32 had special code to adapt itself to all known methods of switching into protected mode and managing extended memory. But it did have a significant drawback as well: the extender was loaded into conventional memory and each instance used about 130KB of that memory. Since most DOS systems had about 500 to 600 KBytes of free conventional memory, this meant you couldn't have more than 3-4 nested levels of DJGPP programs. This was a grave limitation: for example, you couldn't build programs whose Makefiles required more than 2 recursive levels of make invocation (because GCC and the compiler passes it invokes require 2 additional levels of program nesting). DJGPP v2 solves this problem, as described below.
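The arithmetic behind the 3-4 level limit can be written out as follows. This is a rough sketch that assumes the extender's roughly 130KB is the only conventional-memory cost of each nested program; the function names are made up for illustration.

```c
#include <assert.h>

/* Number of nested programs that fit when each one loads its own copy of
   the extender into conventional memory.  Integer division rounds down,
   because a partially loaded extender is of no use. */
unsigned nesting_levels(unsigned free_conventional_kb, unsigned extender_kb)
{
    assert(extender_kb > 0);
    return free_conventional_kb / extender_kb;
}

/* Levels left for recursive make once GCC and the compiler pass it
   invokes have taken their 2 levels. */
unsigned max_make_levels(unsigned levels)
{
    return levels > 2 ? levels - 2 : 0;
}
```

With 500KB free this gives 3 levels and with 600KB it gives 4, leaving at most 2 levels for recursive make in the better case.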



DJGPP v2.x Setup with the DPMI services

DJGPP v2.x gets rid of the extender, and instead requires DPMI services to run. DPMI, an acronym for DOS Protected-Mode Interface, is a special API that allows protected-mode programs to run on top of DOS. It defines several functions that a protected-mode program (called a DPMI client) can use to perform such tasks as entering protected mode, allocating memory and segment descriptors, calling real-mode services, hooking interrupts, etc. Many modern operating systems for Intel CPUs include the DPMI services; all versions of MS-Windows, OS/2, and the Linux DOS emulator are notable examples. There are also several proprietary DPMI servers for DOS, usually bundled with DOS memory managers such as QEMM and 386MAX; and FreeDOS includes a DPMI server as part of the default setup. For those systems which don't have a DPMI server, DJGPP v2.x comes with a free server called CWSDPMI; not surprisingly, CWSDPMI reuses a lot of code from go32. The DJGPP startup code checks for DPMI services, and if they aren't available, automatically looks for and loads cwsdpmi.exe, the CWSDPMI server.

The DPMI server (a.k.a. the DPMI host) solves most of the problems of running a protected-mode program on top of real-mode DOS. The rest of the functionality, which in v1.x was the responsibility of go32, is handled in v2.x by the DJGPP startup code and low-level library functions. Let me now briefly describe these two aspects of DJGPP operation.



DJGPP v2.x Startup Code

The DJGPP v2.x startup code includes two parts: the stub loader and the library startup module. The former is a single assembly-language module which is compiled by a special-purpose assembler, called djasm, that is capable of producing 16-bit DOS executables. This stub loader is prepended to every DJGPP program during linking, and is the only part that DOS understands; all the rest--the COFF executable--is just some weird data, as far as DOS is concerned.
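The layout of a stubbed program, a small DOS executable followed by the COFF image, can be illustrated with a short routine that locates the COFF part. This is a sketch, not the code DJGPP itself uses: it assumes the stub's MZ header records only the stub's own length (in 512-byte pages, plus the byte count of the last page), so that the COFF image starts right after it, and it checks for the i386 COFF magic number 0x014C. The function name `find_coff_offset` is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define COFF_I386_MAGIC 0x014Cu  /* magic number of an i386 COFF header */

/* Read a 16-bit little-endian value. */
static unsigned read_le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Return the offset of the COFF image inside a stubbed executable held
   in memory, or -1 if the buffer does not look like stub + COFF. */
long find_coff_offset(const unsigned char *image, size_t size)
{
    unsigned last_page_bytes, pages;
    long offset;

    assert(image != NULL);
    if (size < 6 || image[0] != 'M' || image[1] != 'Z')
        return -1;                        /* not a DOS executable */

    last_page_bytes = read_le16(image + 2);
    pages = read_le16(image + 4);
    if (pages == 0)
        return -1;

    /* Length of the part DOS loads: whole pages, minus the unused tail
       of the last page when that page is only partly filled. */
    offset = (long)pages * 512L;
    if (last_page_bytes != 0)
        offset -= 512L - (long)last_page_bytes;

    if (offset < 6 || (size_t)offset + 2 > size)
        return -1;
    if (read_le16(image + offset) != COFF_I386_MAGIC)
        return -1;                        /* nothing COFF-like follows */
    return offset;
}
```

DOS looks only at the first `offset` bytes; everything from `offset` onward is the part that the stub, not DOS, knows how to load.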

The second part of the startup is in the library. It consists of several modules written partly in C and partly in assembly. This is where the COFF image entry point is, and this is where the stub passes control after it loads the program and sets it up.

Here's a short description of what the stub does:

Here's what the library startup code does:



Library Interface with DOS and BIOS

Since DJGPP programs use DOS and BIOS for system calls, many library functions need to actually issue various real-mode DOS/BIOS calls. I already described above how this is done in principle: by calling a special DPMI service provided for that.

However, many real-mode services require some data to be passed. For example, when you write the contents of a buffer to a file, the corresponding DOS function requires a pointer to the buffer to be put into the DS:DX pair of registers. Moreover, the buffer whose pointer is passed to DOS must reside in the first Megabyte of the address space, because real-mode addresses use only 20 bits. In contrast, protected-mode programs use the full 32 bits for addressing, and all the data is always above the 1MB mark[6]. Now, how do we pass such addresses to DOS?
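To make the 20-bit limit concrete, here is a small sketch of real-mode address arithmetic: the CPU forms a linear address by multiplying the 16-bit segment by 16 and adding the 16-bit offset. The type and function names are invented for this example and are not part of the DJGPP library.

```c
#include <assert.h>

/* A real-mode segment:offset pair, such as the DS:DX pair passed to DOS. */
struct real_mode_addr {
    unsigned short segment;
    unsigned short offset;
};

/* Linear address = segment * 16 + offset. */
unsigned long real_to_linear(struct real_mode_addr a)
{
    return ((unsigned long)a.segment << 4) + a.offset;
}

/* Split a linear address below 1MB into a segment:offset pair, using the
   normalized form in which the offset is always in the range 0..15. */
struct real_mode_addr linear_to_real(unsigned long linear)
{
    struct real_mode_addr a;

    assert(linear < 0x100000UL);  /* must lie in the first megabyte */
    a.segment = (unsigned short)(linear >> 4);
    a.offset = (unsigned short)(linear & 0x0F);
    return a;
}
```

An address above 1MB has no such representation, which is why a buffer living in protected-mode memory cannot be handed to DOS directly.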

This is where the so-called transfer buffer comes to our help. As we saw, this buffer is allocated in conventional memory during the program startup. The buffer is 16KB long by default, but its size can be changed to any value between 2KB and 64KB using the stubedit program. Every library function that needs to pass data to, or retrieve data from, DOS/BIOS, needs to move that data between the transfer buffer and the protected-mode memory. For example, to write a buffer to a file, the contents of that buffer are copied to the transfer buffer, and the real-mode segment:offset-style address of the transfer buffer is passed to DOS; to read data from a file, the address of the transfer buffer is passed to DOS, and the data put there by DOS is then copied from the transfer buffer to the buffer in protected-mode memory whose address was passed by the calling application.
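The copying scheme in the paragraph above can be sketched as a portable simulation. Everything here is an assumption made for illustration: conventional memory is modelled as a plain array, the transfer buffer is placed at an arbitrary address, DOS is replaced by two stand-in functions that see only segment:offset addresses, and none of the names are DJGPP library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONVENTIONAL_SIZE 0x100000UL  /* the first megabyte */
#define TB_SIZE 16384u                /* default transfer buffer size */

/* Simulated memory below 1MB and a hypothetical transfer buffer address. */
static unsigned char conventional[CONVENTIONAL_SIZE];
static const unsigned long tb_linear = 0x20000UL;

/* Simulated file, standing in for whatever DOS writes to. */
struct sim_file {
    unsigned char data[256];
    size_t length;
};

/* Stand-in for the DOS write call: it can only read from seg:off. */
static size_t sim_dos_write(struct sim_file *f, unsigned short seg,
                            unsigned short off, size_t count)
{
    unsigned long linear = ((unsigned long)seg << 4) + off;

    assert(linear + count <= CONVENTIONAL_SIZE);
    assert(f->length + count <= sizeof f->data);
    memcpy(f->data + f->length, conventional + linear, count);
    f->length += count;
    return count;
}

/* Stand-in for the DOS read call: it can only store to seg:off. */
static size_t sim_dos_read(const struct sim_file *f, size_t pos,
                           unsigned short seg, unsigned short off,
                           size_t count)
{
    unsigned long linear = ((unsigned long)seg << 4) + off;

    if (pos >= f->length)
        return 0;
    if (count > f->length - pos)
        count = f->length - pos;
    assert(linear + count <= CONVENTIONAL_SIZE);
    memcpy(conventional + linear, f->data + pos, count);
    return count;
}

/* Library side of a write: copy the caller's data down into the transfer
   buffer, then give DOS the buffer's real-mode address. */
size_t sim_write(struct sim_file *f, const void *buf, size_t count)
{
    assert(count <= TB_SIZE);
    memcpy(conventional + tb_linear, buf, count);
    return sim_dos_write(f, (unsigned short)(tb_linear >> 4),
                         (unsigned short)(tb_linear & 0x0F), count);
}

/* Library side of a read: let DOS fill the transfer buffer, then copy the
   data up into the caller's buffer. */
size_t sim_read(const struct sim_file *f, size_t pos, void *buf, size_t count)
{
    assert(count <= TB_SIZE);
    count = sim_dos_read(f, pos, (unsigned short)(tb_linear >> 4),
                         (unsigned short)(tb_linear & 0x0F), count);
    memcpy(buf, conventional + tb_linear, count);
    return count;
}
```

The caller of `sim_write` and `sim_read` never sees the transfer buffer, which mirrors how the real library hides it behind functions such as write and read.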

The startup code stores the real-mode address of the transfer buffer and its size in global variables, which are used by the library functions to move data to and from the transfer buffer. The library also provides special functions to move the data between protected-mode memory and the transfer buffer as fast as possible, and thus to make this overhead smaller.

As long as the application calls relatively high-level library functions, such as open, read, write, stat etc., all of the special processing just described is done automatically and transparently by the library; the application doesn't need to know anything about the transfer buffer and data copying that goes on under the hood.

Library functions also provide other specialized processing in some cases. For example, DOS cannot read or write more than 64K bytes in one call, so the library breaks large requests into smaller chunks, each one the size of the transfer buffer, and feeds them to DOS one by one. As another example, consider memory-allocation functions such as malloc. Instead of allocating blocks off the conventional memory by calling DOS, like real-mode programs do, DJGPP issues DPMI calls to allocate extended memory and provide demand-paged virtual memory, so that all of the available memory and swap space can be used by the application via standard function calls.
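The chunking of large requests can be sketched as a simple loop. This is a minimal illustration, not the library's actual code: the single-chunk write is abstracted as a callback, and the names `write_in_chunks`, `chunk_writer`, and `counting_writer` are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that writes one chunk (for example, through the transfer
   buffer) and returns the number of bytes actually written. */
typedef size_t (*chunk_writer)(const unsigned char *chunk, size_t count,
                               void *ctx);

/* Write COUNT bytes in pieces no larger than TB_SIZE, stopping early if
   a piece is only partly written.  Returns the total bytes written. */
size_t write_in_chunks(const void *buf, size_t count, size_t tb_size,
                       chunk_writer write_chunk, void *ctx)
{
    const unsigned char *p = (const unsigned char *)buf;
    size_t done = 0;

    assert(tb_size > 0 && write_chunk != NULL);
    while (done < count) {
        size_t n = count - done;
        size_t written;

        if (n > tb_size)
            n = tb_size;
        written = write_chunk(p + done, n, ctx);
        done += written;
        if (written < n)
            break;                /* short write, e.g. the disk is full */
    }
    return done;
}

/* Sample callback that only records how it was called. */
struct chunk_stats {
    size_t calls;    /* number of chunks seen */
    size_t bytes;    /* total bytes seen */
    size_t largest;  /* size of the largest chunk */
};

size_t counting_writer(const unsigned char *chunk, size_t count, void *ctx)
{
    struct chunk_stats *s = (struct chunk_stats *)ctx;

    (void)chunk;
    s->calls++;
    s->bytes += count;
    if (count > s->largest)
        s->largest = count;
    return count;
}
```

With the default 16KB transfer buffer, a 40000-byte request is fed to the callback in three pieces of 16384, 16384, and 7232 bytes.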



Features provided by DJGPP

This section describes some advanced features provided by DJGPP. Most of these features are built into the C library, but some are provided by the basic development utilities which are part of the DJGPP development environment. Since DJGPP is a Posix-compliant environment, many of these features are motivated by Unix compatibility.



DJGPP and Internationalization

Modern development environments support internationalization by providing facilities to read, write, and display text in languages other than English and character sets other than US-ASCII. For example, most GNU packages support the gettext library and proprietary facilities similar to it, which allow the messages printed by programs to be in any of the supported native languages.

DJGPP, being a DOS/Windows-based environment which uses lots of software ported from Unix, faces several unique challenges on its way to internationalization. This section briefly outlines the problems and their possible solutions.

First, some background on international aspects of the operating systems supported by DJGPP.

The international features of MS-DOS rely on so-called DOS codepages. A codepage is a particular mapping between 128 non-ASCII characters and their 8-bit codes in the range [128..255] (the lower 128 codes in every codepage are always occupied by the usual 7-bit ASCII characters). IBM defined several codepages, each one identified with a unique number, to support certain character sets, and these codepages are included with each version of DOS. Every codepage roughly corresponds to one of the ISO-8859 character sets, but the mapping of the high 128 characters is different. For example, codepage 850 corresponds to ISO-8859-1 (a.k.a. Latin-1) character set, codepage 862 corresponds to the ISO-8859-8 (Hebrew) set, etc.

In the default text-mode operation, the DOS terminal is a character terminal which can display a single set of 256 glyphs at a time. This set is determined by the current DOS codepage. The default set of glyphs which corresponds to the native locale is usually burnt into the video hardware; to install a different codepage, you need to edit the system configuration files and reboot. This loads the glyphs of the character set supported by the new codepage into memory, and also updates other devices; for example, it downloads the corresponding font into the local printer.

Windows defines additional codepages, many of them similar or identical to the ISO-8859 character sets for the same locale (e.g., codepage 1252 is nearly identical to the Latin-1 set, differing only in the codes 128..159). However, Windows doesn't allow DOS programs to use these new codepages, and it still requires a system reboot to replace the single supported DOS codepage. So DJGPP programs can still support only one codepage at a time, even when they run on Windows.

Therefore, to use i18n facilities such as the GNU gettext package, DJGPP programs need an additional layer of recoding characters, because the DOS codepage for a given locale maps characters differently from the corresponding ISO-8859 character set. One solution to this problem is to convert the existing *.po files supplied with GNU packages to corresponding DOS codepages. Such conversion can be performed automatically by the GNU recode utility, which supports many of the existing codepages.
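The recoding step can be illustrated with a table-driven conversion from ISO-8859-1 to codepage 850. This is a sketch only: the table below covers just six letters chosen for the example, whereas a real converter such as GNU recode handles the complete character sets, and the function names are invented here.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per character: its code in ISO-8859-1 and in codepage 850. */
struct recode_pair {
    unsigned char latin1;
    unsigned char cp850;
};

/* Deliberately partial table, for illustration only. */
static const struct recode_pair latin1_to_cp850_table[] = {
    { 0xE4, 0x84 },  /* a with diaeresis */
    { 0xE7, 0x87 },  /* c with cedilla */
    { 0xE9, 0x82 },  /* e with acute accent */
    { 0xF1, 0xA4 },  /* n with tilde */
    { 0xF6, 0x94 },  /* o with diaeresis */
    { 0xFC, 0x81 }   /* u with diaeresis */
};

/* Convert one character.  ASCII passes through unchanged; characters
   missing from the partial table become '?'. */
unsigned char latin1_to_cp850(unsigned char c)
{
    size_t i;
    size_t n = sizeof latin1_to_cp850_table / sizeof latin1_to_cp850_table[0];

    if (c < 0x80)
        return c;
    for (i = 0; i < n; i++)
        if (latin1_to_cp850_table[i].latin1 == c)
            return latin1_to_cp850_table[i].cp850;
    return '?';
}

/* Recode a whole message in place, as one would for a message catalog. */
void recode_latin1_to_cp850(unsigned char *text, size_t length)
{
    size_t i;

    assert(text != NULL || length == 0);
    for (i = 0; i < length; i++)
        text[i] = latin1_to_cp850(text[i]);
}
```

The same letter thus has a different 8-bit code in each set, which is why unconverted Latin-1 messages show the wrong glyphs on a DOS screen.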

The DJGPP version of Emacs 20.4 employs a similar technique to display the character set supported by the current DOS codepage. However, unlike gettext, Emacs performs the conversion from the ISO charset to the codepage and back in real time, by defining a special coding system, which is driven by a table that maps the ISO charset into the DOS codepage. The same coding system is also used to read and write files produced by other DOS-based software. This solution avoids introducing new character sets into Emacs, which would be extremely undesirable, as Emacs already has too many partially-overlapping character sets.

Conversion of a single character set might be the way to make a program speak your native language, but what about programs that need to display more than a single character set at a time, like Emacs 20? Well, one solution is to simulate the glyphs that cannot be displayed with similar glyphs from other character sets. Thus, some Cyrillic characters can be simulated by glyphs of similar-looking ASCII characters. Where no single glyph can reasonably stand for a non-ASCII character, it could be simulated with a string of several characters. For example, the Latin-1 character ç (a small c with a cedilla) could be displayed as the string {c,}, where the braces serve as a visual indication that this is a single character. Emacs makes this solution based on glyph remapping possible by providing a facility known as a display table, whereby each character can be mapped either to a code of a single glyph, or to a string. If a character is mapped to a string, Emacs redisplay code knows that this string stands for a single character, and so commands which e.g. move point and count columns still work correctly. This is how the DJGPP version of Emacs 20.4 manages to display character sets beyond the one supported by the current codepage.
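The idea of a display table can be sketched in a few lines of C. This is only a toy model and not how Emacs implements it: the table holds the single {c,} mapping mentioned above, and the names `display_entry` and `render_line` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps one character code to the string shown in its place. */
struct display_entry {
    unsigned char ch;
    const char *shown_as;
};

/* Toy display table with the one mapping used in the text. */
static const struct display_entry display_table[] = {
    { 0xE7, "{c,}" }  /* Latin-1 small c with cedilla */
};

/* Return the replacement string for C, or NULL if C is shown as itself. */
static const char *display_lookup(unsigned char c)
{
    size_t i;
    size_t n = sizeof display_table / sizeof display_table[0];

    for (i = 0; i < n; i++)
        if (display_table[i].ch == c)
            return display_table[i].shown_as;
    return NULL;
}

/* Expand TEXT into the NUL-terminated string that would appear on the
   screen.  Returns the number of screen columns used, or (size_t)-1 if
   OUT is too small. */
size_t render_line(const unsigned char *text, size_t length,
                   char *out, size_t out_size)
{
    size_t i, used = 0;

    assert(text != NULL && out != NULL && out_size > 0);
    for (i = 0; i < length; i++) {
        const char *s = display_lookup(text[i]);
        size_t n = s ? strlen(s) : 1;

        if (used + n >= out_size)
            return (size_t)-1;
        if (s)
            memcpy(out + used, s, n);
        else
            out[used] = (char)text[i];
        used += n;
    }
    out[used] = '\0';
    return used;
}
```

A six-character text containing ç takes nine columns on the screen, yet the text itself still holds six characters, so anything that works in terms of characters rather than glyphs is unaffected.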

Solutions are also required for printing multi-lingual text from Emacs. Currently, the only solution available is via the ps-print package, which requires a printer with PostScript support or a PostScript interpreter such as Ghostscript. Other printing commands, like lpr-buffer, currently support only one character set: the one which corresponds to the installed DOS codepage.

In sum, as far as i18n is concerned, DJGPP is certainly more limited than modern GUI environments such as X Windows, but current solutions are quite adequate for most needs of a typical user.



Summary and Perspective

The DJGPP project has existed for 10 years. This might not seem like a long time, but it is. Consider this: in 1989, when DJ Delorie began porting GCC, MS-DOS v4.00 had just been released and was the hottest issue in the trade press, MS-Windows was not yet heard of outside Microsoft, Linux was still several years away, the latest version of GCC was 1.35, and Emacs was in version 18.5x. We might also reflect on what each one of us did around that year, to get a feeling for how much water has gone under the bridge since then....

So what has DJGPP achieved during this time? This section offers a retrospective summary, and then attempts to outline future developments.

I think the most important achievement is that DJGPP brought free software to the large community of DOS/Windows users. We may not like the reasons why these systems are so widespread, and we might resent the quality of the software which they run, but the fact remains that there is a huge installed base of such systems. DJGPP brings many users of these systems in touch with free software. It teaches them the value of free access to the sources and free exchange of knowledge and ideas about software internals. It also shows them how this freedom helps to make their software much better than proprietary tools, haunted by software patents, undocumented behavior, and non-disclosure agreements, ever could. Thanks to DJGPP, many young programmers have learned these lessons at the very beginning of their careers, and these are lessons they will not forget easily.

On a more practical note, consider the large body of free software successfully ported to DOS/Windows as part of DJGPP over the years. Besides GCC and Binutils, more than 50 GNU and free software packages were ported, including Emacs, Bash, GDB, Make, Gawk, Perl, TeX, Ghostscript, RCS, CVS, Tar, and many others. The document you are reading now was written in Texinfo using Emacs 20.3, spell-checked with Ispell, converted into Info and HTML with makeinfo, typeset with TeX, previewed as a PDF file produced with dvipdfm, and printed with dvips, all of them DJGPP ports. The GNU Software for MS-Windows and MS-DOS CD-ROM, first released by the FSF in the last quarter of 1998, holds 400MB of GNU software ported to DJGPP; people who bought that CD sometimes write to me that using the software makes them forget what OS they booted in the morning. All of these ports are in active maintenance, and new versions are ported as the GNU maintainers release them. Many GNU packages already include DJGPP support as part of the official distribution, and work is under way to add such support to other packages.

This abundance of free, high-quality, actively-maintained software which runs on platforms found in each household and in every office really makes a difference. It certainly makes the GNU project and its goals known and popular among users who might never have heard about GNU were it not for DJGPP. To me, it is no surprise that the GNU DOS/Windows CD-ROM instantly became such a big hit and sold more disks than all other GNU CD-ROMs together (200 copies sold during the first 2 months, which brought the FSF about $9600). Thus, DJGPP not only makes GNU popular, it also helps to raise funds for the GNU project. Ironically, a project which began because the FSF thought it was impossible ended up supporting the FSF. History has come full circle.

I know I promised to try to predict the future of DJGPP. But now that we have come all this way and reached the end of this document, I must confess: I lied. I don't want to set foot on the slippery path of predicting the future: first, because I'm not good at it, but mostly because DJGPP defies all predictions. DJGPP produces DOS executables, so it doesn't support native Windows programming (although DJGPP programs still make very good console applications when they run on Windows). Microsoft declared DOS dead and actively tries to retire all DOS-based software by deliberately preventing DOS programs running on Windows from accessing some lucrative and useful Windows services. In theory, this should have killed DJGPP. Nevertheless, many people not only use DJGPP, they even choose to run it not on Windows, but in plain DOS. All the hype about Windows being "the way of the future" notwithstanding, users prefer the stability and reliability of the DOS-based DJGPP environment to a fancy GUI.

One thing I can be positive about: we will certainly see DJGPP ports of more free software. Several packages, like egcs, inetutils, recode, and UCB Logo are being ported as we speak.

As for the core of DJGPP, its development depends on too many factors unbeknownst to me. One obvious direction is to add support for creating native Windows programs. But this is a large project which requires several dedicated volunteers to work on it for several months. It is not clear whether such a team could be assembled, given that many potential candidates either switch to Linux or use one of the existing free Windows development environments, like Mingw32 and Cygwin.

So the truth is, I don't know what the future of DJGPP will look like. Instead, let me tell you what I hope the free software movement will learn from the DJGPP experience. I hope we could learn that free software projects should not ignore popular platforms just because we don't like their operating system. By supporting enthusiasts that are ready to bring free software to those platforms, we could do much better: we could expose a much larger audience to our projects, and we could raise money for continuing our projects by selling software ported to those platforms and support services for them.



Footnotes

  1. This is not what DJGPP originally stood for; see The History of DJGPP.

  2. See the DJGPP history page on DJ's Web server, for more details.

  3. There is no official interpretation of the acronym DJGPP. A contest for the best name was held more than a year ago; the results can be found by searching the DJGPP mail archives.

  4. This is exactly what Linux, Hurd, and the latest versions of MS-Windows do. Interestingly enough, the original reason for DJ Delorie's interest in porting GCC was that he wanted to use it to write a 32-bit OS for PCs.

  5. The name of the default DPMI server program is recorded in the stub and can be changed by editing the stub with a special program called stubedit.

  6. Theoretically, memory below 1MB could be used by DJGPP programs. However, since this memory is usually at a premium, all DPMI servers leave it alone; CWSDPMI uses it only if there's not enough memory above 1MB.

  7. My personal involvement with the DJGPP library development began when I wrote the first version of stat and fstat which returned meaningful inode numbers and also corrected some other frequent blunders in DOS versions of these functions.

  8. Windows NT does not include this API; therefore, DJGPP programs cannot access long file names on NT systems. However, a beta version of a free LFN driver for NT is available.


  Copyright © 1999     Updated Jul 1999