From: Bruno Haible
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Message-ID: <15025.20732.410053.828022@honolulu.ilog.fr>
Date: Fri, 16 Mar 2001 00:32:12 +0100 (CET)
To: "Juan Manuel Guerrero"
Cc: recode-bugs AT IRO DOT UMontreal DOT CA, djgpp-workers AT delorie DOT com
Subject: Re: OS/DJGPP specific difficulties with recode 3.6
In-Reply-To: <4B62C66334B@HRZ1.hrz.tu-darmstadt.de>
References: <4B62C66334B AT HRZ1 DOT hrz DOT tu-darmstadt DOT de>
X-Mailer: VM 6.72 under 21.1 (patch 8) "Bryce Canyon" XEmacs Lucid
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by delorie.com id SAA21121
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

Juan Manuel Guerrero writes:
> While trying to compile recode 3.6 out-of-the-box
> on MSDOS using DJGPP I have found two difficulties:
>
> a) An OS specific issue.
> I will only show the output that diff produces for the first test:
> 7. ./dumps.m4:3 --- - Thu Mar 15 15:38:49 2001
> +++ stdout Thu Mar 15 15:38:48 2001
> @@ -1,21 +1,23 @@
> - 10
> - 97, 10
> - 97, 98, 10
> - 97, 98, 99, 10
> - 97, 98, 99, 100, 10
> - 97, 98, 99, 100, 101, 102, 103, 104, 105, 10
> + 13, 10
> + 97, 13, 10
> + 97, 98, 13, 10
> + 97, 98, 99, 13, 10
> + 97, 98, 99, 100, 13, 10
> + 97, 98, 99, 100, 101, 102, 103, 104, 105, 13, 10

CR/LF. The tests apparently expect a Unix compatible 'echo' command.

> It is *not* worth reproducing diff's output for all the other failing tests.

Yes, it's obvious.

> 1) The file ./m4/microsoft.m4 has been completely removed. This means that not
> only the file has been removed, the functionality provided by it has also been
> removed and has **not** been substituted by some other appropriate code. As the
> name suggests, microsoft.m4 supplied code needed to detect if DOS/windows is
> used as OS.

There are other ways to do it.
#if, or looking at config.guess output.

> The script configure.in contained code that defined the macro
> DEFAULT_CHARSET to IBM-PC or latin-1 based on the result returned by microsoft.m4.

The assumption that all non-Microsoft-OS users are in a Latin1 locale is
broken. The assumption that all DOS users use the IBM-PC = CP437 character
set is broken as well. You made a list of all character encodings used in
DOS for config.charset, a few weeks ago, didn't you?

> 2) Now, the same snippet from recode 3.6, function disambiguate_name()
> from file ./src/names.c:
> /* Look for a match. */
>
> if (!name || !*name)
>   switch (find_type)
>     {
>     case ALIAS_FIND_AS_CHARSET:
>     case ALIAS_FIND_AS_EITHER:
>       name = getenv ("DEFAULT_CHARSET");
>       if (!name)
>         name = "char"; /* locale dependent */
>       break;
>
>     default:
>       return NULL;
>
> The important issue is to notice the use and function of the
> macro DEFAULT_CHARSET. With this macro, an OS specific (and appropriate)
> charset ****and**** surface (CRLF or LF)

The point about the surface escaped me and François, when we discussed it.
Would you mind changing in your port

    name = "char"; /* locale dependent */

into

#if O_BINARY
    name = "char/crlf"; /* locale dependent but with CR-LF surface */
#else
    name = "char"; /* locale dependent */
#endif

> it will evaluate the environment variable DEFAULT_CHARSET for
> getting the appropriate charset. This character set always implies
> the used surface. Of course, the average MSDOS/DJGPP user will never
> set this value at all.

Which is exactly why we went through the config.charset horror. Once for
all applications, including recode.

> By inspection of the recode 3.6 code it can be seen that it will default to "char"
> and this selection implies always LF as surface, making recode 3.6 almost useless
> for the majority of the non-POSIX platform users like the MSDOS/DJGPP ones.

OK, I've understood. Do *you* always think of everything and never make bugs?

> This means, I will simply replace the following code:
>   if (!name)
>     name = "char"; /* locale dependent */
>   break;
>
> by this one:
>   if (!name)
>     name = "IBM-PC";
>   break;

Please use the change above. Not everyone uses CP437.

Bruno
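
For illustration only, the fallback requested above can be sketched as a
standalone function. This is not code from recode: the helper name
default_charset_name() is hypothetical, and defining O_BINARY to 0 when
<fcntl.h> does not provide it is an assumption made here so that the sketch
compiles on Unix-like systems as well as under DJGPP.

```c
#include <fcntl.h>   /* provides O_BINARY on DOS/Windows toolchains such as DJGPP */
#include <stdlib.h>  /* getenv */

/* Assumption: where <fcntl.h> has no O_BINARY, treat it as 0 (no text/binary split). */
#ifndef O_BINARY
# define O_BINARY 0
#endif

/* Hypothetical helper, not part of recode: pick the charset name to use
   when the caller supplied none. */
const char *
default_charset_name (void)
{
  /* An explicit DEFAULT_CHARSET in the environment always wins. */
  const char *name = getenv ("DEFAULT_CHARSET");

  if (!name)
    {
#if O_BINARY
      name = "char/crlf";  /* locale dependent but with CR-LF surface */
#else
      name = "char";       /* locale dependent */
#endif
    }
  return name;
}
```

With this shape the charset stays locale dependent on every platform, and
only the surface differs: CR-LF where O_BINARY is nonzero, plain LF elsewhere.
Hard-coding "IBM-PC" instead would tie every DOS user to CP437.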