From: "Juan Manuel Guerrero"
Organization: Darmstadt University of Technology
To: recode-bugs AT iro DOT umontreal DOT ca
Date: Thu, 15 Mar 2001 23:30:00 +0200
MIME-Version: 1.0
Content-type: text/plain; charset=US-ASCII
Content-transfer-encoding: 7BIT
Subject: OS/DJGPP specific difficulties with recode 3.6
CC: djgpp-workers AT delorie DOT com
X-mailer: Pegasus Mail for Windows (v2.54DE)
Message-ID: <4B62C66334B@HRZ1.hrz.tu-darmstadt.de>
Reply-To: djgpp-workers AT delorie DOT com

While trying to compile recode 3.6 out-of-the-box on MSDOS using DJGPP
I have found two difficulties:

a) An OS specific issue.

The resulting binaries (recode.exe and librecode.a) fail the following
tests from the testsuite:

Individual surfaces.
  7. ./dumps.m4:3 --- - Thu Mar 15 15:38:49 2001 FAILED near `dumps.m4:31'
 11. ./dumps.m4:92 --- - Thu Mar 15 15:39:19 2001 FAILED near `dumps.m4:116'
 15. ./dumps.m4:174 --- - Thu Mar 15 15:39:49 2001 FAILED near `dumps.m4:198'
 19. ./dumps.m4:256 --- - Thu Mar 15 15:40:19 2001 FAILED near `dumps.m4:288'
 23. ./dumps.m4:353 --- - Thu Mar 15 15:40:48 2001 FAILED near `dumps.m4:381'
 27. ./dumps.m4:442 --- - Thu Mar 15 15:41:18 2001 FAILED near `dumps.m4:466'
 31. ./dumps.m4:522 --- - Thu Mar 15 15:41:48 2001 FAILED near `dumps.m4:554'
 35. ./dumps.m4:619 --- - Thu Mar 15 15:42:17 2001 FAILED near `dumps.m4:647'
 39. ./dumps.m4:708 --- - Thu Mar 15 15:42:47 2001 FAILED near `dumps.m4:736'
 43. ./base64.m4:3 --- - Thu Mar 15 15:43:16 2001 FAILED near `base64.m4:22'

Individual charsets.
 49. ./african.m4:3 FAILED near `african.m4:31'
 50. ./african.m4:40 FAILED near `african.m4:62'
 51. ./african.m4:71 FAILED near `african.m4:101'
 52. ./african.m4:110 FAILED near `african.m4:134'
 53. ./african.m4:143 FAILED near `african.m4:162'
 56. ./utf7.m4:3 --- - Thu Mar 15 15:44:10 2001 FAILED near `utf7.m4:20'

Writing `debug-NN.sh' scripts, NN = 7 11 15 19 23 27 31 35 39 43 49 50 51 52 53 56, done
================================================
ERROR: Suite unsuccessful, 16 of 95 tests failed
================================================

I will only show the output that diff produces for the first test:

  7. ./dumps.m4:3 --- - Thu Mar 15 15:38:49 2001
+++ stdout Thu Mar 15 15:38:48 2001
@@ -1,21 +1,23 @@
- 10
- 97, 10
- 97, 98, 10
- 97, 98, 99, 10
- 97, 98, 99, 100, 10
- 97, 98, 99, 100, 101, 102, 103, 104, 105, 10
+ 13, 10
+ 97, 13, 10
+ 97, 98, 13, 10
+ 97, 98, 99, 13, 10
+ 97, 98, 99, 100, 13, 10
+ 97, 98, 99, 100, 101, 102, 103, 104, 105, 13, 10
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
-112, 113, 114, 115, 10
+112, 113, 114, 115, 13, 10
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
-112, 113, 114, 115, 116, 117, 118, 119, 122, 121, 122, 65, 66, 67, 10
+112, 113, 114, 115, 116, 117, 118, 119, 122, 121, 122, 65, 66, 67, 13,
+ 10
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 122, 121, 122, 65, 66, 67, 68,
- 69, 70, 71, 72, 73, 74, 75, 76, 77, 10
+ 69, 70, 71, 72, 73, 74, 75, 76, 77, 13, 10
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 122, 121, 122, 65, 66, 67, 68,
  69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
- 84, 85, 86, 87, 10
+ 84, 85, 86, 87, 13, 10
  97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111,
 112, 113, 114, 115, 116, 117, 118, 119, 122, 121, 122, 65, 66, 67, 68,
  69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
- 84, 85, 86, 87, 88, 89, 90, 48, 49, 50, 51, 52, 53, 54, 10
+ 84, 85, 86, 87, 88, 89, 90, 48, 49, 50, 51, 52, 53, 54, 13,
+ 10
FAILED near `dumps.m4:31'

It is *not* worth reproducing diff's output for all the other failing tests.
They do not contain any new information. If you inspect the first few
lines of the output you will notice that the reason for the failure is
the different EOL style used: the reference file uses UNIX-style EOL (LF)
and recode produces DOS-style EOL (CRLF). *All* other tests fail due to
the same issue.

The reason for this failure is a chain of changes in the recode 3.6
distribution.

1) The file ./m4/microsoft.m4 has been completely removed. Not only has
the file been removed; the functionality provided by it has also been
removed and has **not** been replaced by some other appropriate code. As
the name suggests, microsoft.m4 supplied the code needed to detect
whether DOS/Windows is used as OS. The script configure.in contained code
that defined the macro DEFAULT_CHARSET to IBM-PC or latin-1 based on the
result returned by microsoft.m4. Once again, microsoft.m4 and this code
in configure.in have been completely removed ***without*** trying to
reproduce this functionality in some other way.

2) This is the relevant snippet from recode 3.5, function
disambiguate_name() from file ./src/names.c:

  /* Look for a match. */

  if (!name || !*name)
    if (type == SYMBOL_FIND_AS_CHARSET || type == SYMBOL_FIND_AS_EITHER)
      {
        name = getenv ("DEFAULT_CHARSET");
        if (!name)
          {
#ifdef DEFAULT_CHARSET
            name = DEFAULT_CHARSET;
            if (!*name)
#endif
              return NULL;

Now, the same snippet from recode 3.6, function disambiguate_name() from
file ./src/names.c:

  /* Look for a match. */

  if (!name || !*name)
    switch (find_type)
      {
      case ALIAS_FIND_AS_CHARSET:
      case ALIAS_FIND_AS_EITHER:
        name = getenv ("DEFAULT_CHARSET");
        if (!name)
          name = "char";  /* locale dependent */
        break;

      default:
        return NULL;

The important point is the use and function of the macro DEFAULT_CHARSET.
With this macro, an OS specific (and appropriate) charset ****and****
surface (CRLF or LF) is selected for recode 3.5 at configuration time and
later at run time. This is **no** longer true for recode 3.6.
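
The difference between the two fallback chains can be sketched in
isolation as follows. This is only an illustration under stated
assumptions, not code from either release: the function names
default_charset_3_5() and default_charset_3_6() and the env_value
parameter are hypothetical, and DEFAULT_CHARSET is hard-wired here to
"IBM-PC", the value the macro had on DOS/Windows in recode 3.5.

```c
#include <stddef.h>

/* Assumption: stands in for the value that configure used to define
   on DOS/Windows in recode 3.5.  */
#ifndef DEFAULT_CHARSET
# define DEFAULT_CHARSET "IBM-PC"
#endif

/* Hypothetical helper mirroring the 3.5 snippet.  ENV_VALUE is what
   getenv ("DEFAULT_CHARSET") returned, or NULL if the variable is not
   set.  The compile-time macro is used only when the variable is
   missing; an empty macro means "no default".  */
const char *
default_charset_3_5 (const char *env_value)
{
  const char *name = env_value;

  if (!name)
    {
      name = DEFAULT_CHARSET;
      if (!*name)
        return NULL;
    }
  return name;
}

/* Hypothetical helper mirroring the 3.6 snippet: there is no
   compile-time default any more, only the fixed name "char".  */
const char *
default_charset_3_6 (const char *env_value)
{
  const char *name = env_value;

  if (!name)
    name = "char";		/* locale dependent */
  return name;
}
```

Called as default_charset_3_5 (getenv ("DEFAULT_CHARSET")) with the
variable unset, the first helper yields "IBM-PC" while the second yields
"char". With the variable set, both simply return its value.
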
Once again, if recode 3.5 is started it will evaluate the environment
variable DEFAULT_CHARSET to get the appropriate charset. This character
set always implies the surface used. Of course, the average MSDOS/DJGPP
user will never set this variable at all. Probably he will not even know
that it exists. In this case recode 3.5 will **default** to the content
of the macro DEFAULT_CHARSET and this is IBM-PC. But IBM-PC implies CRLF
as surface and this selection will DTRT for the MSDOS/DJGPP users.

By inspection of the recode 3.6 code it can be seen that it will default
to "char", and this selection always implies LF as surface, making
recode 3.6 almost useless for the majority of the non-POSIX platform
users like the MSDOS/DJGPP ones. Of course, this behaviour can be changed
by the user by setting DEFAULT_CHARSET=IBM-PC before invoking recode.exe.
In this case **none** of the tests in the testsuite will fail.

It is completely unclear to me why the old (and very well working) code
has been replaced by this ***completely*** POSIX-centric code. This code
makes recode almost useless for non-POSIX users. Even worse, I have
inspected the files README, news and recode.texinfo very carefully and
have found **nowhere** a reference to this new program behaviour. A naive
DJGPP user who compiles recode 3.6 out of the box and does **not** run
the testsuite (this is probably the normal case) will get a useless
binary and will probably *never* notice it.

Once again, I am not judging the changes introduced with this version of
recode, but if such drastic changes are made to the sources, this should
at least be documented in the readme or news file so that the non-POSIX
OS user is warned about the new behaviour of the binary.

I have never been involved with recode development so I will **not**
propose any code to change this issue. I do not know how this issue will
be handled by Francois Pinard in the future, so I will not interfere
here.
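
How the implied surface changes the bytes that the dump tests compare
can be sketched as follows. This is a stand-alone illustration and not
recode's implementation: the function dump_decimal() and its parameters
are hypothetical, and it only mimics the comma-separated decimal listing
seen in the diff above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the bytes of TEXT as comma-separated decimal values into OUT
   (capacity OUT_SIZE).  When CRLF is nonzero, every LF is preceded by
   a CR, the way a CRLF surface presents a line ending.  Returns the
   number of characters written, or -1 if OUT is too small.  */
int
dump_decimal (const char *text, int crlf, char *out, size_t out_size)
{
  size_t used = 0;
  int first = 1;

  assert (text != NULL && out != NULL && out_size > 0);
  out[0] = '\0';

  for (; *text; text++)
    {
      unsigned char bytes[2];
      size_t count = 0, i;

      if (crlf && *text == '\n')
        bytes[count++] = 13;	/* CR inserted by the CRLF surface */
      bytes[count++] = (unsigned char) *text;

      for (i = 0; i < count; i++)
        {
          int n = snprintf (out + used, out_size - used, "%s%u",
                            first ? "" : ", ", (unsigned) bytes[i]);

          if (n < 0 || (size_t) n >= out_size - used)
            return -1;
          used += (size_t) n;
          first = 0;
        }
    }
  return (int) used;
}
```

For the input "a\n" this gives "97, 10" with crlf set to 0 and
"97, 13, 10" with crlf set to 1, which is the difference shown in the
second line of the diff.
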
For the DJGPP port that I will upload to simtel.net I will solve this
difficulty by brute force. This means I will simply replace the
following code:

        if (!name)
          name = "char";  /* locale dependent */
        break;

by this one:

        if (!name)
          name = "IBM-PC";
        break;

This will make recode 3.6 work on DOS/Windows in the same way as recode
3.5 worked. IMHO, this is what the average DJGPP user will expect. Once
again, I am *not* proposing this code change for the stock recode
sources.

b) A DJGPP specific issue.

To cope with this issue I will send a patch directly to Francois Pinard.
The patch is long and would bore most of the audience of the different
newsgroups. The patch deals only with files from the contrib/ subdir.
The goals of the patch are:

1) Remove unneeded files from the contrib/ subdir. These are the files
djgpp-README and djgpp-diffs. Both files were part of the DJGPP port of
recode 3.4 and are of no use anymore. Experience with the DJGPP port of
recode 3.5 leads me to the conclusion that these files confuse the
users. They seem not to know which person (Francois Pinard, Wojciech
Galazka or Juan Manuel Guerrero) is responsible for what. IMHO, there
will be no loss if these files are removed. This avoids having duplicate
DJGPP specific README and diffs files in the contrib/ subdir. Their
contents are obsolete anyway. The patch will remove these files.

2) The patch will create the following files:

  contrib/readme.in
  contrib/config.site
  contrib/configdj.bat
  contrib/configdj.sed
  contrib/fnchange.in
  contrib/recodepo.sh

All these files are needed to configure and compile recode 3.X
out-of-the-box. Recode now uses libiconv. Because a large number of the
filenames used in the libiconv/ subdir do **not** fit into the 8.3 MSDOS
namespace, some of the filenames must be changed by an MSDOS/DJGPP user
who wants to compile the original distribution out of the box. For this
purpose the file fnchange.in is supplied.
If djtar.exe is used to untar the original distribution, fnchange.in
makes it possible to rename the problematic files on the fly. This means
that files like libiconv/iso8859_1.h will become libiconv/iso/8859_1.h
etc. configdj.bat, configdj.sed and recodepo.sh will modify the
Makefile.in files and the source files to account for the new directory
structure of libiconv. recodepo.sh will recode the .po files from the
Unix charsets to the appropriate DOS codepages.

3) The patch will modify the file contrib/Makefile.am to account for the
DJGPP specific changes.

Of course, all these changes will *not* interfere with the configuration
and compilation of recode on any other platform.

As usual, comments, objections, suggestions and questions are welcome.

Regards,
Guerrero, Juan Manuel