Date: Fri, 16 Mar 2001 11:46:32 +0200
From: "Eli Zaretskii"
Sender: halo1 AT zahav DOT net DOT il
To: djgpp-workers AT delorie DOT com
Message-Id: <7263-Fri16Mar2001114631+0200-eliz@is.elta.co.il>
X-Mailer: Emacs 20.6 (via feedmail 8.3.emacs20_6 I) and Blat ver 1.8.6
CC: ST001906 AT HRZ1 DOT HRZ DOT TU-Darmstadt DOT De, recode-bugs AT IRO DOT UMontreal DOT CA, djgpp-workers AT delorie DOT com
In-reply-to: <15025.20732.410053.828022@honolulu.ilog.fr> (message from Bruno Haible on Fri, 16 Mar 2001 00:32:12 +0100 (CET))
Subject: Re: OS/DJGPP specific difficulties with recode 3.6
References: <4B62C66334B AT HRZ1 DOT hrz DOT tu-darmstadt DOT de> <15025 DOT 20732 DOT 410053 DOT 828022 AT honolulu DOT ilog DOT fr>
Reply-To: djgpp-workers AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: djgpp-workers AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

> From: Bruno Haible
> Date: Fri, 16 Mar 2001 00:32:12 +0100 (CET)
>
> > I will only show the output that diff produces for the first test:
> > 7. ./dumps.m4:3 --- - Thu Mar 15 15:38:49 2001
> > +++ stdout Thu Mar 15 15:38:48 2001
> > @@ -1,21 +1,23 @@
> > - 10
> > - 97, 10
> > - 97, 98, 10
> > - 97, 98, 99, 10
> > - 97, 98, 99, 100, 10
> > - 97, 98, 99, 100, 101, 102, 103, 104, 105, 10
> > + 13, 10
> > + 97, 13, 10
> > + 97, 98, 13, 10
> > + 97, 98, 99, 13, 10
> > + 97, 98, 99, 100, 13, 10
> > + 97, 98, 99, 100, 101, 102, 103, 104, 105, 13, 10
>
> CR/LF. The tests apparently expect a Unix compatible 'echo' command.

No, it expects `echo' to produce Unix-style LF-only EOLs.  The test
suite _does_ use a Unix compatible `echo', which comes from the ported
GNU Sh-utils.

When I worked on recode 3.4 and 3.5, I asked Francois why the test
suite doesn't specify the surface explicitly, as in "foo..bar/".  This
would allow the EOL format of generated files to be predictable.
Also, the test suite should IMHO not assume any specific EOL-related
behavior from the programs it invokes, other than recode itself.  In
many cases, using recode (with a trivial conversion spec) instead of
echo is a much better alternative, since it allows explicit control of
the EOL format in the produced files.

IMHO, this way we could eliminate many of the horrible hacks that need
to be added to the distribution to make the test suite work on
non-Posix platforms, and as a bonus, the test suite will suffer from
much less bit-rot than what we see now.

I don't think Francois had time to reply to those suggestions, but
perhaps they can be considered now.

> The assumption that all non-Microsoft-OS users are in a Latin1 locale
> is broken. The assumption that all DOS users use the IBM-PC = CP437
> character set is broken as well. You made a list of all character
> encodings used in DOS for config.charset, a few weeks ago, didn't you?

Nevertheless, it is IMHO important to have a reasonable default.  If
the charset is not specified by the user or the environment, I suggest
the following fallback procedure:

  - try to estimate the codepage from the country code (the latter is
    returned by a special system call);

  - if that fails, look at DEFAULT_CHARSET;

  - if that fails as well, use cp437 as the last resort.

It's true that cp437 is not a universal default, but in the absence of
the other two fallbacks, it's good enough, because that's how a
bare-bones DOS system with an empty CONFIG.SYS behaves.

> Would you mind changing in your port
>
>   name = "char"; /* locale dependent */
>
> into
>
> #if O_BINARY
>   name = "char/crlf"; /* locale dependent but with CR-LF surface */
> #else
>   name = "char"; /* locale dependent */
> #endif

I think this is worse than what I suggest above.  I'm not even sure it
would be better than blindly assuming cp437 as the last resort, but
perhaps I'm wrong.  In any case, I think the possibility of estimating
the codepage from the country code should not be ignored.
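
The three-step fallback suggested above could be sketched roughly as
follows.  This is only an illustration under stated assumptions: the
function names and the tiny country table are hypothetical, the table
entries are examples rather than a verified list, and the caller is
assumed to obtain the country code from the system call (passing -1
if that call fails) and to pass in the result of
getenv("DEFAULT_CHARSET").

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, deliberately tiny table.  The pairs are illustrative
   only; a real port would need a complete, verified list. */
struct country_cp { int country; const char *charset; };

static const struct country_cp country_table[] = {
  { 1,   "cp437" },
  { 7,   "cp866" },
  { 972, "cp862" },
};

/* Step 1: estimate the codepage from the country code.
   Returns NULL when the country code is not in the table. */
static const char *charset_from_country(int country)
{
  size_t i;

  for (i = 0; i < sizeof country_table / sizeof country_table[0]; i++)
    if (country_table[i].country == country)
      return country_table[i].charset;
  return NULL;
}

/* COUNTRY is the country code, or -1 if it could not be determined.
   ENV_CHARSET is the value of DEFAULT_CHARSET, or NULL if unset. */
const char *pick_default_charset(int country, const char *env_charset)
{
  const char *cs = charset_from_country(country);

  if (cs != NULL)
    return cs;                        /* step 1 succeeded */
  if (env_charset != NULL && strlen(env_charset) > 0)
    return env_charset;               /* step 2: DEFAULT_CHARSET */
  return "cp437";                     /* step 3: last resort */
}
```

Taking the country code and the environment value as parameters keeps
the sketch independent of any particular DOS call, so the order of the
fallbacks can be checked on any host.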

> > it will evaluate the environment variable DEFAULT_CHARSET for
> > getting the appropriate charset. This character set always implies
> > the used surface. Of course, the average MSDOS/DJGPP user will never
> > set this value at all.
>
> Which is exactly why we went through the config.charset horror. Once
> for all applications, including recode.

If setting DEFAULT_CHARSET is important, we could arrange for it to be
set in DJGPP.ENV, or even by the library startup code.  Alternatively,
recode or libiconv could include a (DJGPP-specific) static constructor
which pushes the correct DEFAULT_CHARSET into the program's
environment.
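
A static constructor of the kind mentioned above might look roughly
like this.  It is a minimal sketch, not code from recode or libiconv:
it assumes GCC's `constructor' function attribute and a working
setenv(), and both the function name and the hard-coded fallback value
are hypothetical placeholders for whatever the port would compute.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() on strict POSIX hosts */
#include <stdlib.h>

/* Hypothetical fallback value; a real port would compute it, for
   example from the country code. */
#define FALLBACK_CHARSET "cp437"

/* GCC runs functions marked "constructor" before main() starts. */
static void __attribute__((constructor)) push_default_charset(void)
{
  /* Third argument 0: do not overwrite a value the user already set. */
  setenv("DEFAULT_CHARSET", FALLBACK_CHARSET, 0);
}
```

Passing 0 as the last argument means an explicit setting in the user's
environment or in DJGPP.ENV still takes precedence over the fallback.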