Message-Id: <200505140300.j4E30drm024968@speedy.ludd.ltu.se>
Subject: wchar_t implementation and multibyte encoding
To: DJGPP-WORKERS
Date: Sat, 14 May 2005 05:00:39 +0200 (CEST)
X-MailScanner-From: ams AT ludd DOT ltu DOT se
Reply-To: djgpp-workers AT delorie DOT com

Hello.

I've been thinking about this a little. Let's say we decide to encode
Unicode in wchar_t, which is the only sane choice today. Then the
functions iswalnum(), iswalpha(), etc. are going to be implemented in
one of three ways:

1. switch() and many, many case:'s,

2. a chain of range tests:

       if( 0 <= c && c <= 31 ) { return 0; }
       if( 32 <= c && c <= 126 ) { return 1; }
       if( ... ) ...

   or

3. tables, as isalnum(), isalpha(), etc. are today.

1 and 2: A lot of code. If anything, I think gcc's extended
"case x ... y:" can come in useful, so I prefer 1 over 2.

3: An enormous table. As Unicode has the range 0 - 0x10ffff, that is
0x110000 = 1114112 entries, so even at one byte per entry we are
talking about more than 1MB! Now if those functions (isw*()) should
return different results depending on locale, the sizes explode. So I
hope not.

With regard to which multibyte encoding we should use, I strongly
prefer UTF-8.

Opinions?

Right,

MartinS
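
For illustration, alternative 1 could look roughly like the sketch
below. It is only a sketch under assumptions: codepoint_t and
sketch_iswalnum() are hypothetical names, not existing DJGPP code, and
only ASCII plus the Latin-1 letter ranges are covered, where a real
implementation would need ranges for all of Unicode. It relies on gcc's
"case x ... y:" extension, so it is not portable ISO C. The constant
FLAT_TABLE_BYTES (also hypothetical) spells out the size of a flat
table for alternative 3, assuming one byte of class bits per code
point.

```c
/* Hypothetical type for a Unicode code point (0 - 0x10ffff). */
typedef unsigned long codepoint_t;

/* Alternative 3 as a flat table, assuming one byte per code point:
   0x110000 entries, which is just over 1MB. */
enum { FLAT_TABLE_BYTES = 0x10ffff + 1 };

/* Alternative 1: hypothetical iswalnum()-like classifier built on
   gcc's case-range extension. Deliberately incomplete: it knows only
   ASCII digits and letters and the Latin-1 letter ranges. */
int sketch_iswalnum(codepoint_t c)
{
    switch (c) {
    case 0x30 ... 0x39:   /* '0' - '9' */
    case 0x41 ... 0x5a:   /* 'A' - 'Z' */
    case 0x61 ... 0x7a:   /* 'a' - 'z' */
    case 0xc0 ... 0xd6:   /* Latin-1 letters up to 0xd6 */
    case 0xd8 ... 0xf6:   /* skips 0xd7, the multiplication sign */
    case 0xf8 ... 0xff:   /* skips 0xf7, the division sign */
        return 1;
    default:
        return 0;
    }
}
```

Each case range stands in for what would otherwise be one if() test in
alternative 2, which is why the two alternatives end up as about the
same amount of code.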