Mail Archives: djgpp-workers/2005/05/15/06:54:06
From: <ams AT ludd DOT ltu DOT se>
Message-Id: <200505140300.j4E30drm024968@speedy.ludd.ltu.se>
Subject: wchar_t implementation and multibyte encoding
To: DJGPP-WORKERS <djgpp-workers AT delorie DOT com>
Date: Sat, 14 May 2005 05:00:39 +0200 (CEST)
X-Mailer: ELM [version 2.4ME+ PL78 (25)]
MIME-Version: 1.0
Reply-To: djgpp-workers AT delorie DOT com

Hello.
I've been thinking about this a little. Let's say we decide to encode
Unicode in wchar_t, which is the only sane choice today.
Then the functions iswalnum(), iswalpha(), etc. will have to be
implemented in one of three ways:
1. a switch() with many, many case labels,
2. a chain of range tests:
   if (0 <= c && c <= 31) { return 0; }
   if (32 <= c && c <= 126) { return 1; }
   if ( ... )
   ..., or
3. lookup tables, as isalnum(), isalpha(), etc. are implemented today.
1 and 2: a lot of code. GCC's case-range extension (case x ... y:)
could come in useful here, so I prefer 1 over 2.
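To make option 1 concrete, here is a minimal sketch using GCC case ranges. The function name is_unicode_decimal_digit() is hypothetical (not an existing libc or DJGPP function), only a handful of decimal-digit ranges are listed, and `case LOW ... HIGH:` is a GCC extension rather than ISO C.

```c
#include <wchar.h>   /* wint_t */

/* Hypothetical helper, not part of any C library.
   Uses GCC's case-range extension (case LOW ... HIGH:).
   Only a few Unicode decimal-digit ranges are listed here;
   a complete table would have many more entries. */
int is_unicode_decimal_digit(wint_t c)
{
    switch (c) {
    case 0x0030 ... 0x0039:   /* ASCII '0'..'9' */
    case 0x0660 ... 0x0669:   /* Arabic-Indic digits */
    case 0x06F0 ... 0x06F9:   /* Extended Arabic-Indic digits */
    case 0x0966 ... 0x096F:   /* Devanagari digits */
    case 0xFF10 ... 0xFF19:   /* Fullwidth digits */
        return 1;
    default:
        return 0;
    }
}
```

Each range costs one line of source, so the code grows with the number of ranges rather than the number of code points, which is presumably why case ranges help option 1 more than an if chain would.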
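3: an enormous table. Unicode code points range from 0 to 0x10FFFF,
i.e. 1,114,112 values, so even at one byte per code point we are
talking about more than 1 MB!
If those functions (isw*()) must also return different results
depending on the locale, the sizes explode, since each locale would
need its own table. So I hope they don't have to.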
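The size arithmetic can be sketched as follows. The helper flat_table_bytes() is hypothetical, and one byte of class flags per entry is an assumption; a wider flags field, or one table per locale, scales the result linearly.

```c
#include <stddef.h>

/* Highest Unicode code point. */
#define UNICODE_LAST 0x10FFFFu

/* Hypothetical helper: bytes needed for a flat lookup table with
   one entry per code point (0 .. UNICODE_LAST), bytes_per_entry
   bytes each, duplicated once per locale. */
size_t flat_table_bytes(size_t bytes_per_entry, size_t n_locales)
{
    return ((size_t)UNICODE_LAST + 1) * bytes_per_entry * n_locales;
}
```

With one byte per entry and a single locale this gives 1,114,112 bytes, which is more than 1 MB (1,048,576 bytes).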
With regard to which multibyte encoding we should use, I strongly
prefer UTF-8.
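For reference, UTF-8 stores each code point in 1 to 4 bytes. A minimal sketch of the encoding direction (roughly what wctomb() might do under UTF-8) could look like this. utf8_encode() is a hypothetical name, and the code point is passed as unsigned long so the full 0 - 0x10FFFF range fits whatever width wchar_t ends up having.

```c
#include <stddef.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
   Writes up to 4 bytes to out and returns the number written,
   or 0 if cp is a UTF-16 surrogate (U+D800..U+DFFF) or lies
   above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {                   /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                  /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                      /* surrogates are not characters */
    if (cp < 0x10000) {                /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {              /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                          /* beyond the Unicode range */
}
```

One nice property for a C library: ASCII (0 - 0x7F) encodes to itself in one byte, so existing char-based code keeps working on ASCII text.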
Opinions?
Right,
MartinS