Mail Archives: cygwin/2010/01/21/10:41:40
On Jan 21 10:04, Mark J. Reed wrote:
> On Thu, Jan 21, 2010 at 8:40 AM, Corinna Vinschen wrote:
> > would somebody with Japanese and/or Chinese language background be so
> > When comparing strings linguistically (strcoll/wcscoll),
> > - are Hiragana and Katakana forms of the same character to be
> > treated as equal or as different?
>
> (Nit: they are not "the same character" in either the technical or
> traditional sense of "character"; they're the same syllable, but
> represented by different characters.)
>
> From the Unicode point of view, they are distinct; there is no defined
> equivalence, either canonical or compatibility, between corresponding
> Katakana and Hiragana syllables. The collation algorithm (which does
> take linguistic context into account) doesn't seem to say anything
> about such comparisons, though it's possible I missed something.
>
> But as a precedent which might be helpful, I note that with
> linguistic sensitivity active, Oracle 10g does compare Hiragana and
> Katakana forms of the same syllable as equal.
>
> > - are half-width and full-width forms of the same CJK character
> > treated as equal or as different?
>
> According to the Unicode normalization algorithm, half-width and
> full-width forms normalize to the same character under compatibility
> normalization (NFKC/NFKD), so they should be treated as equivalent.
> From the point of view of Unicode, there is no
> semantic difference, and the width property is informative, not
> normative. It is encoded in Unicode primarily to preserve round-trip
> compatibility with other standards, though it is also useful as a hint
> to rendering algorithms.
Thanks for the info. However...
linux$ cat jp.c
#include <stdio.h>
#include <locale.h>
#include <wchar.h>
int
main (int argc, char **argv)
{
  setlocale (LC_ALL, "ja_JP.UTF-8");
  /* U+3042 = Hiragana letter A
     U+30a2 = Katakana letter A
     U+ff71 = Halfwidth Katakana letter A */
  printf ("%d\n", wcscoll (L"\x3042", L"\x30a2"));
  printf ("%d\n", wcscoll (L"\xff71", L"\x30a2"));
  return 0;
}
linux$ gcc jp.c -o jp
linux$ ./jp
-83
-340
I expected at least one of the comparisons to return 0.
Am I doing something wrong?
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat