Mail Archives: cygwin-developers/2002/07/03/08:57:09

Mailing-List: contact cygwin-developers-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-developers-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin-developers/>
List-Post: <mailto:cygwin-developers AT cygwin DOT com>
List-Help: <mailto:cygwin-developers-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-developers-owner AT cygwin DOT com
Delivered-To: mailing list cygwin-developers AT cygwin DOT com
To: <cygwin-developers AT cygwin DOT com>
Reply-To: <cygwin-developers AT cygwin DOT com>
Subject: Re: UTF8 support in Cygwin
References: <008401c22279$68759a00$0100a8c0 AT advent02>
Mime-Version: 1.0 (generated by tm-edit 7.106)
From: Kazuhiro Fujieda <fujieda AT jaist DOT ac DOT jp>
Date: 03 Jul 2002 21:57:01 +0900
In-Reply-To: "Chris January"'s message of Wed, 3 Jul 2002 11:07:15 +0100
Message-ID: <s1ssn31nf4y.fsf@jaist.ac.jp>
Lines: 56

>>> On Wed, 3 Jul 2002 11:07:15 +0100
>>> "Chris January" <chris AT atomice DOT net> said:

> My question is, does anyone have any objections to doing things this way,
> and if so, can they suggest a better way? I don't want to patch the whole of
> Cygwin and then have to re-write everything at a later date.

I'd like to propose supporting codepages other than UTF8, and
applying the setting to parts of Cygwin other than filenames.

For example, with CYGWIN=codepage:20866, suppose that
`parse_options' sets current_codepage = other_cp and
current_cpnum = (UINT)20866.
Your example would then become as follows.

  if (current_codepage == other_cp)
    {
      /* Convert the name from the selected codepage to UTF-16 and
         use the wide-character API. */
      WCHAR wbuf[MAX_PATH];
      if (MultiByteToWideChar (current_cpnum, 0, get_win32_name (), -1,
                               wbuf, MAX_PATH) == 0)
        {
          __seterrno ();
          goto done;
        }
      x = CreateFileW (wbuf, access, shared, &sa, creation_distribution,
                       file_attributes, 0);
    }
  else
    /* ANSI or OEM codepage: keep the existing byte-oriented call. */
    x = CreateFileA (get_win32_name (), access, shared, &sa,
                     creation_distribution, file_attributes, 0);

Moreover, get_cp in miscfuncs.cc would have to change as follows.

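    UINT
    get_cp ()
    {
      switch (current_codepage)
        {
        case ansi_cp:
          return GetACP ();
        case oem_cp:
          return GetOEMCP ();
        case other_cp:
          return current_cpnum;
        }
      /* Fallback so that every path returns a value; choosing the
         ANSI codepage here is an assumption. */
      return GetACP ();
    }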

When we want to use UTF8, we set codepage:65001 or codepage:utf8.
The latter requires the parser to accept "utf8" and
translate it to CP_UTF8 (65001).
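
The value handling could look roughly like the following. This is only a
minimal sketch, not the actual `parse_options' code: the function name
parse_codepage, the local UINT typedef, the constant UTF8_CPNUM, the
spellings "ansi" and "oem" for the existing values, and the upper bound
of 65535 on numeric values are all assumptions made for illustration.
Only the names current_codepage, current_cpnum, ansi_cp, oem_cp and
other_cp come from the text above.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <cstring>

// Stand-ins so the sketch builds without the Windows headers (assumption).
typedef unsigned int UINT;
enum codepage_type { ansi_cp, oem_cp, other_cp };

static codepage_type current_codepage = ansi_cp;
static UINT current_cpnum = 0;

// Numeric value of CP_UTF8 in the Windows headers.
static const UINT UTF8_CPNUM = 65001;

// Hypothetical helper: parse the <value> part of "codepage:<value>".
// Returns true and updates the globals on success; on failure it
// returns false and leaves the globals untouched.
static bool
parse_codepage (const char *value)
{
  if (value == NULL || *value == '\0')
    return false;

  if (strcmp (value, "ansi") == 0)
    {
      current_codepage = ansi_cp;
      return true;
    }
  if (strcmp (value, "oem") == 0)
    {
      current_codepage = oem_cp;
      return true;
    }
  if (strcmp (value, "utf8") == 0)
    {
      // Symbolic alias for codepage 65001.
      current_codepage = other_cp;
      current_cpnum = UTF8_CPNUM;
      return true;
    }

  // Otherwise expect a plain decimal codepage number such as 20866.
  if (!isdigit ((unsigned char) value[0]))
    return false;
  char *end;
  errno = 0;
  unsigned long n = strtoul (value, &end, 10);
  if (errno != 0 || *end != '\0' || n == 0 || n > 65535)
    return false;

  current_codepage = other_cp;
  current_cpnum = (UINT) n;
  return true;
}
```

With this sketch, parse_codepage ("20866") and parse_codepage ("utf8")
both select other_cp, with current_cpnum set to 20866 and 65001
respectively, while an unknown name such as "koi8" is rejected and the
previous setting stays in effect.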

How about this idea?
____
  | AIST      Kazuhiro Fujieda <fujieda AT jaist DOT ac DOT jp>
  | HOKURIKU  Center for Information Science
o_/ 1990      Japan Advanced Institute of Science and Technology
