Mail Archives: cygwin/2009/09/22/05:57:59

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Tue, 22 Sep 2009 11:57:39 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: non-BMP character width
Message-ID: <20090922095739.GS20981@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <200909161148 DOT n8GBm4ha001469 AT mail DOT bln1 DOT bf DOT nsn-intra DOT net> <20090921163348 DOT GL20981 AT calimero DOT vinschen DOT de> <h98b17$jbj$1 AT ger DOT gmane DOT org> <20090921175759 DOT GM20981 AT calimero DOT vinschen DOT de> <4AB8592F DOT 9060803 AT lapo DOT it>
MIME-Version: 1.0
In-Reply-To: <4AB8592F.9060803@lapo.it>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On Sep 22 06:57, Lapo Luchini wrote:
> Corinna Vinschen wrote:
> > Sure.  I was specifically asking for a testcase, preferably in
> > plain C, which makes it possible to reproduce this under a debugger.
> 
> Actually, I can't reproduce that, but I guess it's a problem with the
> specific console he's using (Thomas, which one is that?): on mintty it
> works OK (I'm not really sure it outputs U+10001, but it surely shows a
> single box), and on rxvt it just shows as four ISO-8859-1 chars
> (as expected, since native rxvt doesn't support Unicode):
> 
> mintty% echo "-\xF0\x90\x80\x81-"
> -???-
> rxvt% echo "-\xF0\x90\x80\x81-"
> -???-
> 
> Also ok on `ls`:
> 
> % cat s.c
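> #include <stdio.h>
> int main(void) {
>     /* File name ends in the UTF-8 encoding of U+10001 (non-BMP). */
>     FILE *f = fopen("a-\xF0\x90\x80\x81", "w");
>     if (f) fclose(f);
>     return 0;
> }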
> % ./s
> % ls -l|fgrep a-
> -rw-r--r-- 1 lapo None     0 22 Sep 06:50 a-???

Uh, I see.  That occurs in the normal Windows console, so this is not
Cygwin's fault.  Cygwin's console code converts the multibyte string to
its WCHAR (UTF-16) representation and prints it to the console using the
WriteConsoleW function.  A non-BMP character becomes a surrogate pair in
that representation, and WriteConsoleW prints two blocks/question marks
for a surrogate pair.  If you look at the file in a cmd shell, it also
prints two blocks/question marks for the surrogate pair.
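
To illustrate the conversion step, here is a minimal sketch in plain C of
how the four UTF-8 bytes from the testcase turn into two UTF-16 units.
This is not Cygwin's actual console code; the helper names utf8_decode4
and utf16_encode are made up for this example, and validation is kept to
the bare minimum.

```c
#include <stdint.h>

/* Hypothetical helper: decode one 4-byte UTF-8 sequence into a code point.
   Returns 0 if the lead or continuation bytes are malformed. */
static uint32_t utf8_decode4(const unsigned char *s)
{
    if ((s[0] & 0xF8) != 0xF0 || (s[1] & 0xC0) != 0x80 ||
        (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
        return 0;
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6)  |
            (uint32_t)(s[3] & 0x3F);
}

/* Hypothetical helper: encode a code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if invalid. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                /* lone surrogates are not characters */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;                    /* beyond the Unicode range */
    cp -= 0x10000;                   /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Feeding the bytes F0 90 80 81 through utf8_decode4 gives U+10001, and
utf16_encode then writes the two units D800 DC01.  A console that renders
each 16-bit unit separately will therefore show two placeholder glyphs
for what is a single character.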


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
