Date: Tue, 22 Sep 2009 11:57:39 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: non-BMP character width
Message-ID: <20090922095739.GS20981@calimero.vinschen.de>
In-Reply-To: <4AB8592F.9060803@lapo.it>

On Sep 22 06:57, Lapo Luchini wrote:
> Corinna Vinschen wrote:
> > Sure.  I was specifically asking for a testcase, preferably in
> > plain C, which allows reproducing this under a debugger.
>
> Actually, I can't reproduce that, but I guess it's a problem of the
> specific console he's using (Thomas, which one is that?): on mintty it
> works ok (I'm not really sure it outputs U+10001, but it surely shows a
> single box) and on rxvt it just shows as four ISO-8859-1 chars
> (as expected, as native rxvt doesn't support Unicode):
>
> mintty% echo "-\xF0\x90\x80\x81-"
> -???-
> rxvt% echo "-\xF0\x90\x80\x81-"
> -???-
>
> Also ok on `ls`:
>
> % cat s.c
> #include <stdio.h>
> int main() {
>     fopen("a-\xF0\x90\x80\x81", "w");
>     return 0;
> }
> % ./s
> % ls -l|fgrep a-
> -rw-r--r-- 1 lapo None 0 22 Sep 06:50 a-???

Uh, I see.  That occurs in the normal Windows console.  This is not
Cygwin's fault.
Cygwin's console code converts the multibyte string to its WCHAR (UTF-16)
representation and prints it to the console using the WriteConsoleW
function.  That function prints two blocks/question marks for a
surrogate pair.  Look at the file in a cmd shell: it will also print two
blocks/question marks for the surrogate pair.


Corinna

--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
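
For reference, a minimal sketch of why the console ends up with two
units for this character: the four UTF-8 bytes from the test case decode
to one code point above U+FFFF, which UTF-16 can only represent as a
surrogate pair.  The helper names decode_utf8_4 and to_surrogates are
hypothetical and are not taken from Cygwin's sources; the decoder
assumes a well-formed 4-byte sequence and does no validation.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 4-byte UTF-8 sequence (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx).
   Hypothetical helper for illustration: the input is assumed to be
   well-formed, continuation bytes are not validated. */
static uint32_t decode_utf8_4(const unsigned char *s)
{
    return ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6)  |
            (uint32_t)(s[3] & 0x3F);
}

/* Split a code point outside the BMP into a UTF-16 surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                            /* 20 bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low 10 bits */
}
```

Feeding the bytes F0 90 80 81 through decode_utf8_4 gives U+10001, and
to_surrogates turns that into the two WCHARs 0xD800 0xDC01.  A console
that renders each WCHAR on its own would then show two placeholder
glyphs, which matches the behaviour described above.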