Mail Archives: cygwin/2009/05/15/05:51:31

X-Recipient: archive-cygwin AT delorie DOT com
X-Spam-Check-By: sourceware.org
Date: Fri, 15 May 2009 11:51:03 +0200
From: Corinna Vinschen <corinna-cygwin AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: Re: [1.7] bug in printf and %ls
Message-ID: <20090515095103.GI21324@calimero.vinschen.de>
Reply-To: cygwin AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
References: <e2480c70905150230y595b5796wa79c5b34df707fbf AT mail DOT gmail DOT com>
MIME-Version: 1.0
In-Reply-To: <e2480c70905150230y595b5796wa79c5b34df707fbf@mail.gmail.com>
User-Agent: Mutt/1.5.19 (2009-02-20)
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT cygwin DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com

On May 15 13:30, Alexey Borzenkov wrote:
> [...]
>  It appears that there's a bug in printf with %ls that
> will refuse to print the string completely if the wide string for %ls
> cannot be represented in current charset. It's interesting that
> sometimes it behaves differently. For example:
> 
> $ mkpasswd -C
> NDGAMES\aborzenkov:unused:11721:10513:U-NDGAMES\aborzenkov,*sidremoved*:/home/aborzenkov:/bin/bash
> $ mkgroup -C
> NDGAMES\
> 
> Notice that in the second case it somehow managed to print domain name
> and separator before failing.
> 
> Another example:
> 
> #include <stdio.h>
> #include <locale.h>
> 
> int main(int argc, char** argv)
> {
>   setlocale(LC_ALL, "en_US.CP1252");
>   printf("'%ls'", L"\u0410\u0411\u0412");
>   return 0;
> }
> 
> Prints nothing, i.e. it prints neither of the single quotes. If it
> couldn't represent those characters, I think it should either ignore
> them, or try to display them with SO-UTF-8. Making printf call fail
> like that is, imho, really unexpected.

printf must not decide on its own which charset to use for the wide-char
to multibyte conversion.  If you run the same code on Linux, you also get
broken output.  It only manages to print the leading quote character.  It
does not print the closing quote, because the wctomb conversion
failed.  If you check the return value of printf, you see why:

  if (printf("'%ls'xxx", L"\u0410\u0411\u0412") < 0)
    perror ("\nprintf");

prints "printf: Invalid or incomplete multibyte or wide character"
on Linux as well as on Cygwin.
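
A caller that wants to avoid the failed printf could test the string before
printing it.  The following is only a minimal sketch, not what printf or the
Cygwin tools do: the helper names ws_is_representable and print_quoted are
made up for illustration, and it relies on the POSIX behaviour that wcstombs
with a NULL destination only measures the result and returns (size_t)-1 on
an unconvertible character.

```c
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: nonzero if ws can be converted to the multibyte
   charset of the current LC_CTYPE locale.  With a NULL destination,
   wcstombs stores nothing and returns (size_t)-1 if a character cannot
   be converted (POSIX behaviour, assumed here). */
static int ws_is_representable(const wchar_t *ws)
{
  return wcstombs(NULL, ws, 0) != (size_t) -1;
}

/* Hypothetical helper: print ws in single quotes, or a placeholder if
   the conversion would fail.  Returns whatever printf returns, so the
   caller can still check for a negative value. */
static int print_quoted(const wchar_t *ws)
{
  if (!ws_is_representable(ws))
    return printf("'<unrepresentable string>'\n");
  return printf("'%ls'\n", ws);
}
```

Called as print_quoted(L"\u0410\u0411\u0412") under a CP1252 locale, this
should take the placeholder branch instead of leaving the output cut off
after the first quote.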

I'll change mkgroup and mkpasswd to call setlocale and to fall back to
UTF-8 if the locale is "C".
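
Such a fallback might look roughly like the sketch below.  This is an
assumption about the general shape, not the actual mkgroup/mkpasswd change:
the function name init_locale_with_utf8_fallback is hypothetical, and the
locale names "C.UTF-8" and "en_US.UTF-8" are guesses that may not exist on
every system.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical helper: adopt the locale from the environment, and if
   that turns out to be plain "C" (or "POSIX", or invalid), try to
   switch LC_CTYPE to a UTF-8 locale.  Returns the name of the locale
   that is in effect afterwards. */
static const char *init_locale_with_utf8_fallback(void)
{
  const char *loc = setlocale(LC_CTYPE, "");

  if (loc == NULL || strcmp(loc, "C") == 0 || strcmp(loc, "POSIX") == 0)
    {
      /* Assumed locale names; either call may fail and return NULL,
         in which case the current locale is left unchanged. */
      if (setlocale(LC_CTYPE, "C.UTF-8") == NULL)
        setlocale(LC_CTYPE, "en_US.UTF-8");
    }

  /* Query the current setting rather than reusing an earlier pointer,
     which later setlocale calls are allowed to overwrite. */
  return setlocale(LC_CTYPE, NULL);
}
```

Only LC_CTYPE is touched here, since that is the category that governs the
wide-char to multibyte conversion used by %ls.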


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
