Date: Fri, 15 May 2009 11:51:03 +0200
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: [1.7] bug in printf and %ls
Message-ID: <20090515095103.GI21324@calimero.vinschen.de>

On May 15 13:30, Alexey Borzenkov wrote:
> [...]
> It appears that there's a bug in printf with %ls that
> will refuse to print the string completely if the wide string for %ls
> cannot be represented in current charset. It's interesting that
> sometimes it behaves differently. For example:
>
> $ mkpasswd -C
> NDGAMES\aborzenkov:unused:11721:10513:U-NDGAMES\aborzenkov,*sidremoved*:/home/aborzenkov:/bin/bash
> $ mkgroup -C
> NDGAMES\
>
> Notice that in the second case it somehow managed to print domain name
> and separator before failing.
>
> Another example:
>
> #include <stdio.h>
> #include <locale.h>
>
> int main(int argc, char** argv)
> {
>     setlocale(LC_ALL, "en_US.CP1252");
>     printf("'%ls'", L"\u0410\u0411\u0412");
>     return 0;
> }
>
> Prints nothing, i.e. it prints neither of the single quotes. If it
> couldn't represent those characters, I think it should either ignore
> them, or try to display them with SO-UTF-8. Making printf call fail
> like that is, imho, really unexpected.

printf must not decide by itself which charset to use for the widechar
to multibyte conversion.  If you run the same on Linux, you also get
broken output.  It only manages to print the leading quote character.
It does not print the second quote character, because the widechar to
multibyte conversion failed.  If you check the return code of printf,
you see why:

  if (printf("'%ls'xxx", L"\u0410\u0411\u0412") < 0)
    perror ("\nprintf");

prints "printf: Invalid or incomplete multibyte or wide character" on
Linux as well as on Cygwin.

I'll change mkgroup and mkpasswd to call setlocale and to fall back to
UTF-8 if the locale is "C".


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat