Mail Archives: cygwin/2016/02/05/10:48:13

From: Thomas Wolff <towo AT towo DOT net>
Subject: cygwin_conv_ functions and character encoding
To: cygwin AT cygwin DOT com
Message-ID: <56B4C40A.4060607@towo.net>
Date: Fri, 5 Feb 2016 16:47:22 +0100

The Cygwin path conversion functions ignore the current locale.
Instead, they seem to always use the locale taken from the environment
when the program was started; see the test program convloc.c:

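#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>
#include <sys/cygwin.h>

int main(void) {
   /* Switch to UTF-8 at run time; the environment may name another charset. */
   setlocale(LC_ALL, "C.UTF-8");
   const char * utfstring = "böh";
   printf("ustring (%s) <%s>\n", setlocale(LC_ALL, NULL), utfstring);
   /* cygwin_create_path returns a malloc'ed buffer, or NULL on error. */
   wchar_t * wstring = cygwin_create_path(CCP_POSIX_TO_WIN_W, utfstring);
   if (wstring == NULL) { perror("cygwin_create_path"); return 1; }
   printf("wstring (%s) <%ls>\n", setlocale(LC_ALL, NULL), wstring);
   free(wstring);
   return 0;
}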

Run in a UTF-8 terminal:
 > LC_CTYPE=de_DE ./convloc
ustring (C.UTF-8) <böh>
wstring (C.UTF-8) <D:\TEMP\böh>

In sys_wcstombs in strfuncs.cc I see:
   const char *charset = cygheap->locale.charset;
which is set in internal_setlocale ()...

In fact, the situation can be fixed by adding this call after setlocale():
   cygwin_internal(CW_INT_SETLOCALE);  // -> internal_setlocale();
(cf. https://sourceware.org/ml/cygwin-developers/2010-02/msg00054.html)
However, I think those functions should use the proper locale implicitly.
According to the generic description in
http://linux.die.net/man/3/setlocale,
LC_CTYPE affects ... conversion ... functions. In my opinion, this
includes Cygwin-specific conversion functions as well as implicitly
called conversions (see open() below).

The same problem applies to the open() function, which involves path
conversion.
The wide string function mbstowcs behaves as expected.
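
A minimal sketch of the workaround described above, for illustration only. The helper names set_locale_and_sync, decode_current_locale and posix_to_win_w are hypothetical; only setlocale, mbstowcs, cygwin_internal(CW_INT_SETLOCALE) and cygwin_create_path(CCP_POSIX_TO_WIN_W, ...) come from the message. The Cygwin-specific parts are guarded by __CYGWIN__ so that the portable part (mbstowcs following setlocale) can be tried elsewhere too.

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef __CYGWIN__
#include <sys/cygwin.h>
#endif

/* Hypothetical helper: switch the C library locale and, on Cygwin, ask the
   DLL to pick it up as well (the cygwin_internal call quoted above).
   Returns 1 on success, 0 if the requested locale is not available. */
int set_locale_and_sync(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return 0;
#ifdef __CYGWIN__
    cygwin_internal(CW_INT_SETLOCALE);
#endif
    return 1;
}

/* Hypothetical helper: decode a multibyte string under the current LC_CTYPE
   using mbstowcs. Returns a malloc'ed wide string, or NULL on failure. */
wchar_t *decode_current_locale(const char *mbs)
{
    size_t n = mbstowcs(NULL, mbs, 0);   /* count wide characters first */
    if (n == (size_t)-1)
        return NULL;                     /* invalid multibyte sequence */
    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;
    mbstowcs(w, mbs, n + 1);
    return w;
}

#ifdef __CYGWIN__
/* Hypothetical helper: path conversion as in convloc.c. Call it only after
   set_locale_and_sync() so that the DLL uses the intended charset.
   The caller frees the result. */
wchar_t *posix_to_win_w(const char *posix_path)
{
    return cygwin_create_path(CCP_POSIX_TO_WIN_W, posix_path);
}
#endif
```

Usage would be along the lines of set_locale_and_sync("C.UTF-8") followed by posix_to_win_w("böh"). Whether the explicit sync step should be needed at all is the open question here.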


I ran into this issue while trying to work around a missing conversion
feature: converting only the pathname syntax between Unicode (wchar_t)
strings. The desired options would be something like:
   CCP_POSIX_W_TO_WIN_W,   /* from is wchar_t *posix, to is wchar_t *win32  */
   CCP_WIN_W_TO_POSIX_W,   /* from is wchar_t *win32, to is wchar_t *posix  */
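
Until such options exist, the detour I have in mind can be sketched roughly as follows. This is an assumption-laden illustration, not an existing API: encode_current_locale and posix_w_to_win_w are hypothetical names, and the result is only correct if the charset used inside the Cygwin DLL matches the one the C library uses for wcstombs, which is exactly the problem described in this message.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef __CYGWIN__
#include <sys/cygwin.h>
#endif

/* Hypothetical helper: encode a wide string in the charset of the current
   LC_CTYPE. Returns a malloc'ed multibyte string, or NULL on failure. */
char *encode_current_locale(const wchar_t *ws)
{
    size_t n = wcstombs(NULL, ws, 0);    /* count bytes first */
    if (n == (size_t)-1)
        return NULL;                     /* character not representable */
    char *s = malloc(n + 1);
    if (s == NULL)
        return NULL;
    wcstombs(s, ws, n + 1);
    return s;
}

#ifdef __CYGWIN__
/* Hypothetical stand-in for the proposed CCP_POSIX_W_TO_WIN_W: go through a
   multibyte POSIX path and the existing CCP_POSIX_TO_WIN_W conversion.
   The caller frees the result. */
wchar_t *posix_w_to_win_w(const wchar_t *posix_w)
{
    char *mb = encode_current_locale(posix_w);
    if (mb == NULL)
        return NULL;
    wchar_t *win = cygwin_create_path(CCP_POSIX_TO_WIN_W, mb);
    free(mb);
    return win;
}
#endif
```

The round trip through a multibyte string is lossy for characters the current charset cannot represent, which is one reason a direct wchar_t to wchar_t option would be preferable.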

------
Thomas
