Date: Thu, 24 Dec 2015 20:24:48 +0100
From: Corinna Vinschen
To: cygwin AT cygwin DOT com
Subject: Re: stat() lstat() not able to read long filename with cyrillic chars?
On Dec 23 20:44, Denis Corbin wrote:
> Hi,
>
> First, I have read the FAQ and this mailing list archive :)
>
> Here is the problem I am running into:
>
> Using Windows 8's Explorer, I placed three files in a directory:
> - a short Cyrillic filename "абваб.txt"
> - a long Cyrillic filename
> "абвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабвабваб.txt"
> - a long Latin filename
> "abababababababababababababababababababababababababababababababababababababababababababababababababababababababababababababa.txt"
>
>
> From a C program compiled under Cygwin, I can obtain the corresponding
> filename strings using readdir_r()...
>
> "\320\260\320\261\320\262\320\260\320\261.txt"
> "\320\260\320\261\320\262\320\260\320\261\320\262\320\260\320\261 [snipped]"
> "abababababaababababa [snipped]"
>
> ... but passing these strings in turn to lstat() or stat() returns 0 as
> expected for all except the long Cyrillic filename.

NAME_MAX is 255.  On Windows, unfortunately, this is the number of UTF-16
chars.  On POSIX systems (and so on Cygwin) it's the number of bytes.
A Cyrillic character is a single UTF-16 char, but it takes two bytes in
UTF-8, so NAME_MAX in UTF-8 Cyrillic translates into a maximum of 127
UTF-16 chars (255 / 2, rounded down).

If you need access to UTF-16 filenames with more characters, you can
temporarily switch to a single-byte charset, e.g.

  $ LC_ALL=ru_RU your_app

to switch to ISO-8859-5, or

  $ LC_ALL=ru_RU.CP1251 your_app

to switch to Windows codepage 1251.  See
https://cygwin.com/cygwin-ug-net/setup-locale.html

HTH,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat