Message-ID: <1414935174.63302.YahooMailNeo@web122102.mail.ne1.yahoo.com>
Date: Sun, 2 Nov 2014 05:32:54 -0800
From: Brent
Subject: Re: bug/deficiency in zip:
 non-ascii chars in file names work, but fail in directory names
To: "cygwin AT cygwin DOT com"
In-Reply-To: <1414900169.77776.YahooMailNeo@web122102.mail.ne1.yahoo.com>

Doug Henderson wrote: "You need to add the -r option to recurse into directories:"

You are 100% correct; my oversight. It was a copy-and-paste error: the real code that I want to test does use -r, but when I simplified that code for my email, I accidentally dropped the -r.

The code that I really want to test fails with a different error, so you solved a mystery that was really bugging me: why the console code in my email behaved differently from the test code I actually care about.

I went back and analysed my real test code more carefully, and I still see a problem with Cygwin's unzip: it fails to extract zip files with Unicode names that are produced by OTHER programs (i.e. any program other than Cygwin zip).

In particular, one part of my test code creates a zip archive using Java (ZipOutputStream and ZipEntry), and then confirms that the archive can be extracted and exactly reproduced by several other means.

The first extraction method is to use Java again (ZipFile and ZipEntry); this works perfectly, as it should.

The second extraction method is to use Cygwin's unzip; this fails: IT MANGLES THE NAMES.
In particular:
  1) the directory should be åØâéñ (\u00E5\u00D8\u00E2\u00E9\u00F1)
  2) the file should be 㐀丁龦豈侮_file#2_length2048.txt (first 5 chars \u3400\u4E01\u9FA6\uF900\uFA30)

but what Cygwin unzip actually produces during extraction is:
  1) the directory is +++++
  2) the file is ڥǴ_file#2_length2048.txt

To rule out Java as being non-standard, I manually took the zip archive it produced and extracted it using the latest 7-zip (9.20), which worked perfectly (the directory and file names came out exact).

To verify further, I also temporarily installed the latest WinZip (19.0 build 11293), and once again it extracted Java's zip file with non-ASCII names perfectly.

If anyone wants to verify these claims, I am attaching the zip file produced by Java (extractable by 7-zip and WinZip, but NOT by Cygwin unzip) to this email. [UPDATE: my original email yesterday had this attachment, but I do not see it showing up on the mailing list. I take it that the Cygwin mailing lists auto-reject emails with attachments?]

So, I reckon that Cygwin unzip is the odd man out.

Oh, and when I try to view this zip file using Windows 7's integrated zip viewer in Windows Explorer, it displays mangled directory and file names that are different again from what Cygwin unzip produced. This link
  https://www.jam-software.com/treesize/online_manual/EN/unicode_zip_files.html
claims that Windows 7 does not really support Unicode names, so this is perhaps expected.

Also, I found that this inter-program incompatibility is limited to Cygwin unzip: Cygwin zip seems to produce archives with Unicode names that other programs can extract just fine.
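For anyone who wants to build a comparable archive without the attachment, the Java half of the test can be approximated as below. This is only a sketch, not the actual test code: the class and method names are made up for illustration, and it assumes that the file sits inside the directory and that both writer and reader use UTF-8 for entry names. It writes the two names quoted above with ZipOutputStream and reads them back with ZipFile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class; the names here are illustrative only.
final class ZipUnicodeRoundTrip {
    // Entry names from the report, written as escapes so the source
    // file's own encoding does not matter.
    static final String DIR = "\u00E5\u00D8\u00E2\u00E9\u00F1/";
    // Assumption: the file lives inside the directory above.
    static final String FILE =
            DIR + "\u3400\u4E01\u9FA6\uF900\uFA30_file#2_length2048.txt";

    static List<String> expectedNames() {
        return Arrays.asList(DIR, FILE);
    }

    // Write one directory entry and one 2048-byte file entry,
    // encoding the entry names as UTF-8.
    static void writeZip(Path zip) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(
                Files.newOutputStream(zip), StandardCharsets.UTF_8)) {
            out.putNextEntry(new ZipEntry(DIR));
            out.closeEntry();
            out.putNextEntry(new ZipEntry(FILE));
            out.write(new byte[2048]);
            out.closeEntry();
        }
    }

    // Read the entry names back, decoding them as UTF-8.
    static List<String> readNames(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile(), StandardCharsets.UTF_8)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Write to a temporary archive, read it back, then clean up.
    static List<String> roundTrip() {
        try {
            Path zip = Files.createTempFile("unicode-names", ".zip");
            try {
                writeZip(zip);
                return readNames(zip);
            } finally {
                Files.deleteIfExists(zip);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        List<String> names = roundTrip();
        System.out.println(names.containsAll(expectedNames())
                ? "names round-trip intact"
                : "names differ: " + names);
    }
}
```

To compare against other extractors, call writeZip with a path you intend to keep instead of the temporary file, then run unzip, 7-zip or WinZip on the result and compare the extracted names with the two listed above.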
I did some web research, and the most relevant link that I could find about Cygwin unzip and Unicode is this old announcement from 2009:
  https://cygwin.com/ml/cygwin-announce/2009-08/msg00006.html

That announcement contains this ominous text:

    Currently, on Windows the UTF-8 handling is limited to the
    character subset contained in the configured non-unicode
    "system code page".

Is it possible that the deficiency mentioned above has simply not been fixed in the last 5 years?

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple