Subject: | Re: Grepping Unicode files? |
From: | Vince Rice <vrice AT solidrocksystems DOT com> |
In-Reply-To: | <746170827.20150514185648@yandex.ru> |
Date: | Thu, 14 May 2015 11:32:26 -0500 |
Message-Id: | <313678DD-A000-4F82-A015-836B882C09FC@solidrocksystems.com> |
References: | <3C280897-291A-4A8C-8C3F-46D1D9BEFCFE AT solidrocksystems DOT com> <746170827 DOT 20150514185648 AT yandex DOT ru> |
To: | cygwin AT cygwin DOT com |
On May 14, 2015, at 10:56 AM, Andrey Repin <anrdaemon AT yandex DOT ru> wrote:
>
> Greetings, Vince Rice!
>
>> uname says "CYGWIN_NT-6.1 machinename 1.7.35(0.287/5/3) 2015-03-04 12:07 i686 Cygwin”.
>> I’m running grep 2.21.2, which cygcheck -c says is OK.
>
>> Does Cygwin’s grep support Unicode files? The output from a SQL Server SQL
>> Agent job is a Unicode file, i.e. if you look at it in a hex editor every
>> other character is 00 because each character is taking up two bytes. The
>> filename itself is fine, it’s the contents that is Unicode. I can’t get grep
>> to work on it, either with or without -a.
>
>> This may not be a Cygwin-specific question, but I haven’t been able to find
>> anything after several Google searches, including the archives, and neither
>> --help nor the man page for grep references Unicode.
>
>> By default I have neither LC_ALL nor LC_COLLATE set.
>
>> A pointer to a better search or a website that explains this would be
>> great, or if it can’t currently be done, that’s OK, too.
>
> grep only treat files as text if they are matching current locale.
> Check `locale` output to see your current settings.

First, to the other responder(s), running it through iconv with a from of UTF16 and a to of UTF8 did work. Thanks for the pointer. (I’ve never had to deal with anything but ANSI files, so I didn’t know about iconv. And I guessed on the UTF8, given what I found below.)

locale run from a cmd.exe session says that everything is “C.UTF-8”, while locale run from mintty says that everything is en_US.UTF-8. A “which” in both cases shows that the locale being run is cygwin’s, so I assume mintty does something slightly differently than the normal console? I don’t even know if there’s a difference. (Have I mentioned I don’t know anything about all of this?)
From cmd.exe:
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

From mintty:
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

Now, pardon my continued ignorance, but which of those variables needs to be set to UTF16 in order for grep to work? And I assume it (they?) should be set to en_US.UTF-16?

Thanks to everyone for your help. I think you’ve all confirmed this isn’t cygwin-specific, but I couldn’t find anything even searching generically (“grep unicode” and now “grep utf16”). I did finally find an external reference to iconv, but if grep is supposed to handle this natively, I haven’t been able to find much on how to do it.
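
For anyone finding this thread later, the iconv route described above can be sketched roughly as follows. This is only a minimal illustration, not a tested recipe for SQL Agent output: grep_utf16 is a hypothetical helper name (it is not part of grep or Cygwin), and the sample file is made up so the snippet is self-contained. It assumes the file starts with a byte-order mark, which lets iconv pick the byte order when the source encoding is given as plain UTF-16.

```shell
#!/bin/sh
# Sketch: search a UTF-16 text file by converting it to UTF-8 on the fly.
# grep_utf16 is a hypothetical helper, not a real grep or Cygwin command.
grep_utf16() {
    pattern=$1
    file=$2
    # With -f UTF-16, iconv uses the byte-order mark (if present) to choose
    # little- or big-endian; the output is UTF-8, which grep can read as text.
    iconv -f UTF-16 -t UTF-8 "$file" | grep -- "$pattern"
}

# Build a tiny UTF-16LE sample (BOM, then the lines "job ok" and "error").
# Octal escapes are used because not every printf understands \x escapes.
sample=$(mktemp)
printf '\377\376j\000o\000b\000 \000o\000k\000\n\000e\000r\000r\000o\000r\000\n\000' > "$sample"

# Prints the one matching line from the sample.
grep_utf16 'error' "$sample"
rm -f "$sample"
```

If the file has Windows line endings, each converted line will still end in a carriage return, so patterns anchored with $ may need adjusting. As far as I can tell, there is no en_US.UTF-16 locale to point the LC_* variables at, which would explain why converting first is the usual workaround.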