Subject: Re: Grepping Unicode files?
From: Vince Rice
Date: Thu, 14 May 2015 12:14:20 -0500
To: cygwin AT cygwin DOT com

> On May 14, 2015, at 11:43 AM, Eric Blake wrote:
>
> On 05/14/2015 10:32 AM, Vince Rice wrote:
>
> …
>>
>> Now, pardon my continued ignorance, but which of those variables needs to be set to UTF16 in order for grep to work? And I assume it (they?) should be set to en_US.UTF-16?
>
> None. UTF16 is not a valid locale. It is a valid encoding (wide
> character), but locales must operate on multi-byte sequences, not wide
> characters. So you HAVE to convert from wide character to multi-byte
> before you can do anything that requires a locale to work correctly.

Oh my, the rabbit-hole gets deeper. I don’t know the difference between wide character and multi-byte. A little searching appears to indicate that Unicode is a type of wide-character, while multi-byte is … well, I still don’t know what multi-byte is. :) But, we’re definitely out in the weeds of non-cygwinness here, and my file is UTF16, so I can learn what multi-byte is and the difference later. Bottom-line…

>>
>> Thanks to everyone for your help. I think you’ve all confirmed this isn’t cygwin-specific, but I couldn’t find anything even searching generically (“grep unicode” and now “grep utf16”).
>> I did finally find an external reference to iconv, but if grep is supposed to handle this natively, I haven’t been able to find much on how to do it.
>
> grep cannot handle UTF16 natively. iconv exists to do encoding
> transformations, so that the rest of the system can live in multi-byte
> world instead of worrying about wide-character encodings.

… grep can’t handle UTF-16 files. Good to know. iconv it is.

Thanks again!
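
For reference, the convert-then-grep approach settled on above can be sketched roughly as follows. This is only a minimal illustration and was not posted in the thread: the file name sample-utf16.txt and the search word "needle" are made up for the example, and it assumes an iconv that accepts the encoding names UTF-16 and UTF-8.

```shell
# Build a small UTF-16 sample file (hypothetical name) to search.
printf 'first line\nneedle here\nlast line\n' \
  | iconv -f UTF-8 -t UTF-16 > sample-utf16.txt

# Convert the UTF-16 file to UTF-8 on the fly, then let grep search
# the resulting multi-byte text.
matches=$(iconv -f UTF-16 -t UTF-8 sample-utf16.txt | grep 'needle')

# prints: needle here
printf '%s\n' "$matches"
```

For a real file, only the iconv-to-grep pipeline is needed, with the actual file name in place of the sample one.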