From: Bengt Larsson
To: cygwin AT cygwin DOT com
Subject: Re: grep treating my text files as binary!
Date: Sat, 27 Dec 2014 11:07:27 +0100

Warren Young wrote:
>On Dec 25, 2014, at 11:41 AM, Thomas Wolff wrote:
>
>> In any case the argument is quite artificial since the new behaviour
>> hits many files that are in fact text files.
>
>Please define the term “text file” in a way that allows a C programmer
>to write a program that automatically does the correct thing for all
>members of the class “text file” without involving locales, or an
>equivalent mechanism.
...
>If grep runs into a byte sequence that makes it think it is not legal
>for your current locale, it must treat the file as raw bytes, unless you
>give it -a.
>
>If you don’t like this behavior, say “alias grep=grep -a” in your
>~/.bashrc, and forget the change ever happened. It’ll be on you when
>some non-text file gets treated as text and grep spams your terminal
>with binary garbage, though.

It's better to use the "alias grep='LC_ALL=C grep'" method. It keeps the
old way of detecting binaries (for example, it still detects an .EXE as
binary) while letting you match mostly-ASCII files that contain a few
characters from a mismatched locale. The definition you ask for is
already in the code. For those of us who aren't English speakers,
treating "mostly ASCII" files as text is mostly right, at least
interactively.

I ran into this, actually. I keep a list of my directories, and it is in
CP1252 for reasons of interfacing with CMD.EXE. Suddenly grep couldn't
match it. But I figured something was up, set my locale to CP1252, and
then it worked.
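To make the two detection rules discussed here concrete, below is a simplified model in Python. It is not grep's source: the function names are hypothetical, and treating "contains a NUL byte" as the only binary signal under LC_ALL=C is an assumption that approximates the old behaviour (real grep behaviour differs between versions). The sample bytes mimic a CP1252 directory listing like the one described and the first bytes of a typical .EXE header.

```python
def looks_binary_nul(data: bytes) -> bool:
    """Old-style heuristic (assumed): any NUL byte means 'binary'."""
    return b"\x00" in data


def looks_binary_for_encoding(data: bytes, encoding: str) -> bool:
    """Locale-aware heuristic (simplified): NUL bytes, or bytes that
    do not decode in the current encoding, mean 'binary'."""
    if looks_binary_nul(data):
        return True
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical directory list saved in CP1252: 0xF6 is 'ö' in CP1252,
# but it can never start a valid UTF-8 sequence.
listing = b"C:\\Users\\Bj\xf6rn\\Documents\r\n"
# First bytes of a typical DOS/Windows .EXE ("MZ" header) contain NULs.
exe_header = b"MZ\x90\x00\x03\x00\x00\x00"

for name, data in [("listing", listing), ("exe", exe_header)]:
    print(name,
          "nul:", looks_binary_nul(data),
          "utf-8:", looks_binary_for_encoding(data, "utf-8"),
          "cp1252:", looks_binary_for_encoding(data, "cp1252"))
# listing nul: False utf-8: True cp1252: False
# exe nul: True utf-8: True cp1252: True
```

In this model the listing is flagged only by the UTF-8 check and passes once the encoding matches (CP1252), which mirrors the locale fix described in the post, while the .EXE is flagged by every rule because of its NUL bytes.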