Date: Thu, 13 Feb 2003 18:33:35 -0800
To: cygwin AT cygwin DOT com
From: Randall R Schulz <rrschulz AT cris DOT com>
Subject: Re: Wget ignores robot.txt entry
Message-Id: <5.2.0.9.2.20030213182750.01e97e98@pop3.cris.com>
In-Reply-To: <3E4C511E.9060800@serv.net>

Lowell,

What's in your "~/.wgetrc" file? If it contains this:

robots = off

Then wget will not respect a "robots.txt" file on the host from which it is retrieving files.

Before I learned of this option (accessible _only_ via this directive in the .wgetrc file), I did something too clever by half to get robots.txt ignored, so I know that wget does respect it.

Randall Schulz

At 18:14 2003-02-13, L Anderson wrote:
>Using the latest of things Cygwin, I downloaded some stuff with wget
>from <http://cygwin.com> to peruse off-line and noticed a problem I
>can't explain:
>
>The <http://cygwin.com/robots.txt> file has the entries:
>
>User-agent: *
>Disallow: /snapshots/
>Disallow: /cgi-bin/
>Disallow: /cgi2-bin/
>
>so wget should not download /cgi-bin/.
>
>However, "wget -o cygwincom.log -m -p --no-parent -X /cygwin,/ml
>http://cygwin.com/" downloads /cgi-bin anyway.
>
>NB. "wget -o cygwincom.log -m -p --no-parent -X /cgi-bin,/cygwin,/ml
>http://cygwin.com/" doesn't download /cgi-bin.
>
>I ran a validity check on <http://cygwin.com/robots.txt> and found no errors.
>
>Is this a bug in wget or am I doing something wrong?
>
>Thanks,
>
>Lowell Anderson
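
For anyone who wants to double-check what the quoted robots.txt entries should exclude, independent of wget, here is a minimal sketch using Python's standard urllib.robotparser. The rules are copied from the message above; the helper name is_allowed, the "Wget" user-agent string, and the sample paths are only illustrative assumptions, and wget's own matching may differ in details.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the original message (http://cygwin.com/robots.txt, Feb 2003).
ROBOTS_TXT = """\
User-agent: *
Disallow: /snapshots/
Disallow: /cgi-bin/
Disallow: /cgi2-bin/
"""


def is_allowed(path, user_agent="Wget"):
    """Return True if the quoted rules permit fetching `path`.

    `is_allowed` and the default user-agent string are hypothetical
    names chosen for this sketch, not part of wget.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are made up for illustration.
    for path in ("/cgi-bin/example", "/snapshots/example", "/faq/"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

A crawler that honors these rules should skip anything under /cgi-bin/, /cgi2-bin/ and /snapshots/, which is the behavior Lowell expected from wget when robots is left at its default.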