Mail Archives: cygwin/2003/02/13/21:34:01

Message-Id: <5.2.0.9.2.20030213182750.01e97e98@pop3.cris.com>
X-Sender: rrschulz AT pop3 DOT cris DOT com
Date: Thu, 13 Feb 2003 18:33:35 -0800
To: cygwin AT cygwin DOT com
From: Randall R Schulz <rrschulz AT cris DOT com>
Subject: Re: Wget ignores robot.txt entry
In-Reply-To: <3E4C511E.9060800@serv.net>
Mime-Version: 1.0

Lowell,

What's in your "~/.wgetrc" file? If it contains this:

robots = off

then wget will not respect the "robots.txt" file on the host from which
it is retrieving files.
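
A quick way to check is to scan the file for that directive. The following is only a rough sketch in Python, not anything wget itself provides: the helper name robots_setting is made up for illustration, and it assumes the simple "name = value" layout with "#" comments, with a later assignment overriding an earlier one.

```python
from typing import Optional


def robots_setting(wgetrc_text: str) -> Optional[str]:
    """Return the last value assigned to 'robots' in wgetrc-style text.

    Hypothetical helper: assumes 'name = value' lines and '#' comments,
    and that a later assignment overrides an earlier one.
    Returns None when no robots directive is present.
    """
    value = None
    for line in wgetrc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, rest = line.partition("=")
        if sep and name.strip().lower() == "robots":
            value = rest.strip().lower()
    return value


# Example use against a real file (the path is the one mentioned above):
#   from pathlib import Path
#   print(robots_setting((Path.home() / ".wgetrc").read_text()))
```

If this returns "off", the robots.txt exclusions are being skipped on purpose; None means the file does not set the option at all.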

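Before I learned of this option (it has no dedicated command-line switch;
set it with this directive in the .wgetrc file, or pass the same directive
on the command line as "-e robots=off"), I did something too clever by half
to get robots.txt ignored, so I know that wget does respect it by default.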
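
To confirm that the rules quoted below really do exclude /cgi-bin/, they can be fed to the robots.txt parser in Python's standard library. This is a sketch under stated assumptions: the rules are copied from the quoted message rather than fetched live, the agent name "Wget" is assumed (it makes no difference under "User-agent: *"), and Python's parser is a stand-in that need not match wget's own matching in every detail.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the quoted message, not fetched from the live site.
RULES = """\
User-agent: *
Disallow: /snapshots/
Disallow: /cgi-bin/
Disallow: /cgi2-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path: str, agent: str = "Wget") -> bool:
    """Report whether `agent` may fetch `path` on cygwin.com under RULES.

    The agent name "Wget" is an assumption made for illustration.
    """
    return parser.can_fetch(agent, "http://cygwin.com" + path)


for path in ("/", "/cgi-bin/", "/snapshots/"):
    print(path, "allowed" if allowed(path) else "disallowed")
```

A compliant robot should therefore skip everything under /cgi-bin/ unless it has been told to ignore robots.txt.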

Randall Schulz


At 18:14 2003-02-13, L Anderson wrote:
>Using the latest of things Cygwin, I downloaded some stuff with wget 
>from <http://cygwin.com> to peruse off-line and noticed a problem I 
>can't explain:
>
>The <http://cygwin.com/robots.txt> file has the entries:
>
>User-agent: *
>Disallow: /snapshots/
>Disallow: /cgi-bin/
>Disallow: /cgi2-bin/
>
>so wget should not download /cgi-bin/.
>
>However, "wget -o cygwincom.log -m -p --no-parent -X /cygwin,/ml 
>http://cygwin.com/" downloads /cgi-bin anyway.
>
>NB. "wget -o cygwincom.log -m -p --no-parent -X /cgi-bin,/cygwin,/ml 
>http://cygwin.com/" doesn't download /cgi-bin.
>
>I ran a validity check on <http://cygwin.com/robots.txt> and found no errors.
>
>Is this a bug in wget or am I doing something wrong?
>
>Thanks,
>
>Lowell Anderson


