Mail Archives: cygwin/2003/02/13/21:42:54

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <032401c2d3d2$a0ce1b10$78d96f83@pomello>
From: "Max Bowsher" <maxb AT ukf DOT net>
To: <cygwin AT cygwin DOT com>
References: <5 DOT 2 DOT 0 DOT 9 DOT 2 DOT 20030213182750 DOT 01e97e98 AT pop3 DOT cris DOT com>
Subject: Re: Wget ignores robot.txt entry
Date: Fri, 14 Feb 2003 02:41:51 -0000
MIME-Version: 1.0
X-Priority: 3
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106

Randall R Schulz wrote:
> Lowell,
>
> What's in your "~/.wgetrc" file? If it contains this:
>
> robots = off
>
> Then wget will not respect a "robots.txt" file on the host from which
> it is retrieving files.
>
> Before I learned of this option (accessible _only_ via this directive
> in the .wgetrc file)

Or, on the command line: -e robots=off :-)
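
For reference, the two forms are equivalent (just a minimal sketch, assuming
a reasonably recent GNU wget; the URL is only a placeholder):

    # ~/.wgetrc
    robots = off

    # or per invocation, on the command line
    wget -e robots=off -r http://example.com/

The -e / --execute switch feeds wget a .wgetrc-style command, so anything
valid in the startup file can be given there as well.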

Whilst this does control whether wget downloads robots.txt, a quick test
confirms that even when it does get robots.txt, it still wanders into
cgi-bin.
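
As a stopgap you can keep it out of cgi-bin explicitly (a sketch, assuming
the standard recursive-retrieval options are available in your build):

    wget -r -X /cgi-bin http://example.com/

-X / --exclude-directories takes a comma-separated list of directories to
skip during recursive retrieval, independently of robots.txt.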

I'd suggest taking this to the wget list, except wget is currently
maintainer-less and, it appears, bitrotted.

Max.


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Bug reporting:         http://cygwin.com/bugs.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
