Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
List-Subscribe: | <mailto:cygwin-subscribe AT cygwin DOT com> |
List-Archive: | <http://sources.redhat.com/ml/cygwin/> |
List-Post: | <mailto:cygwin AT cygwin DOT com> |
List-Help: | <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs> |
Sender: | cygwin-owner AT cygwin DOT com |
Mail-Followup-To: | cygwin AT cygwin DOT com |
Delivered-To: | mailing list cygwin AT cygwin DOT com |
Message-ID: | <032401c2d3d2$a0ce1b10$78d96f83@pomello> |
From: | "Max Bowsher" <maxb AT ukf DOT net> |
To: | <cygwin AT cygwin DOT com> |
References: | <5 DOT 2 DOT 0 DOT 9 DOT 2 DOT 20030213182750 DOT 01e97e98 AT pop3 DOT cris DOT com> |
Subject: | Re: Wget ignores robot.txt entry |
Date: | Fri, 14 Feb 2003 02:41:51 -0000 |
MIME-Version: | 1.0 |
X-Priority: | 3 |
X-MSMail-Priority: | Normal |
X-MimeOLE: | Produced By Microsoft MimeOLE V6.00.2800.1106 |
Randall R Schulz wrote:
> Lowell,
>
> What's in your "~/.wgetrc" file? If it contains this:
>
> robots = off
>
> Then wget will not respect a "robots.txt" file on the host from which
> it is retrieving files.
>
> Before I learned of this option (accessible _only_ via this directive
> in the .wgetrc file)

Or, on the command line: -erobots=off :-)

Whilst this does control whether wget downloads robots.txt, a quick test
confirms that even when it does get robots.txt, it still wanders into
cgi-bin.

I'd suggest taking this to the wget list, except wget is currently
maintainer-less and, it appears, bitrotted.

Max.

--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
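
For anyone wanting to confirm what a robots.txt entry is supposed to exclude, independently of what wget actually does, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the host example.com, and the helper name is_allowed are assumptions made for illustration; the thread does not show the real file or URLs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the thread does not show the real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a placeholder host.
    for url in (
        "http://example.com/index.html",
        "http://example.com/cgi-bin/search",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Wget", url))
```

With a rule like the one assumed here, a compliant client should skip anything under /cgi-bin/, which is the behaviour the quick test above did not observe from wget.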
Copyright © 2019 by DJ Delorie | Updated Jul 2019 |