Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <032401c2d3d2$a0ce1b10$78d96f83@pomello>
From: "Max Bowsher"
References: <5 DOT 2 DOT 0 DOT 9 DOT 2 DOT 20030213182750 DOT 01e97e98 AT pop3 DOT cris DOT com>
Subject: Re: Wget ignores robot.txt entry
Date: Fri, 14 Feb 2003 02:41:51 -0000

Randall R Schulz wrote:
> Lowell,
>
> What's in your "~/.wgetrc" file? If it contains this:
>
> robots = off
>
> Then wget will not respect a "robots.txt" file on the host from which
> it is retrieving files.
>
> Before I learned of this option (accessible _only_ via this directive
> in the .wgetrc file)

Or on the command line: -erobots=off :-)

Whilst this does control whether wget downloads robots.txt, a quick test
confirms that even when it does get robots.txt, it still wanders into
cgi-bin.

I'd suggest taking this to the wget list, except that wget is currently
maintainer-less and, it appears, bitrotted.

Max.

--
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple
Bug reporting: http://cygwin.com/bugs.html
Documentation: http://cygwin.com/docs.html
FAQ: http://cygwin.com/faq/
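For comparison, here is a minimal sketch of the check a robots.txt-respecting crawler is expected to make before fetching each URL. It uses Python's standard `urllib.robotparser`, not wget's own code. The robots.txt contents, the example.com host, and the `allowed_urls` helper are all hypothetical. With `Disallow: /cgi-bin/` in effect, nothing under cgi-bin should be fetched, which is the behaviour the quick test found wget not honouring.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, assumed for illustration only;
# the host from the original report is not known.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed_urls(urls, robots_txt=ROBOTS_TXT, agent="Wget"):
    """Return only the URLs a robots.txt-respecting crawler may fetch.

    allowed_urls is a hypothetical helper, not part of wget.
    """
    parser = RobotFileParser()
    # Parse the rules from text instead of downloading them over HTTP.
    parser.parse(robots_txt.splitlines())
    # Keep a URL only if the rules permit this user agent to fetch it.
    return [u for u in urls if parser.can_fetch(agent, u)]


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/cgi-bin/search?q=x",
    ]
    print(allowed_urls(candidates))  # ['http://example.com/index.html']
```

A crawler that skipped this filtering step, or applied it only to the first page, would still "wander into cgi-bin" even after downloading robots.txt.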