From: "Max Bowsher"
Subject: Re: Wget ignores robot.txt entry
Date: Fri, 14 Feb 2003 02:41:51 -0000

Randall R Schulz wrote:
> Lowell,
>
> What's in your "~/.wgetrc" file? If it contains this:
>
>     robots = off
>
> Then wget will not respect a "robots.txt" file on the host from which
> it is retrieving files.
>
> Before I learned of this option (accessible _only_ via this directive
> in the .wgetrc file)

Or, on the command line: -erobots=off :-)

Whilst this does control whether wget downloads robots.txt, a quick test
confirms that even when it does get robots.txt, it still wanders into
cgi-bin.

I'd suggest taking this to the wget list, except wget is currently
maintainer-less and, it appears, bitrotted.

Max.
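
For comparison, here is a minimal sketch of what honouring a robots.txt exclusion for cgi-bin would look like, using Python's standard urllib.robotparser. The robots.txt content, the `allowed` helper, and the sample paths are all assumptions for illustration: the actual file from this thread is not shown, and this says nothing about how wget implements its own check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file discussed in the thread is not shown.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Illustrative paths only: one outside and one inside the excluded tree.
    for path in ("/index.html", "/cgi-bin/search"):
        print(path, allowed(ROBOTS_TXT, "Wget", path))
```

A crawler that respects the file would skip any path for which `allowed` returns False; the behaviour reported above is the opposite, with cgi-bin still being visited after robots.txt was fetched.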