Mail Archives: cygwin/2003/02/13/21:57:41

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-Id: <5.2.0.9.2.20030213185143.01da0ef0@pop3.cris.com>
X-Sender: rrschulz AT pop3 DOT cris DOT com
Date: Thu, 13 Feb 2003 18:57:39 -0800
To: cygwin AT cygwin DOT com
From: Randall R Schulz <rrschulz AT cris DOT com>
Subject: Re: Wget ignores robot.txt entry
In-Reply-To: <032401c2d3d2$a0ce1b10$78d96f83@pomello>
References: <5 DOT 2 DOT 0 DOT 9 DOT 2 DOT 20030213182750 DOT 01e97e98 AT pop3 DOT cris DOT com>
Mime-Version: 1.0

Max,

Right.

How can I have read the wget man page so many times and not have seen 
that? I guess it's 'cause I'm always looking for something specific, 
like the difference between "-o" and "-O".

The only thing I hate worse than being wrong is not knowing it (plus 
showing it).

Wget is orphaned? That's bad news, since it seems to have it all over 
cURL. (Sure. Go ahead and prove me wrong. I might as well get it over 
with... for now.)

Randall Schulz


At 18:41 2003-02-13, Max Bowsher wrote:
>Randall R Schulz wrote:
> > Lowell,
> >
> > What's in your "~/.wgetrc" file? If it contains this:
> >
> > robots = off
> >
> > Then wget will not respect a "robots.txt" file on the host from which
> > it is retrieving files.
> >
> > Before I learned of this option (accessible _only_ via this directive
> > in the .wgetrc file)
>
>Or, on the command line -erobots=off :-)
>
>Whilst this does control whether wget downloads robots.txt, a quick test
>confirms that even when it does get robots.txt, it still wanders into
>cgi-bin.
>
>I'd suggest taking this to the wget list, except wget is currently
>maintainer-less, and, it appears, bitrotted.
>
>Max.
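
For reference, here is a minimal sketch of what honoring a robots.txt rule for cgi-bin would look like, using Python's standard urllib.robotparser. This is not how wget implements it. The robots.txt content, the example.com URLs, the "Wget" user-agent string and the allowed() helper are all assumptions for illustration, since the thread does not show the actual file or site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(url, user_agent="Wget", robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one ordinary page, one under cgi-bin.
    for url in ("http://example.com/index.html",
                "http://example.com/cgi-bin/search"):
        print(url, allowed(url))
```

A crawler that respects the file would skip any URL for which allowed() returns False. Setting "robots = off" in ~/.wgetrc, or passing -erobots=off on the command line, corresponds to skipping this check altogether.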


