Mail Archives: cygwin/2003/02/13/21:15:40

Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sources.redhat.com/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Message-ID: <3E4C511E.9060800@serv.net>
Date: Thu, 13 Feb 2003 18:14:54 -0800
From: L Anderson <lowella AT serv DOT net>
Organization: TBD
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01
X-Accept-Language: en,ru
MIME-Version: 1.0
To: cygwinList <cygwin AT cygwin DOT com>
Subject: Wget ignores robot.txt entry

Using the latest Cygwin packages, I downloaded part of 
<http://cygwin.com> with wget to peruse off-line and noticed a problem 
I can't explain:

The <http://cygwin.com/robots.txt> file has the entries:

User-agent: *
Disallow: /snapshots/
Disallow: /cgi-bin/
Disallow: /cgi2-bin/

so wget should not download /cgi-bin/.
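
As a cross-check of how those entries read, here is a minimal sketch, 
assuming Python's standard urllib.robotparser as the parser. This is 
not wget's own robots.txt code; the is_allowed helper, the "Wget" agent 
string, and the sample paths other than /cgi-bin/ are made up for 
illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt entries quoted above, held in a string so that
# no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /snapshots/
Disallow: /cgi-bin/
Disallow: /cgi2-bin/
"""


def is_allowed(path, agent="Wget"):
    """Return True if `agent` may fetch `path` under the rules above.

    `agent` is an illustrative value; the rules use "User-agent: *",
    so any agent name gives the same answer.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://cygwin.com" + path)


if __name__ == "__main__":
    # Hypothetical sample paths, chosen only to exercise the rules.
    for path in ("/", "/faq/", "/cgi-bin/", "/cgi-bin/example"):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Under this parser, anything beneath /cgi-bin/ is reported as 
disallowed, which matches my reading of the file.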

However, "wget -o cygwincom.log -m -p --no-parent -X /cygwin,/ml 
http://cygwin.com/" downloads /cgi-bin anyway.

NB. "wget -o cygwincom.log -m -p --no-parent -X /cgi-bin,/cygwin,/ml 
http://cygwin.com/" doesn't download /cgi-bin.

I ran a validity check on <http://cygwin.com/robots.txt> and found no 
errors.

Is this a bug in wget or am I doing something wrong?

Thanks,

Lowell Anderson



