| Mailing-List: | contact cygwin-help AT cygwin DOT com; run by ezmlm |
| List-Subscribe: | <mailto:cygwin-subscribe AT cygwin DOT com> |
| List-Archive: | <http://sources.redhat.com/ml/cygwin/> |
| List-Post: | <mailto:cygwin AT cygwin DOT com> |
| List-Help: | <mailto:cygwin-help AT cygwin DOT com>, <http://sources.redhat.com/ml/#faqs> |
| Sender: | cygwin-owner AT cygwin DOT com |
| Mail-Followup-To: | cygwin AT cygwin DOT com |
| Delivered-To: | mailing list cygwin AT cygwin DOT com |
| Message-ID: | <3E4C511E.9060800@serv.net> |
| Date: | Thu, 13 Feb 2003 18:14:54 -0800 |
| From: | L Anderson <lowella AT serv DOT net> |
| Organization: | TBD |
| User-Agent: | Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.2) Gecko/20021120 Netscape/7.01 |
| X-Accept-Language: | en,ru |
| MIME-Version: | 1.0 |
| To: | cygwinList <cygwin AT cygwin DOT com> |
| Subject: | Wget ignores robot.txt entry |
Using the latest Cygwin packages, I downloaded some material from <http://cygwin.com> with wget to peruse off-line and noticed a problem I can't explain.

The <http://cygwin.com/robots.txt> file has the entries:

    User-agent: *
    Disallow: /snapshots/
    Disallow: /cgi-bin/
    Disallow: /cgi2-bin/

so wget should not download /cgi-bin/. However,

    wget -o cygwincom.log -m -p --no-parent -X /cygwin,/ml http://cygwin.com/

downloads /cgi-bin anyway.

NB: with /cgi-bin added to the exclude list,

    wget -o cygwincom.log -m -p --no-parent -X /cgi-bin,/cygwin,/ml http://cygwin.com/

doesn't download /cgi-bin.

I ran a validity check on <http://cygwin.com/robots.txt> and found no errors. Is this a bug in wget or am I doing something wrong?

Thanks,
Lowell Anderson
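For anyone wanting to double-check how the rules quoted above should be read, here is a minimal sketch using Python's standard `urllib.robotparser`. It only evaluates the four robots.txt lines from this message. It does not reproduce wget's own parser, so it cannot say whether wget has a bug. The helper name `is_allowed`, the user-agent string "Wget", and the path `/cgi-bin/example` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules exactly as quoted in the message above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /snapshots/
Disallow: /cgi-bin/
Disallow: /cgi2-bin/
"""


def is_allowed(path, user_agent="Wget"):
    """Return True if the quoted rules permit fetching `path`.

    `is_allowed` and the "Wget" user-agent string are hypothetical
    names for this sketch; the rules apply to every agent ("*").
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, "http://cygwin.com" + path)


if __name__ == "__main__":
    # "/cgi-bin/example" is a made-up path used only to exercise the rule.
    for path in ("/cgi-bin/example", "/snapshots/", "/faq/"):
        verdict = "allowed" if is_allowed(path) else "disallowed"
        print(path, verdict)
```

Under this reading, anything below /cgi-bin/ is off limits to a crawler that honors robots.txt, which matches the expectation stated in the message.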