Mail Archives: cygwin/2009/10/13/15:54:53
X-Recipient: archive-cygwin AT delorie DOT com
X-SWARE-Spam-Status: No, hits=-2.5 required=5.0 tests=AWL,BAYES_00,SPF_PASS
X-Spam-Check-By: sourceware.org
Message-ID: <4AD4DE7C.7030606@gmail.com>
Date: Tue, 13 Oct 2009 21:09:32 +0100
From: Dave Korn <dave DOT korn DOT cygwin AT googlemail DOT com>
User-Agent: Thunderbird 2.0.0.17 (Windows/20080914)
MIME-Version: 1.0
Followup-To: The,Off-Topic,And,Nonymous,Cygwin-Talk,Mailing,List,<cygwin-talk AT cygwin DOT com>
To: cygwin AT cygwin DOT com
Subject: [OT] Re: Want to use tor with wget.
References: <3j28d51fso528qi14rpfqcga8r9oqckji8 AT 4ax DOT com> <4AD413B2 DOT 1070903 AT gmail DOT com> <1631589547 DOT 20091013204226 AT gmail DOT com>
In-Reply-To: <1631589547.20091013204226@gmail.com>
Mailing-List: contact cygwin-help AT cygwin DOT com; run by ezmlm
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe AT cygwin DOT com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin AT cygwin DOT com>
List-Help: <mailto:cygwin-help AT cygwin DOT com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner AT cygwin DOT com
Mail-Followup-To: cygwin AT cygwin DOT com
Delivered-To: mailing list cygwin AT cygwin DOT com
Note-from-DJ: This may be spam
[ We're offtopic here since it's not a cygwin-specific issue anymore, so I've
set a follow-up to the cygwin-talk list in case you have further questions or
replies. ]
hongyi.zhao wrote:
> On Tuesday, October 13, 2009 at 13:44, dave.korn.cygwin wrote:
>> Hongyi Zhao wrote:
> I want to use wget to grab the following web page:
>
> http://www.cybersyndrome.net/pla5.html
Then, you can tell wget to use your local privoxy as an http proxy, which is
exactly how your browser relates to it.
  export http_proxy=localhost:8118
  wget http://www.cybersyndrome.net/pla5.html
should do the trick, but check the wget manual page about proxy support for
full details. (I'm assuming here you're running the usual kind of Tor setup
with a supporting co-installation of Privoxy.)
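For anyone who would rather not export the proxy for the whole shell session, here is a minimal sketch of the same idea wrapped in a function. It assumes Privoxy is listening on its default port 8118 on the local machine and forwarding to Tor. The names PROXY_URL and fetch_via_privoxy are made up for illustration, and I've spelled the proxy out as a full URL, which is the form the wget manual describes.

```shell
# Assumption: Privoxy listens on 127.0.0.1:8118 (its default) and chains to Tor.
# PROXY_URL and fetch_via_privoxy are illustrative names, not part of wget.
PROXY_URL="http://127.0.0.1:8118/"

fetch_via_privoxy() {
    # Prefixing the assignment to the command sets http_proxy for this
    # one wget run only; the rest of the session is left untouched.
    http_proxy="$PROXY_URL" wget "$@"
}

# Usage (needs a running Privoxy, so it is left commented out here):
#   fetch_via_privoxy 'http://www.cybersyndrome.net/pla5.html'
```

If the download works in the browser but not here, the first thing to check is probably whether Privoxy really is on that port.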
> OTOH, I've also learned that curl support socks4/5 proxy, and I use
> the following command under my cygwin console:
>
> curl --socks5 127.0.0.1:9050 http://www.cybersyndrome.net/pla5.html
>
> But I meet the following error:
>
> -----------------------------
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <HTML><HEAD>
> <TITLE>302 Found</TITLE>
> </HEAD><BODY>
> <H1>Found</H1>
> The document has moved <A HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
> </BODY></HTML>
> -----------------------------
That's interesting. A real 302 redirect would have an actual 302 status
code and a Location header, rather than being a 200 response that returns an
HTML document with the words "302 Found" and a URL in it.
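One way to see what the server really sends is to ask curl for the status code and response headers instead of just the body. A rough sketch, assuming curl is installed; the function names show_status and show_headers are made up for illustration:

```shell
# Illustrative helpers (hypothetical names) for inspecting an HTTP response.

show_status() {
    # -s hides the progress meter, -S still reports errors,
    # -o /dev/null throws the body away, -w prints the numeric status code.
    curl -sS -o /dev/null -w '%{http_code}\n' "$@"
}

show_headers() {
    # -D - writes the response headers to stdout; the body is discarded.
    curl -sS -D - -o /dev/null "$@"
}

# Usage (performs a real request, so it is left commented out here):
#   show_status  'http://www.cybersyndrome.net/pla5.html'
#   show_headers 'http://www.cybersyndrome.net/pla5.html'
```

A genuine redirect should show a 3xx code from show_status and a Location: line in the show_headers output.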
> Nevertheless, I can use firefox with Tor enabled to access this
> webpage.
>
> What's the reason
It's something the server is doing deliberately, perhaps a malfunctioning or
misguided anti-bot feature of some sort, based on the request headers sent by
the user agent.
> and how can I grab this webpage just by a
> command-line downloading tool?
Well, you can use wget! Or you can tell your curl to pretend it is wget!
> $ curl 'http://www.cybersyndrome.net/pla5.html'
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <HTML><HEAD>
> <TITLE>302 Found</TITLE>
> </HEAD><BODY>
> <H1>Found</H1>
> The document has moved <A HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
> </BODY></HTML>
> $ wget 'http://www.cybersyndrome.net/pla5.html'
> --2009-10-13 21:00:36-- http://www.cybersyndrome.net/pla5.html
> Resolving www.cybersyndrome.net... 210.153.118.69
> Connecting to www.cybersyndrome.net|210.153.118.69|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: unspecified [text/html]
> Saving to: `pla5.html'
>
> [ <=> ] 18,151 3.11K/s in 5.7s
>
> 2009-10-13 21:00:42 (3.11 KB/s) - `pla5.html' saved [18151]
> $ curl 'http://www.cybersyndrome.net/pla5.html' -A 'User-Agent: Wget/1.11.4'
> <html>
> <head>
> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
> <meta name="robots" content="noarchive">
> <meta name="description" content="▒▒▒p▒?\▒?OV▒▒Proxy▒▒▒▒▒??▒▒J▒▒▒A▒▒?▒▒B">
> <title>CyberSyndrome : Proxy List / Anonymous</title>
> <style type="text/css">
[ ... snip ... ]
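The same trick can be sketched as a pair of small wrappers. Note that -A takes only the value of the User-Agent header, so the bare string 'Wget/1.11.4' is enough. I'm assuming the server accepts that string because it is what wget 1.11.4 identifies itself as, and that Tor's SOCKS port is at 127.0.0.1:9050 as in your command. The names WGET_UA, curl_as_wget and curl_as_wget_via_tor are made up for illustration.

```shell
# Assumption: the server only objects to curl's default User-Agent string.
# WGET_UA and both function names are illustrative, not part of curl.
WGET_UA='Wget/1.11.4'

curl_as_wget() {
    # -A sets the value of the User-Agent request header.
    curl -A "$WGET_UA" "$@"
}

curl_as_wget_via_tor() {
    # --socks5-hostname also lets the proxy resolve the host name, so the
    # DNS lookup goes through Tor rather than the local resolver.
    # Assumption: Tor's SOCKS listener is on 127.0.0.1:9050.
    curl --socks5-hostname 127.0.0.1:9050 -A "$WGET_UA" "$@"
}

# Usage (performs real requests, so it is left commented out here):
#   curl_as_wget         'http://www.cybersyndrome.net/pla5.html'
#   curl_as_wget_via_tor 'http://www.cybersyndrome.net/pla5.html'
```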
cheers,
DaveK
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple