To: cygwin AT cygwin DOT com
From: Hongyi Zhao
Subject: Re: [OT] Re: Want to use tor with wget.
Date: Wed, 14 Oct 2009 12:56:46 +0800

On Tue, 13 Oct 2009 21:09:32 +0100, Dave Korn wrote:

>[ We're offtopic here since it's not a cygwin-specific issue anymore, so I've
>set a follow-up to the cygwin-talk list in case you have further questions or
>replies. ]
>
>hongyi.zhao wrote:
>> On Tuesday, October 13, 2009 at 13:44, dave.korn.cygwin wrote:
>>> Hongyi Zhao wrote:
>
>
>> I want to use wget to grab the following web page:
>>
>> http://www.cybersyndrome.net/pla5.html
>
> Then, you can tell wget to use your local privoxy as an http proxy, which is
>exactly how your browser relates to it.
>
> export http_proxy=localhost:8118
> wget http://www.cybersyndrome.net/pla5.html
>
>should do the trick, but check the wget manual page about proxy support for
>full details. (I'm assuming here you're running the usual kind of Tor setup
>with a supporting co-installation of Privoxy.)
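A minimal sketch of the proxy setup quoted above, for reference. It assumes Privoxy is listening on localhost port 8118, as in the quoted commands; the variable names privoxy_host and privoxy_port are only illustrative. The value is written as a full URL because some tools expect a scheme in http_proxy, although the bare host:port form quoted above may also work.

```shell
#!/bin/sh
# Assumption: Privoxy runs locally on its usual port 8118 (taken from the
# quoted commands). The two variable names below are hypothetical.
privoxy_host=localhost
privoxy_port=8118

# Build a full proxy URL and export it so that child processes such as
# wget can pick it up from the environment.
http_proxy="http://${privoxy_host}:${privoxy_port}/"
export http_proxy

# With the variable exported, the quoted command can be run unchanged:
#   wget http://www.cybersyndrome.net/pla5.html
```

The wget call is left as a comment so that the snippet only prepares the environment and does not touch the network by itself.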
>
>> OTOH, I've also learned that curl support socks4/5 proxy, and I use
>> the following command under my cygwin console:
>>
>> curl --socks5 127.0.0.1:9050 http://www.cybersyndrome.net/pla5.html
>>
>> But I meet the following error:
>>
>> -----------------------------
>>
>>
>> 302 Found
>>
>> Found
>>
>> The document has moved here.
>>
>> -----------------------------
>
> That's interesting. A real 302 redirect would have an actual 302 status
>code and a Location header, not just be a 200 returning an html document with
>the words "302 Found" and a URL in it.
>
>> Nevertheless, I can use firefox with Tor enabled to access this
>> webpage.
>>
>> What's the reason
>
> It's something the server is doing deliberately, perhaps a malfunctioning or
>misguided anti-bot feature of some sort, based on the request headers sent by
>the user's agent.
>
>> and how can I grab this webpage just by a
>> command-line downloading tool?
>
> Well, you can use wget! Or you can tell your curl to pretend it is wget!
>
>> $ curl 'http://www.cybersyndrome.net/pla5.html'
>>
>>
>> 302 Found
>>
>> Found
>>
>> The document has moved here.
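The distinction drawn above, between a real 302 and a page that merely says "302 Found", can be checked from a script. The following is only a rough sketch: both function names are hypothetical, and the body test simply greps for the words quoted in the error page. It relies on curl's -s, -o, -w '%{http_code}' and -A options.

```shell
#!/bin/sh
# Hypothetical helper: does a response look like the "fake redirect"
# described above, i.e. a non-3xx status whose body still says "302 Found"?
#   $1 = numeric HTTP status code
#   $2 = path to the saved response body
looks_like_fake_redirect() {
    case "$1" in
        3[0-9][0-9]) return 1 ;;   # a genuine redirect status, so not a fake
    esac
    grep -q '302 Found' "$2"
}

# Hypothetical wrapper, only defined here because it needs network access.
#   $1 = URL, $2 = file to save the body to, $3 = User-Agent string
# -s hides the progress meter, -o saves the body, -w prints the status code.
fetch_status() {
    curl -s -A "$3" -o "$2" -w '%{http_code}' "$1"
}
```

A possible use, assuming network access: status=$(fetch_status 'http://www.cybersyndrome.net/pla5.html' body.html 'Wget/1.11.4') followed by looks_like_fake_redirect "$status" body.html. Trying it once with a Wget-style agent string and once without would show whether the server's answer depends on the User-Agent header.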

>>
>
>> $ wget 'http://www.cybersyndrome.net/pla5.html'
>> --2009-10-13 21:00:36-- http://www.cybersyndrome.net/pla5.html
>> Resolving www.cybersyndrome.net... 210.153.118.69
>> Connecting to www.cybersyndrome.net|210.153.118.69|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: unspecified [text/html]
>> Saving to: `pla5.html'
>>
>> [ <=> ] 18,151 3.11K/s in 5.7s
>>
>> 2009-10-13 21:00:42 (3.11 KB/s) - `pla5.html' saved [18151]
>
>> $ curl 'http://www.cybersyndrome.net/pla5.html' -A 'User-Agent: Wget/1.11.4'
>>
>>
>>
>>
>>
>> CyberSyndrome : Proxy List / Anonymous
>>