Mail Archives: cygwin/2009/10/14/00:57:30

To: cygwin AT cygwin DOT com
From: Hongyi Zhao <hongyi DOT zhao AT gmail DOT com>
Subject: Re: [OT] Re: Want to use tor with wget.
Date: Wed, 14 Oct 2009 12:56:46 +0800
Lines: 101
Message-ID: <gfmad5tpm615sor2c436or5620hqn9dm69@4ax.com>
References: <3j28d51fso528qi14rpfqcga8r9oqckji8 AT 4ax DOT com> <4AD413B2 DOT 1070903 AT gmail DOT com> <1631589547 DOT 20091013204226 AT gmail DOT com> <4AD4DE7C DOT 7030606 AT gmail DOT com>

On Tue, 13 Oct 2009 21:09:32 +0100, Dave Korn
<dave DOT korn DOT cygwin AT googlemail DOT com> wrote:

>[ We're offtopic here since it's not a cygwin-specific issue anymore, so I've
>set a follow-up to the cygwin-talk list in case you have further questions or
>replies. ]
>
>hongyi.zhao wrote:
>> On Tuesday, October 13, 2009 at 13:44, dave.korn.cygwin wrote:
>>> Hongyi Zhao wrote:
>
>
>> I want to use wget to grab the following web page:
>> 
>> http://www.cybersyndrome.net/pla5.html
>
>  Then, you can tell wget to use your local privoxy as an http proxy, which is
>exactly how your browser relates to it.
>
>  export http_proxy=localhost:8118
>  wget http://www.cybersyndrome.net/pla5.html
>
>should do the trick, but check the wget manual page about proxy support for
>full details.  (I'm assuming here you're running the usual kind of Tor setup
>with a supporting co-installation of Privoxy.)
>
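
For reference, a minimal sketch of the wget-through-Privoxy setup described above, written with an explicit http:// scheme in the proxy URL. It assumes Privoxy is listening on 127.0.0.1:8118 and forwarding to Tor; the function name fetch_via_privoxy is made up for illustration.

```shell
# Assumption: Privoxy listens on 127.0.0.1:8118 and forwards to Tor.
http_proxy="http://127.0.0.1:8118"
export http_proxy

# fetch_via_privoxy is a hypothetical helper name, not part of wget.
# wget picks up http_proxy from the environment for http:// URLs.
fetch_via_privoxy() {
    wget "$@"
}

# Usage (needs network and a running Privoxy):
#   fetch_via_privoxy 'http://www.cybersyndrome.net/pla5.html'
```
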
>> OTOH, I've also learned that curl supports socks4/5 proxies, and I use
>> the following command under my cygwin console:
>> 
>> curl --socks5 127.0.0.1:9050 http://www.cybersyndrome.net/pla5.html
>> 
>> But I get the following output instead of the page:
>> 
>> -----------------------------
>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>> <HTML><HEAD>
>> <TITLE>302 Found</TITLE>
>> </HEAD><BODY>
>> <H1>Found</H1>
>> The document has moved <A HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
>> </BODY></HTML>
>> -----------------------------
>
>  That's interesting.  A real 302 redirect would have an actual 302 status
>code and a Location header, not just be a 200 returning an html document with
>the words "302 Found" and a URL in it.
>
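
By default curl prints only the response body and does not follow redirects, so the output above does not by itself show which status code came back. One way to check is to look at the response headers. The sketch below is only an illustration: status_and_location is a made-up helper that reads raw response headers on stdin and prints the status code plus any Location header.

```shell
# status_and_location is a hypothetical helper, not a curl feature.
# It reads raw HTTP response headers on stdin and prints
# "<status code> <Location value>" (or just the code if no Location).
status_and_location() {
    awk '
        NR == 1 { code = $2 }                      # e.g. "HTTP/1.1 302 Found"
        tolower($1) == "location:" { loc = $2 }    # header names are case-insensitive
        END {
            sub(/\r$/, "", code)                   # strip the CR of the CRLF line ending
            sub(/\r$/, "", loc)
            print code (loc == "" ? "" : " " loc)
        }
    '
}

# Usage (needs network): -D - dumps headers to stdout, -o /dev/null drops the body.
#   curl -s -D - -o /dev/null 'http://www.cybersyndrome.net/pla5.html' | status_and_location
```

A status of 302 together with a Location header would indicate a genuine redirect; a 200 with no Location would mean the server only sent a page that looks like one.
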
>> Nevertheless, I can use Firefox with Tor enabled to access this
>> webpage.
>> 
>> What's the reason
>
>  It's something the server is doing deliberately, perhaps a malfunctioning or
>misguided anti-bot feature of some sort, based on the request headers sent by
>the user agent.
>
>> and how can I grab this webpage with just a
>> command-line download tool?
>
>  Well, you can use wget!  Or you can tell your curl to pretend it is wget!
>
>> $ curl 'http://www.cybersyndrome.net/pla5.html'
>> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
>> <HTML><HEAD>
>> <TITLE>302 Found</TITLE>
>> </HEAD><BODY>
>> <H1>Found</H1>
>> The document has moved <A HREF="http://www8.big.or.jp/~000/CyberSyndrome/error404.html">here</A>.<P>
>> </BODY></HTML>
>
>> $ wget 'http://www.cybersyndrome.net/pla5.html'
>> --2009-10-13 21:00:36--  http://www.cybersyndrome.net/pla5.html
>> Resolving www.cybersyndrome.net... 210.153.118.69
>> Connecting to www.cybersyndrome.net|210.153.118.69|:80... connected.
>> HTTP request sent, awaiting response... 200 OK
>> Length: unspecified [text/html]
>> Saving to: `pla5.html'
>> 
>>     [          <=>                          ] 18,151      3.11K/s   in 5.7s
>> 
>> 2009-10-13 21:00:42 (3.11 KB/s) - `pla5.html' saved [18151]
>
>> $ curl 'http://www.cybersyndrome.net/pla5.html' -A 'User-Agent: Wget/1.11.4'
>> <html>
>> <head>
>> <meta http-equiv="content-type" content="text/html; charset=Shift_JIS">
>> <meta name="robots" content="noarchive">
>> <meta name="description" content="???p??\??OV??Proxy?????????J???A?????B">
>> <title>CyberSyndrome : Proxy List / Anonymous</title>
>> <style type="text/css">
>           [ ... snip ... ]
>
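
A small sketch combining the two ideas in this thread: sending a wget-like User-Agent from curl while going through Tor's SOCKS port. Note that curl's -A option takes the User-Agent value itself, so the "User-Agent:" prefix is not needed. The helper name curl_as_wget is invented for illustration, and the 127.0.0.1:9050 address is the one used earlier in the thread.

```shell
# curl_as_wget is a hypothetical helper name, not a curl feature.
# -A sets the User-Agent header value; extra arguments pass through to curl.
curl_as_wget() {
    curl -A 'Wget/1.11.4' "$@"
}

# Usage (needs network and a running Tor on the SOCKS port from the thread):
#   curl_as_wget --socks5 127.0.0.1:9050 'http://www.cybersyndrome.net/pla5.html'
# --socks5-hostname would additionally let the proxy resolve the host name.
```
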
>    cheers,
>      DaveK
>

Good, thanks a lot, I've got it.
-- 
.: Hongyi Zhao [ hongyi.zhao AT gmail.com ] Free as in Freedom :.


