Problems with Wget to a CloudFlare hosted site: 503 Service Unavailable


I have seen other reports of 503 errors with Wget, but to no avail: I cannot solve this one.

When I try to download a certain website, I get a 503 Service Unavailable error. This happens only with the site in question; other sites download without problems.

This is what is happening. I enter:

wget -r --no-parent -U Mozilla

And this is the error I get back:

--2015-03-12 11:57:08--
Connecting to||:80... connected.
HTTP request sent, awaiting response... 503 Service Unavailable
2015-03-12 11:57:09 ERROR 503: Service Unavailable.

This site does use CloudFlare protection (when opening the site you have to wait 5 seconds while it "checks your browser").

Best Answer

CloudFlare protection is based on JavaScript, cookies and HTTP header filtering. If you want to crawl a CloudFlare-protected site using wget, you first have to open it in a browser with a debugger (e.g. Firefox with Firebug) and copy the Cookie request header.

Now the hardest part: this cookie is valid for 1 hour only, so you will have to refresh it manually each hour.
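
To know when a refresh is due, here is a rough sketch. It assumes that the cf_clearance value ends in "-<issue time as a Unix timestamp>-<lifetime in seconds>", which is only inferred from the example value in this answer (it ends in -1438079290-3600, and 3600 seconds is one hour), not from any documented format. The function name clearance_seconds_left is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print how many seconds a cf_clearance value has left.
# ASSUMPTION: the value looks like "<token>-<issued_unix_time>-<ttl_seconds>".
#   $1 = cf_clearance value copied from the browser
#   $2 = current time as a Unix timestamp, e.g. "$(date +%s)"
clearance_seconds_left() {
    value=$1
    now=$2
    ttl=${value##*-}       # text after the last '-'  (lifetime in seconds)
    rest=${value%-*}       # value with the trailing "-ttl" removed
    issued=${rest##*-}     # text after the new last '-' (issue timestamp)
    echo $(( issued + ttl - now ))
}

# Usage:
#   clearance_seconds_left "$CF_CLEARANCE" "$(date +%s)"
# A result of zero or less means the cookie should be copied again.
```

If the format assumption does not hold for your cookie, the arithmetic will fail or print nonsense, so treat the result as a hint only.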

Here is the complete command you can use to crawl the site:

wget -U "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0" --header="Accept: text/html" --header="Cookie: __cfduid=xpzezr54v5qnaoet5v2dx1ias5xx8m4faj7d5mfg4og; cf_clearance=0n01f6dkcd31en6v4b234a6d1jhoaqgxa7lklwbj-1438079290-3600" -np -r

Note that the __cfduid cookie value is constant, so you only have to change the cf_clearance cookie value each hour.
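
Since only one of the two values changes, it may be convenient to keep them in variables. The following is a minimal sketch along those lines; the helper names cookie_header and crawl are hypothetical, the cookie values are the example ones from above and must be replaced with your own, and the site URL is passed in as an argument.

```shell
#!/bin/sh
# Values copied from the browser's Cookie request header.
# Only CF_CLEARANCE needs editing when it expires.
CFDUID="xpzezr54v5qnaoet5v2dx1ias5xx8m4faj7d5mfg4og"
CF_CLEARANCE="0n01f6dkcd31en6v4b234a6d1jhoaqgxa7lklwbj-1438079290-3600"
USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0"

# Hypothetical helper: print the Cookie header for the given
# __cfduid ($1) and cf_clearance ($2) values.
cookie_header() {
    printf 'Cookie: __cfduid=%s; cf_clearance=%s' "$1" "$2"
}

# Hypothetical wrapper around the wget command shown above.
#   $1 = URL of the site to crawl
crawl() {
    wget -U "$USER_AGENT" \
         --header="Accept: text/html" \
         --header="$(cookie_header "$CFDUID" "$CF_CLEARANCE")" \
         -np -r "$1"
}

# Usage (the URL is whatever site you are mirroring):
#   crawl "$SITE_URL"
```

The User-Agent string should match the browser the cookies were copied from, as in the original command.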
