Ubuntu – Why would I use Wget instead of a browser?


In what case should I prefer to use Wget rather than a browser?

I heard that Richard Stallman uses it instead of a browser for some anonymity reasons. And what does the server see when you get its files using Wget?

Best Answer

Typically you would never use it "instead of a browser". Browsers render HTML, make links clickable (rather than making you copy each URL into another wget command by hand), and so on; for ordinary browsing there is no upside to wget. If you are concerned about privacy, there are plenty of ways to lock a browser down, or you could use a minimal text-mode browser such as Lynx if you really want to go barebones without giving up an interactive interface.

Wget is primarily useful when you want a quick, scriptable, command-line way of downloading files. For example, you can put wget in a script or cron job to fetch a page that is updated with new data frequently, something a browser can't easily be automated to do. You can also use wget's options to crawl and automatically save a website, which most browsers can't do at all, at least not without extensions.
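As a sketch of that first use case, here is a minimal script you could run from cron. The URL is a placeholder, and the flags shown (`--timestamping`, `--quiet`, `--directory-prefix`) are standard wget options, but the exact setup is just one way to do it:

```shell
#!/bin/sh
# Hypothetical URL: replace with the page that actually gets updated.
URL="https://example.com/report.csv"

# --timestamping (-N): only download if the remote file is newer than
# the local copy, so repeated cron runs are cheap.
# --quiet keeps cron mail clean; --directory-prefix sets where to save.
wget --quiet --timestamping --directory-prefix="$HOME/reports" "$URL"
```

A crontab line like `0 * * * * /home/you/fetch-report.sh` would then re-check the file every hour, which is exactly the kind of unattended, repeated fetching a browser isn't built for.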

In short, browsers are applications for humans looking at the web; wget is a tool for machines and power users moving data over HTTP. They do similar things (pull files from web servers) but serve entirely different purposes.

Regarding what servers "see" when you fetch things with wget: every HTTP client (browsers, wget, curl, and so on) sends a "User-Agent" header, a string that identifies the client and, these days, hints at what features it supports. Servers can use it to serve different content to different browsers (e.g. Google tries not to advertise Chrome to people already using Chrome). Some sites try to block scraping by rejecting wget's default user agent, but wget lets you send an arbitrary one, so faking a Chrome string gets around that. More often the header is simply logged for statistics, so site operators know which browsers are popular and worth testing against most thoroughly.
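Overriding the user agent is a single wget flag. The Chrome version string below is illustrative, not a value you need to match exactly:

```shell
# By default wget identifies itself as something like "Wget/1.21".
# --user-agent replaces that string with whatever you pass; this one
# mimics a desktop Chrome on Linux (the exact version is made up).
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" \
     https://example.com/
```

The server's access log will then record the fake string instead of wget's default, which defeats naive user-agent blocking (though not the traffic-pattern analysis described below).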

If you use wget's crawling functions, the server will see many rapid requests arriving in a mostly alphabetical order, which is a dead giveaway that you're scraping the site. A human in a browser looks entirely different: each page request is followed by requests for the images on that page, then a pause, then a request for another, essentially unpredictable page (or a short run of pages with a clear purpose).
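For completeness, a crawl like the one described would look something like this. All flags are real wget options; the throttling ones exist precisely because bursty scraping is so recognizable (and hard on servers):

```shell
# Mirror a site recursively:
#   --mirror            recursion with timestamping, suitable for mirroring
#   --page-requisites   also fetch images/CSS needed to render each page
#   --convert-links     rewrite links so the local copy browses offline
#   --wait=2            pause 2 seconds between requests
#   --random-wait       vary that pause, making the timing less mechanical
wget --mirror --page-requisites --convert-links \
     --wait=2 --random-wait \
     https://example.com/
```

Even with `--wait` and `--random-wait`, the breadth-first, asset-less-then-everything request pattern still differs from a human's browsing; the delays mainly make the crawl politer, not invisible.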