Ubuntu – Wget batch of files fails, curl works, what am I doing wrong

command-line, wget

I am trying to download the entire directory from this website: https://data.geobasis-bb.de/geobasis/daten/dgm/xyz/

What I tried is:

wget --show-progress -A 'dgm_*.zip' https://data.geobasis-bb.de/geobasis/daten/dgm/xyz/ -P /run/media/usr1/exthdd/dgm

What it should do, as far as I understand it, is download all files that fit the name schema dgm_*.zip. However, it returns only:

--2020-01-13 14:50:11--  https://data.geobasis-bb.de/geobasis/daten/dgm/xyz/
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving data.geobasis-bb.de (data.geobasis-bb.de)… 194.99.76.18, 194.76.232.112
Connecting to data.geobasis-bb.de (data.geobasis-bb.de)|194.99.76.18|:443 … connected.
HTTP request sent, awaiting response … 200 OK
Length: unspecified [text/html]
Saving to: '/run/media/lgoldmann/lg_backup_diss/dgm/index.html.tmp.2'

index.html.tmp.2                             [   <=>                                                                             ]   2.65M  4.69MB/s    in 0.6s    

2020-01-13 14:50:15 (4.69 MB/s) - '/run/media/lgoldmann/lg_backup_diss/dgm/index.html.tmp.2' saved [2778920]

The website also offers a pre-typed command for curl, which works just fine, but I am trying to find out what went wrong with my wget command.

Best Answer

  • You need to use the -r (recursive) option to follow the links on the page; otherwise, wget downloads only the single page the server returns for that URL (i.e. the default or index page) and quits.

    When using -r, it is wise to also pass -np (no-parent), which ensures wget does not follow links that point one or more levels up the directory tree.

    Also, you might not want wget to recreate the site's directory structure locally but rather download the files into a single directory, so add the -nd (no-directories) option as well, like so:

    wget --show-progress -A 'dgm_*.zip' -r -np -nd https://data.geobasis-bb.de/geobasis/daten/dgm/xyz/ -P /run/media/usr1/exthdd/dgm
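If you want to preview which files the -A 'dgm_*.zip' accept pattern would keep before starting the recursive download, you can grep the saved index page for matching links. This is just a sketch: the file name dgm_33250-5886.zip below is a made-up stand-in for whatever the real index lists.

```shell
# Build a tiny stand-in index.html for illustration (the real one is what
# wget saved as index.html.tmp.2 in the question above).
printf '%s\n' \
  '<a href="dgm_33250-5886.zip">dgm_33250-5886.zip</a>' \
  '<a href="readme.txt">readme.txt</a>' > index.html

# Extract only the hrefs matching the dgm_*.zip pattern, then strip the
# surrounding href="..." so just the file names remain.
grep -oE 'href="dgm_[^"]*\.zip"' index.html | sed -E 's/^href="|"$//g'
```

Here only dgm_33250-5886.zip is printed; readme.txt is filtered out, mirroring what the accept pattern does during the recursive run.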