Download web pages recursively under a URL
wget \
--recursive \
--no-clobber \
--page-requisites \
--adjust-extension \
--convert-links \
--restrict-file-names=windows \
--domains example.com \
-nH --cut-dirs=1 \
-e robots=off \
--random-wait \
--wait=5 \
--no-parent \
www.example.com/subdirectory/
- Substitute example.com and www.example.com/subdirectory/ with the values relevant to your problem (a worked example follows the option list below).
- -r, --recursive: download the entire website.
- -D, --domains example.com: don't follow links outside example.com.
- -np, --no-parent: don't follow links outside the directory subdirectory/.
- -p, --page-requisites: get all the elements that compose the page (images, CSS and so on).
- -E, --adjust-extension: append a matching suffix such as .html to downloaded files that lack it, so pages open correctly in a browser offline.
- -k, --convert-links: convert links so that they work locally, off-line.
- --restrict-file-names=windows: modify filenames so that they will work in Windows as well.
- -nc, --no-clobber: don't overwrite any existing files (used in case the download is interrupted and resumed).
- -e robots=off: force crawling regardless of the robots.txt setting.
- -nH --cut-dirs=1: -nH avoids creating a top-level directory named after the host, and --cut-dirs=1 drops the first directory component (here, the subdirectory) from the saved paths; note that --cut-dirs takes a number of directory levels, not a directory name (see the path sketch after this list).
- --random-wait: randomizes the time between requests to vary between 0.5 and 1.5 times the waiting time specified by the --wait option.
- -w, --wait=5: number of seconds to wait between requests (see --random-wait).
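As a rough illustration of how -nH and --cut-dirs affect the saved paths (assuming a page at www.example.com/subdirectory/page.html; the exact layout can vary with other options), wget would store that page as:

(default)           -> www.example.com/subdirectory/page.html
-nH                 -> subdirectory/page.html
-nH --cut-dirs=1    -> page.html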
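As a worked example of the substitutions above, suppose the goal is to mirror a hypothetical manual under docs.example.org/manual/ (an illustrative address, not part of the original command); the invocation might look like:

wget --recursive --no-clobber --page-requisites --adjust-extension --convert-links \
    --restrict-file-names=windows --domains docs.example.org -nH --cut-dirs=1 \
    -e robots=off --random-wait --wait=5 --no-parent \
    docs.example.org/manual/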