Introduction to wget
wget (from "World Wide Web" and "get") is a powerful command-line utility for downloading files from the internet. It supports HTTP, HTTPS, and FTP, and can handle recursive downloads, resume interrupted transfers, and much more. Understanding wget is essential for automated downloads, web scraping, and system administration.
Basic Syntax
wget [options] [URL]
Key Features
- Non-interactive: Works in background, perfect for scripts
- Resume capability: Can resume interrupted downloads
- Recursive downloads: Download entire websites
- Bandwidth control: Limit download speed
- Authentication: Support for HTTP/FTP authentication
- Proxy support: Works with HTTP proxies
1. Basic Usage
Simple File Download
# Download a single file
wget https://example.com/file.zip

# Download with a different filename
wget -O newname.zip https://example.com/file.zip

# Download to a specific directory
wget -P /path/to/directory/ https://example.com/file.zip

# Download multiple files
wget https://example.com/file1.zip https://example.com/file2.zip
Examples
$ wget https://releases.ubuntu.com/22.04/ubuntu-22.04.3-desktop-amd64.iso
--2024-03-11 10:30:15--  https://releases.ubuntu.com/22.04/ubuntu-22.04.3-desktop-amd64.iso
Resolving releases.ubuntu.com... 91.189.91.38
Connecting to releases.ubuntu.com|91.189.91.38|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4823359488 (4.5G) [application/x-iso9660-image]
Saving to: ‘ubuntu-22.04.3-desktop-amd64.iso’

ubuntu-22.04.3-desktop-amd64.iso   0%[          ]  10.23M  1.2MB/s
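After a download like the one above completes, it is good practice to verify the file's integrity. A minimal sketch, assuming the site publishes a .sha256 file next to the download; the filenames are illustrative, and the "downloaded" file is created locally here so the example is self-contained:

```shell
# Stand-in for a real download and its published checksum file
# (in practice both would come from the remote server)
printf 'hello wget\n' > file.zip
sha256sum file.zip > file.zip.sha256

# After "wget https://example.com/file.zip", check integrity:
if sha256sum -c file.zip.sha256; then
    echo "checksum OK"
else
    echo "checksum MISMATCH" >&2
fi
```

The same pattern works with md5sum or sha512sum, depending on what the site publishes.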
2. Common Options
Output Control
# -O: Save with a different filename
wget -O custom_name.html https://example.com

# -P: Save to a directory
wget -P /tmp/downloads/ https://example.com/file.pdf

# -nd: No directory structure
wget -nd -P /downloads/ https://example.com/images/photo.jpg

# -x: Force directory structure
wget -x https://example.com/images/photo.jpg
# Creates: example.com/images/photo.jpg
Quiet and Verbose Modes
# -q: Quiet mode (no output)
wget -q https://example.com/file.zip

# -v: Verbose mode (default)
wget -v https://example.com/file.zip

# -nv / --no-verbose: Less verbose
wget --no-verbose https://example.com/file.zip

# Progress indicators
wget --progress=bar https://example.com/largefile.zip
wget --progress=dot https://example.com/largefile.zip
Download Resume
# -c: Continue/resume a partial download
wget -c https://example.com/largefile.zip

# --start-pos: Start from a specific byte offset
wget --start-pos=1048576 https://example.com/file.zip

# --timeout: Set the network timeout
wget --timeout=10 https://example.com/file.zip
3. Advanced Download Options
Bandwidth Limiting
# --limit-rate: Limit download speed
wget --limit-rate=200k https://example.com/largefile.zip

# Limits in different units
wget --limit-rate=1m https://example.com/largefile.zip    # 1 MB/s
wget --limit-rate=500k https://example.com/largefile.zip  # 500 KB/s
Retry Options
# -t: Number of retries
wget -t 5 https://example.com/file.zip

# --retry-connrefused: Also retry when the connection is refused
wget --retry-connrefused -t 10 https://example.com/file.zip

# --wait: Wait between retrievals
wget --wait=5 -t 3 https://example.com/file.zip

# --waitretry: Wait up to this many seconds between retries
wget --waitretry=10 -t 5 https://example.com/file.zip

# --random-wait: Randomize the wait time
wget --random-wait --wait=5 https://example.com/
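wget's own -t and --waitretry options cover most retry needs; a shell wrapper is only useful when a script also wants custom logging or backoff between attempts. A sketch under that assumption; the flaky stub stands in for a wget call so the example runs offline:

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, up to MAX attempts,
# sleeping DELAY seconds between attempts.
retry() {
    max="$1"; delay="$2"; shift 2
    attempt=1
    while true; do
        "$@" && return 0
        [ "$attempt" -ge "$max" ] && return 1
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Demonstration with a stub that fails twice, then succeeds.
# In real use: retry 5 10 wget -c https://example.com/file.zip
flaky() {
    n=$(cat counter 2>/dev/null || echo 0)
    n=$((n + 1)); echo "$n" > counter
    [ "$n" -ge 3 ]
}
echo 0 > counter
retry 5 0 flaky && echo "succeeded after $(cat counter) attempts"
```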
Timeout Control
# --dns-timeout: DNS lookup timeout
wget --dns-timeout=10 https://example.com

# --connect-timeout: Connection timeout
wget --connect-timeout=15 https://example.com

# --read-timeout: Read timeout
wget --read-timeout=20 https://example.com

# --timeout: Set all three timeouts at once
wget --timeout=30 https://example.com
4. Authentication and Headers
HTTP Authentication
# --user and --password: HTTP authentication
wget --user=username --password=password https://example.com/private/file.zip

# Ask for the password interactively (more secure)
wget --user=username --ask-password https://example.com/private/file.zip

# Using a .netrc file
echo "machine example.com login username password secret" > ~/.netrc
chmod 600 ~/.netrc
wget https://example.com/private/file.zip
Custom Headers
# --header: Add custom HTTP headers
wget --header="User-Agent: Mozilla/5.0" https://example.com
wget --header="Accept: application/json" https://api.example.com/data
wget --header="Referer: https://google.com" https://example.com

# Multiple headers
wget --header="User-Agent: Mozilla/5.0" \
     --header="Accept-Language: en-US,en;q=0.9" \
     --header="Cookie: session=12345" \
     https://example.com

# --referer: Set the referer
wget --referer="https://google.com" https://example.com

# --user-agent: Set the user agent
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
     https://example.com
Cookies
# --save-cookies: Save cookies to a file
wget --save-cookies cookies.txt --keep-session-cookies \
     https://example.com/login

# --load-cookies: Load cookies from a file
wget --load-cookies cookies.txt https://example.com/protected/page

# --no-cookies: Disable cookies
wget --no-cookies https://example.com
5. FTP Downloads
FTP Operations
# Download an FTP file
wget ftp://username:[email protected]/file.zip

# Anonymous FTP
wget ftp://ftp.example.com/pub/file.zip

# FTP directory listing
wget ftp://ftp.example.com/pub/

# Recursive FTP download
wget -r ftp://ftp.example.com/pub/

# FTP with a specific port
wget ftp://example.com:2121/file.zip
FTP Options
# --ftp-user and --ftp-password
wget --ftp-user=username --ftp-password=password ftp://example.com/file.zip

# --no-passive-ftp: Disable passive mode
wget --no-passive-ftp ftp://example.com/file.zip

# --retr-symlinks: Retrieve symlinked files
wget --retr-symlinks -r ftp://example.com/pub/
6. Recursive Downloads
Basic Recursion
# -r: Recursive download
wget -r https://example.com/docs/

# -l: Recursion depth
wget -r -l 2 https://example.com/docs/  # 2 levels deep

# --no-parent: Don't ascend to the parent directory
wget -r --no-parent https://example.com/docs/

# --accept: Accept specific file types
wget -r --accept pdf,doc,txt https://example.com/docs/
Advanced Recursion
# -np: No parent (don't ascend to the parent directory)
wget -r -np https://example.com/docs/

# --reject: Reject specific file types
wget -r --reject jpg,gif,mp4 https://example.com/gallery/

# --accept-regex: Accept only URLs matching a regex
wget -r --accept-regex='.*\.pdf$' https://example.com/

# --exclude-directories: Skip directories
wget -r --exclude-directories=/images,/tmp https://example.com/

# --include-directories: Only these directories
wget -r --include-directories=/docs,/downloads https://example.com/
Mirror a Website
# -m: Mirror a website (equivalent to -r -N -l inf --no-remove-listing)
wget -m https://example.com/

# Mirror with conversion for offline viewing
wget -m -k -K -E https://example.com/
# -k: Convert links for local viewing
# -K: Keep the original file as .orig
# -E: Adjust extensions

# Mirror with timestamping
wget -m -N https://example.com/
# -N: Only download newer files

# Complete mirror command
wget --mirror --convert-links --adjust-extension \
     --page-requisites --no-parent https://example.com/
7. Page Requisites and Conversion
Download Page Requisites
# -p: Download all page requisites (images, CSS, JS)
wget -p https://example.com/page.html

# --page-requisites: Same as -p
wget --page-requisites https://example.com/page.html

# Download a single page with all assets
wget -p -k https://example.com/page.html
# -k: Convert links for local viewing
Link Conversion
# -k: Convert links for local viewing
wget -k https://example.com/page.html

# -K: Keep original files with a .orig extension
wget -K https://example.com/page.html

# --convert-links: Convert links after download
wget --convert-links https://example.com/page.html

# --adjust-extension: Add appropriate extensions
wget --adjust-extension https://example.com/page.php
8. Timestamping and File Management
Timestamping
# -N: Only download newer files (timestamping)
wget -N https://example.com/file.zip

# --timestamping: Same as -N
wget --timestamping https://example.com/file.zip

# --no-use-server-timestamps: Don't set local timestamps from the server
wget --no-use-server-timestamps https://example.com/file.zip
File Versioning
# --backups: Number of backups to keep
wget --backups=5 -N https://example.com/file.zip

# --backup-converted: Back up converted files
wget -k --backup-converted https://example.com/page.html

# --keep-session-cookies: Keep session cookies
wget --keep-session-cookies --save-cookies cookies.txt \
     https://example.com/login
9. Input from Files
Download from File List
# -i: Read URLs from a file
wget -i urls.txt

# Download from a file with options
wget -i urls.txt -P downloads/

# URLs file format (one per line)
echo "https://example.com/file1.zip" > urls.txt
echo "https://example.com/file2.zip" >> urls.txt
echo "https://example.com/file3.zip" >> urls.txt
Advanced Input Handling
# Download from stdin
cat urls.txt | wget -i -
# Combine with other commands
find . -name "*.url" -exec cat {} \; | wget -i -
# Process URLs from command output
curl -s https://api.example.com/files | jq -r '.[].url' | wget -i -
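Before feeding a hand-maintained list to wget -i, it can help to strip comments, blank lines, and duplicates first. A sketch; the urls.txt contents and URLs are illustrative:

```shell
# Build a sample URL list with comments, a blank line, and a duplicate
cat > urls.txt <<'EOF'
# mirrors
https://example.com/file1.zip
https://example.com/file2.zip
https://example.com/file1.zip

https://example.com/file3.zip
EOF

# Drop comments and blank lines, then de-duplicate while keeping order
grep -v '^[[:space:]]*#' urls.txt \
    | grep -v '^[[:space:]]*$' \
    | awk '!seen[$0]++' > urls.clean.txt
cat urls.clean.txt

# Then: wget -i urls.clean.txt -P downloads/
```

The awk '!seen[$0]++' idiom keeps only the first occurrence of each line, unlike sort -u, which would reorder the list.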
10. Spider and Testing
Web Spider Mode
# --spider: Check URLs without downloading
wget --spider https://example.com

# Check multiple URLs
wget --spider -i urls.txt

# Check for broken links
wget --spider --force-html -r -l1 https://example.com/ 2>&1 | \
    grep -B2 '404'

# Verbose spider output
wget --spider -v https://example.com
Testing and Debugging
# --debug: Debug output
wget --debug https://example.com

# --server-response: Print the server response
wget --server-response https://example.com

# --save-headers: Save headers with the downloaded file
wget --save-headers https://example.com

# -S: Show the server response (short form)
wget -S https://example.com
11. Proxy Configuration
HTTP Proxy
# Use an HTTP proxy (via wgetrc-style settings)
wget -e use_proxy=yes -e http_proxy=proxy.example.com:8080 \
     --proxy-user=user --proxy-password=pass \
     https://example.com

# Environment variables
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
wget https://example.com

# --no-proxy: Bypass the proxy
wget --no-proxy https://example.com
FTP Proxy
# FTP proxy
export ftp_proxy=ftp://proxy.example.com:2121
wget ftp://ftp.example.com/file.zip
12. Script Examples
Download Manager Script
#!/bin/bash
# download_manager.sh - Advanced download manager with wget
DOWNLOAD_DIR="$HOME/downloads"
LOG_FILE="$DOWNLOAD_DIR/download.log"
URLS_FILE="$DOWNLOAD_DIR/urls.txt"
MAX_CONCURRENT=3
mkdir -p "$DOWNLOAD_DIR"
download_file() {
local url="$1"
local filename=$(basename "$url")
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting download: $filename" >> "$LOG_FILE"
wget -c \
--timeout=30 \
--tries=5 \
--limit-rate=500k \
--user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
-P "$DOWNLOAD_DIR" \
"$url" 2>&1 | while read -r line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $line" >> "$LOG_FILE"
done
# $? would test the while loop, not wget; PIPESTATUS[0] holds wget's status
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Completed: $filename" >> "$LOG_FILE"
else
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Failed: $filename" >> "$LOG_FILE"
echo "$url" >> "$DOWNLOAD_DIR/failed.txt"
fi
}
# Download multiple files with concurrency control
download_concurrent() {
local count=0
while read -r url; do
if [[ -n "$url" && ! "$url" =~ ^# ]]; then
download_file "$url" &
((count++))
if ((count >= MAX_CONCURRENT)); then
wait
count=0
fi
fi
done < "$URLS_FILE"
wait
}
# Main execution
case "$1" in
single)
download_file "$2"
;;
batch)
download_concurrent
;;
retry)
if [[ -f "$DOWNLOAD_DIR/failed.txt" ]]; then
mv "$DOWNLOAD_DIR/failed.txt" "$URLS_FILE"
download_concurrent
fi
;;
*)
echo "Usage: $0 {single <url>|batch|retry}"
exit 1
;;
esac
Website Backup Script
#!/bin/bash
# backup_website.sh - Backup entire website with wget
BACKUP_DIR="$HOME/website_backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SITE_URL="$1"
if [[ -z "$SITE_URL" ]]; then
echo "Usage: $0 <website_url>"
exit 1
fi
DOMAIN=$(echo "$SITE_URL" | awk -F/ '{print $3}')
BACKUP_PATH="$BACKUP_DIR/${DOMAIN}_$TIMESTAMP"
mkdir -p "$BACKUP_PATH"
echo "Starting backup of $SITE_URL to $BACKUP_PATH"
wget \
--mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
--limit-rate=500k \
--random-wait \
--user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
--directory-prefix="$BACKUP_PATH" \
--output-file="$BACKUP_PATH/backup.log" \
"$SITE_URL"
if [ $? -eq 0 ]; then
echo "Backup completed successfully"
# Capture a log summary before the working directory is removed
LOG_SUMMARY=$(grep "saved\|removed" "$BACKUP_PATH/backup.log" | tail -20)
# Create archive
tar -czf "$BACKUP_PATH.tar.gz" -C "$BACKUP_DIR" "${DOMAIN}_$TIMESTAMP"
rm -rf "$BACKUP_PATH"
echo "Archive created: $BACKUP_PATH.tar.gz"
# Generate report
cat > "$BACKUP_PATH.report" << EOF
Website Backup Report
=====================
URL: $SITE_URL
Date: $(date)
Backup File: $BACKUP_PATH.tar.gz
Size: $(du -h "$BACKUP_PATH.tar.gz" | cut -f1)

Download Log Summary:
$LOG_SUMMARY
EOF
echo "Report saved: $BACKUP_PATH.report"
else
echo "Backup failed"
exit 1
fi
Batch Download Script
#!/bin/bash
# batch_download.sh - Batch download with pattern matching
PATTERN="$1"
OUTPUT_DIR="${2:-./downloads}"
if [[ -z "$PATTERN" ]]; then
echo "Usage: $0 <url_pattern> [output_dir]"
echo "Example: $0 'https://example.com/images/img{001..100}.jpg'"
exit 1
fi
mkdir -p "$OUTPUT_DIR"
# Expand the pattern using brace expansion
# (eval runs the pattern through the shell, so use trusted input only)
URLS=$(eval echo "$PATTERN")
download_with_progress() {
local total=$(echo "$URLS" | wc -w)
local current=0
for url in $URLS; do
((current++))
filename=$(basename "$url")
echo "[$current/$total] Downloading: $filename"
wget -c \
--quiet \
--show-progress \
--timeout=30 \
--tries=3 \
-P "$OUTPUT_DIR" \
"$url"
if [ $? -eq 0 ]; then
echo " ✓ Completed"
else
echo " ✗ Failed"
echo "$url" >> "$OUTPUT_DIR/failed.txt"
fi
done
}
# Start download
echo "Starting batch download of $(echo "$URLS" | wc -w) files"
download_with_progress
# Summary
echo
echo "Download Summary:"
echo " Location: $OUTPUT_DIR"
ls -lh "$OUTPUT_DIR" | tail -n +2
if [[ -f "$OUTPUT_DIR/failed.txt" ]]; then
echo " Failed downloads: $(wc -l < "$OUTPUT_DIR/failed.txt")"
fi
13. Rate Limiting and Politeness
Rate Control
# --wait: Wait between downloads
wget --wait=5 -r https://example.com/

# --random-wait: Randomize the wait time
wget --random-wait --wait=5 -r https://example.com/

# --limit-rate: Limit bandwidth
wget --limit-rate=200k -r https://example.com/

# --quota: Limit the total download size
wget --quota=100m -r https://example.com/
Robots.txt Handling
# Respect robots.txt (the default; no option needed)
wget -r https://example.com/

# Ignore robots.txt
wget -e robots=off -r https://example.com/

# Identify your crawler with a custom user agent
wget --user-agent="MyBot/1.0" -r https://example.com/
14. SSL/TLS Options
Certificate Handling
# --no-check-certificate: Skip certificate validation
wget --no-check-certificate https://example.com

# --certificate: Client certificate
wget --certificate=client.crt --private-key=client.key https://example.com

# --ca-certificate: CA certificate
wget --ca-certificate=ca.crt https://example.com

# --secure-protocol: Specify the SSL/TLS protocol
wget --secure-protocol=TLSv1_2 https://example.com
15. Output Formatting
Custom Log Format
# --output-file: Log to a file
wget --output-file=download.log https://example.com/file.zip

# --append-output: Append to a log file
wget --append-output=download.log https://example.com/file2.zip

# --output-document: Write the download to a file
wget --output-document=output.html https://example.com

# --progress: Progress indicator format
wget --progress=bar:force https://example.com/largefile.zip
wget --progress=dot:giga https://example.com/largefile.zip
16. Integration with Other Commands
Piping and Redirection
# Pipe output to other commands
wget -qO- https://example.com | grep "title"

# Download and process
wget -qO- https://example.com/data.json | jq '.users[].name'

# Download and extract
wget -qO- https://example.com/archive.tar.gz | tar xz

# Download and checksum
wget -qO- https://example.com/file.zip | sha256sum
With curl Comparison
# wget is better for recursive downloads
wget -r https://example.com/
# curl is better for API interactions
curl -X POST -d '{"key":"value"}' https://api.example.com
# Both can download files
wget https://example.com/file.zip
curl -O https://example.com/file.zip
17. Error Handling
Exit Codes
# wget exit codes:
# 0: Success
# 1: Generic error
# 2: Parse error
# 3: File I/O error
# 4: Network failure
# 5: SSL verification failure
# 6: Authentication failure
# 7: Protocol errors
# 8: Server issued an error response

wget https://example.com/file.zip
status=$?
case $status in
0) echo "Download successful" ;;
4) echo "Network error" ;;
6) echo "Authentication failed" ;;
8) echo "Server error" ;;
*) echo "Other error: $status" ;;
esac
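The exit codes above can be wrapped in a small helper so scripts log failures uniformly. A sketch; the mapping follows the code list above, and the commented-out wget call is illustrative:

```shell
# Map a wget exit code to a human-readable message
wget_status_msg() {
    case "$1" in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown exit code: $1" ;;
    esac
}

# Usage (URL illustrative):
#   wget -q https://example.com/file.zip
#   echo "wget: $(wget_status_msg $?)"
echo "wget: $(wget_status_msg 4)"
```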
Error Recovery
# Retry on error
while ! wget -c https://example.com/file.zip; do
echo "Download failed, retrying in 10 seconds..."
sleep 10
done

# Continue from where it left off
wget -c https://example.com/largefile.zip

# Log errors
wget https://example.com/file.zip 2>> error.log
18. Quick Reference Card
Most Common Options
| Option | Description |
|---|---|
| -O file | Save as different filename |
| -P dir | Save to directory |
| -c | Continue partial download |
| -r | Recursive download |
| -l depth | Recursion depth |
| -np | No parent directories |
| -nd | No directory structure |
| -x | Force directory structure |
| -N | Timestamping |
| -m | Mirror website |
| -p | Download page requisites |
| -k | Convert links |
| -E | Adjust extensions |
| -t num | Number of retries |
| --limit-rate | Limit download speed |
| --wait | Wait between downloads |
| --random-wait | Randomize wait time |
| --user | HTTP username |
| --password | HTTP password |
| --header | Custom HTTP header |
| --referer | Set referer |
| --user-agent | Set user agent |
| --no-check-certificate | Skip SSL validation |
| -q | Quiet mode |
| -v | Verbose mode |
| --spider | Check URLs only |
| -i file | Read URLs from file |
Common Combinations
| Command | Purpose |
|---|---|
| wget -c URL | Resume download |
| wget -r -np URL | Download directory recursively |
| wget -m URL | Mirror website |
| wget -p -k URL | Download page with assets |
| wget -i urls.txt | Download multiple URLs |
| wget --limit-rate=200k URL | Limit speed |
| wget -t 5 --wait=10 URL | Retry with wait |
| wget --user=user --ask-password URL | Authenticated download |
| wget -qO- URL \| command | Pipe output |
| wget --spider URL | Check if URL exists |
Conclusion
wget is an indispensable tool for automated downloading and web content retrieval:
Key Points Summary
- Non-interactive: Perfect for scripts and automation
- Resume capability: Continue interrupted downloads with -c
- Recursive downloads: Mirror entire websites with -r and -m
- Bandwidth control: Limit speed with --limit-rate
- Authentication: Support for HTTP/FTP auth
- Proxy support: Works with HTTP proxies
- Robust error handling: Retry mechanisms and timeouts
Best Practices
- Use -c for large downloads - Resume if interrupted
- Implement rate limiting - Be respectful to servers
- Use --wait and --random-wait - Avoid overwhelming servers
- Set appropriate timeouts - Prevent hanging downloads
- Log downloads - Track what was downloaded
- Verify SSL certificates - Ensure secure downloads
- Use -i for batch downloads - Maintain URL lists
- Test with --spider first - Verify URLs before downloading
Quick Reference
| Want to… | Command |
|---|---|
| Download a file | wget URL |
| Resume download | wget -c URL |
| Limit speed | wget --limit-rate=200k URL |
| Download recursively | wget -r URL |
| Mirror website | wget -m URL |
| Download with authentication | wget --user=name --ask-password URL |
| Download multiple | wget -i urls.txt |
| Check URL | wget --spider URL |
| Pipe to command | wget -qO- URL \| command |
| Ignore SSL | wget --no-check-certificate URL |
wget's versatility and robustness make it the go-to tool for automated downloads, website mirroring, and content retrieval in scripts and command-line operations.