Bash wget Command – Complete Guide to Non-Interactive Network Downloader

Introduction to wget

wget (short for "World Wide Web get") is a powerful GNU command-line utility for downloading files from the internet. It supports the HTTP, HTTPS, and FTP protocols and can handle recursive downloads, resume interrupted transfers, and much more. Understanding wget is essential for automated downloads, web scraping, and system administration.

Basic Syntax

wget [options] [URL]

Key Features

  • Non-interactive: Works in background, perfect for scripts
  • Resume capability: Can resume interrupted downloads
  • Recursive downloads: Download entire websites
  • Bandwidth control: Limit download speed
  • Authentication: Support for HTTP/FTP authentication
  • Proxy support: Works with HTTP proxies
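
The features above can be rolled into a small wrapper with sensible defaults (resume, bounded retries, a timeout). This is a sketch, not part of wget itself; the `WGET_BIN` variable is an assumption added so the wrapper can be dry-run without touching the network:

```shell
#!/bin/bash
# fetch: wget with sensible defaults (resume, bounded retries, timeout).
# WGET_BIN is overridable for dry runs/testing (an assumption, not a wget feature).
WGET_BIN="${WGET_BIN:-wget}"

fetch() {
    "$WGET_BIN" -c --tries=3 --timeout=30 --no-verbose "$@"
}

# Dry run: substitute echo to see the exact command line that would run
WGET_BIN=echo fetch https://example.com/file.zip
# prints: -c --tries=3 --timeout=30 --no-verbose https://example.com/file.zip
```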

1. Basic Usage

Simple File Download

# Download a single file
wget https://example.com/file.zip
# Download with different filename
wget -O newname.zip https://example.com/file.zip
# Download to specific directory
wget -P /path/to/directory/ https://example.com/file.zip
# Download multiple files
wget https://example.com/file1.zip https://example.com/file2.zip

Examples

$ wget https://releases.ubuntu.com/22.04/ubuntu-22.04.3-desktop-amd64.iso
--2024-03-11 10:30:15--  https://releases.ubuntu.com/22.04/ubuntu-22.04.3-desktop-amd64.iso
Resolving releases.ubuntu.com... 91.189.91.38
Connecting to releases.ubuntu.com|91.189.91.38|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4823359488 (4.5G) [application/x-iso9660-image]
Saving to: ‘ubuntu-22.04.3-desktop-amd64.iso’
ubuntu-22.04.3-desktop-amd64.iso   0%[                    ]  10.23M  1.2MB/s

2. Common Options

Output Control

# -O: Save with different filename
wget -O custom_name.html https://example.com
# -P: Save to directory
wget -P /tmp/downloads/ https://example.com/file.pdf
# -nd: No directory structure
wget -nd -P /downloads/ https://example.com/images/photo.jpg
# -x: Force directory structure
wget -x https://example.com/images/photo.jpg
# Creates: example.com/images/photo.jpg

Quiet and Verbose Modes

# -q: Quiet mode (no output)
wget -q https://example.com/file.zip
# -v: Verbose mode (default)
wget -v https://example.com/file.zip
# --no-verbose: Less verbose
wget --no-verbose https://example.com/file.zip
# Progress indicators
wget --progress=bar https://example.com/largefile.zip
wget --progress=dot https://example.com/largefile.zip

Download Resume

# -c: Continue/resume partial download
wget -c https://example.com/largefile.zip
# --start-pos: Start from specific position
wget --start-pos=1048576 https://example.com/file.zip
# --timeout: Set network timeout
wget --timeout=10 https://example.com/file.zip

3. Advanced Download Options

Bandwidth Limiting

# --limit-rate: Limit download speed
wget --limit-rate=200k https://example.com/largefile.zip
# Limit in different units
wget --limit-rate=1m https://example.com/largefile.zip  # 1 MB/s
wget --limit-rate=500k https://example.com/largefile.zip # 500 KB/s

Retry Options

# -t: Number of retries
wget -t 5 https://example.com/file.zip
# --retry-connrefused: Retry on connection refused
wget --retry-connrefused -t 10 https://example.com/file.zip
# --wait: Wait between retries
wget --wait=5 -t 3 https://example.com/file.zip
# --waitretry: Wait longer between retries
wget --waitretry=10 -t 5 https://example.com/file.zip
# --random-wait: Randomize wait time
wget --random-wait --wait=5 https://example.com/

Timeout Control

# --dns-timeout: DNS lookup timeout
wget --dns-timeout=10 https://example.com
# --connect-timeout: Connection timeout
wget --connect-timeout=15 https://example.com
# --read-timeout: Read timeout
wget --read-timeout=20 https://example.com
# --timeout: All timeouts
wget --timeout=30 https://example.com
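
Timeouts pair naturally with fallback mirrors: if the primary host stalls, move on to the next one. A sketch under stated assumptions; the `FETCH` variable and the mirror URLs are hypothetical, with the probe command factored out so the logic can be exercised offline:

```shell
#!/bin/bash
# Try each mirror in order; stop at the first that succeeds.
# FETCH defaults to a short-timeout wget; override it to test without a network.
FETCH="${FETCH:-wget -q --timeout=15 --tries=2}"

fetch_with_fallback() {
    local url
    for url in "$@"; do
        if $FETCH "$url"; then        # word-splitting of $FETCH is intentional
            echo "fetched from $url"
            return 0
        fi
    done
    echo "all mirrors failed" >&2
    return 1
}

# fetch_with_fallback https://mirror1.example.com/f.iso https://mirror2.example.com/f.iso
```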

4. Authentication and Headers

HTTP Authentication

# --user and --password: HTTP authentication
wget --user=username --password=password https://example.com/private/file.zip
# Ask for password (more secure)
wget --user=username --ask-password https://example.com/private/file.zip
# Using .netrc file
echo "machine example.com login username password secret" > ~/.netrc
chmod 600 ~/.netrc
wget https://example.com/private/file.zip

Custom Headers

# --header: Add custom HTTP headers
wget --header="User-Agent: Mozilla/5.0" https://example.com
wget --header="Accept: application/json" https://api.example.com/data
wget --header="Referer: https://google.com" https://example.com
# Multiple headers
wget --header="User-Agent: Mozilla/5.0" \
--header="Accept-Language: en-US,en;q=0.9" \
--header="Cookie: session=12345" \
https://example.com
# --referer: Set referer
wget --referer="https://google.com" https://example.com
# --user-agent: Set user agent
wget --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
https://example.com

Cookies

# --save-cookies: Save cookies to file
wget --save-cookies cookies.txt --keep-session-cookies \
https://example.com/login
# --load-cookies: Load cookies from file
wget --load-cookies cookies.txt https://example.com/protected/page
# --no-cookies: Disable cookies
wget --no-cookies https://example.com

5. FTP Downloads

FTP Operations

# Download FTP file
wget ftp://username:[email protected]/file.zip
# Anonymous FTP
wget ftp://ftp.example.com/pub/file.zip
# FTP directory listing
wget ftp://ftp.example.com/pub/
# Recursive FTP download
wget -r ftp://ftp.example.com/pub/
# FTP with specific port
wget ftp://example.com:2121/file.zip

FTP Options

# --ftp-user and --ftp-password
wget --ftp-user=username --ftp-password=password ftp://example.com/file.zip
# --no-passive-ftp: Disable passive mode
wget --no-passive-ftp ftp://example.com/file.zip
# --retr-symlinks: Retrieve symlinks as files
wget --retr-symlinks -r ftp://example.com/pub/

6. Recursive Downloads

Basic Recursion

# -r: Recursive download
wget -r https://example.com/docs/
# -l: Recursion depth
wget -r -l 2 https://example.com/docs/  # 2 levels deep
# --no-parent: Don't ascend to parent directory
wget -r --no-parent https://example.com/docs/
# --accept: Accept specific file types
wget -r --accept pdf,doc,txt https://example.com/docs/

Advanced Recursion

# -np: No parent (don't go to parent directory)
wget -r -np https://example.com/docs/
# --reject: Reject specific file types
wget -r --reject jpg,gif,mp4 https://example.com/gallery/
# --accept-regex: Accept only URLs matching a regex
wget -r --accept-regex='.*\.pdf$' https://example.com/
# --exclude-directories: Skip directories
wget -r --exclude-directories=/images,/tmp https://example.com/
# --include-directories: Only these directories
wget -r --include-directories=/docs,/downloads https://example.com/

Mirror a Website

# -m: Mirror website (equivalent to -r -N -l inf --no-remove-listing)
wget -m https://example.com/
# Mirror with conversion for offline viewing
wget -m -k -K -E https://example.com/
# -k: Convert links for local viewing
# -K: Keep original file as .orig
# -E: Adjust extensions
# Mirror with timestamping
wget -m -N https://example.com/
# -N: Only download newer files
# Complete mirror command
wget --mirror --convert-links --adjust-extension \
--page-requisites --no-parent https://example.com/

7. Page Requisites and Conversion

Download Page Requisites

# -p: Download all page requisites (images, CSS, JS)
wget -p https://example.com/page.html
# --page-requisites: Same as -p
wget --page-requisites https://example.com/page.html
# Download single page with all assets
wget -p -k https://example.com/page.html
# -k: Convert links for local viewing

Link Conversion

# -k: Convert links for local viewing
wget -k https://example.com/page.html
# -K: Keep original files with a .orig extension (combine with -k)
wget -k -K https://example.com/page.html
# --convert-links: Convert links after download
wget --convert-links https://example.com/page.html
# --adjust-extension: Add appropriate extensions
wget --adjust-extension https://example.com/page.php

8. Timestamping and File Management

Timestamping

# -N: Only download newer files (timestamping)
wget -N https://example.com/file.zip
# --timestamping: Same as -N
wget --timestamping https://example.com/file.zip
# --no-use-server-timestamps: Don't copy the server's timestamp onto the local file
wget --no-use-server-timestamps https://example.com/file.zip

File Versioning

# --backups: Number of backups to keep
wget --backups=5 -N https://example.com/file.zip
# --backup-converted: Backup converted files
wget -k --backup-converted https://example.com/page.html
# --keep-session-cookies: Keep session cookies
wget --keep-session-cookies --save-cookies cookies.txt \
https://example.com/login

9. Input from Files

Download from File List

# -i: Read URLs from file
wget -i urls.txt
# Download from file with options
wget -i urls.txt -P downloads/
# URLs file format (one per line)
echo "https://example.com/file1.zip" > urls.txt
echo "https://example.com/file2.zip" >> urls.txt
echo "https://example.com/file3.zip" >> urls.txt
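
For numbered file series, the list can be generated rather than typed by hand and then fed straight to -i. A sketch; the URL pattern is hypothetical:

```shell
#!/bin/bash
# Generate a zero-padded URL list, then hand it to wget -i.
dir=$(mktemp -d)
for i in $(seq -w 1 10); do
    printf 'https://example.com/archive/part%s.zip\n' "$i"
done > "$dir/urls.txt"

wc -l < "$dir/urls.txt"     # number of URLs generated
head -n 1 "$dir/urls.txt"   # first entry: .../part01.zip
# wget -i "$dir/urls.txt" -P downloads/   # uncomment to actually download
```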

Advanced Input Handling

# Download from stdin
cat urls.txt | wget -i -
# Combine with other commands
find . -name "*.url" -exec cat {} \; | wget -i -
# Process URLs from command output
curl -s https://api.example.com/files | jq -r '.[].url' | wget -i -

10. Spider and Testing

Web Spider Mode

# --spider: Check URLs without downloading
wget --spider https://example.com
# Check multiple URLs
wget --spider -i urls.txt
# Check broken links
wget --spider --force-html -r -l1 https://example.com/ 2>&1 | \
grep -B2 '404'
# Verbose spider output
wget --spider -v https://example.com
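
Spider mode lends itself to a simple dead-link reporter. A sketch; the `CHECKER` variable is an assumption (not a wget feature) so the loop can be tested without network access:

```shell
#!/bin/bash
# Check a list of URLs (stdin, one per line) and report dead ones.
# CHECKER defaults to a quiet spider probe; override it to test offline.
CHECKER="${CHECKER:-wget --spider -q --timeout=10 --tries=1}"

check_links() {
    local url checked=0 dead=0
    while IFS= read -r url; do
        [[ -z "$url" || "$url" =~ ^# ]] && continue   # skip blanks and comments
        checked=$((checked + 1))
        if ! $CHECKER "$url"; then                    # word-splitting intentional
            dead=$((dead + 1))
            echo "DEAD: $url"
        fi
    done
    echo "checked: $checked, dead: $dead"
}

# printf '%s\n' https://example.com https://example.com/missing | check_links
```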

Testing and Debugging

# --debug: Debug output
wget --debug https://example.com
# --server-response: Print server response
wget --server-response https://example.com
# --save-headers: Prepend the HTTP headers to the saved document
wget --save-headers https://example.com
# -S: Show server response
wget -S https://example.com

11. Proxy Configuration

HTTP Proxy

# Proxy settings via -e, with proxy authentication
wget -e use_proxy=yes -e http_proxy=proxy.example.com:8080 \
--proxy-user=user --proxy-password=pass \
https://example.com
# Environment variables
export http_proxy=http://proxy.example.com:8080
export https_proxy=http://proxy.example.com:8080
wget https://example.com
# --no-proxy: Bypass proxy
wget --no-proxy https://example.com

FTP Proxy

# FTP proxy
export ftp_proxy=ftp://proxy.example.com:2121
wget ftp://ftp.example.com/file.zip

12. Script Examples

Download Manager Script

#!/bin/bash
# download_manager.sh - Advanced download manager with wget
DOWNLOAD_DIR="$HOME/downloads"
LOG_FILE="$DOWNLOAD_DIR/download.log"
URLS_FILE="$DOWNLOAD_DIR/urls.txt"
MAX_CONCURRENT=3
mkdir -p "$DOWNLOAD_DIR"
download_file() {
local url="$1"
local filename=$(basename "$url")
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Starting download: $filename" >> "$LOG_FILE"
wget -c \
--timeout=30 \
--tries=5 \
--limit-rate=500k \
--user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
-P "$DOWNLOAD_DIR" \
"$url" 2>&1 | while read -r line; do
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $line" >> "$LOG_FILE"
done
# $? after a pipeline is the while loop's status; check wget's via PIPESTATUS
if [ "${PIPESTATUS[0]}" -eq 0 ]; then
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Completed: $filename" >> "$LOG_FILE"
else
echo "[$(date '+%Y-%m-%d %H:%M:%S')] Failed: $filename" >> "$LOG_FILE"
echo "$url" >> "$DOWNLOAD_DIR/failed.txt"
fi
}
# Download multiple files with concurrency control
download_concurrent() {
local count=0
while IFS= read -r url; do
if [[ -n "$url" && ! "$url" =~ ^# ]]; then
download_file "$url" &
((count++))
if ((count >= MAX_CONCURRENT)); then
wait
count=0
fi
fi
done < "$URLS_FILE"
wait
}
# Main execution
case "$1" in
single)
download_file "$2"
;;
batch)
download_concurrent
;;
retry)
if [[ -f "$DOWNLOAD_DIR/failed.txt" ]]; then
mv "$DOWNLOAD_DIR/failed.txt" "$URLS_FILE"
download_concurrent
fi
;;
*)
echo "Usage: $0 {single <url>|batch|retry}"
exit 1
;;
esac

Website Backup Script

#!/bin/bash
# backup_website.sh - Backup entire website with wget
BACKUP_DIR="$HOME/website_backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
SITE_URL="$1"
if [[ -z "$SITE_URL" ]]; then
echo "Usage: $0 <website_url>"
exit 1
fi
DOMAIN=$(echo "$SITE_URL" | awk -F/ '{print $3}')
BACKUP_PATH="$BACKUP_DIR/${DOMAIN}_$TIMESTAMP"
mkdir -p "$BACKUP_PATH"
echo "Starting backup of $SITE_URL to $BACKUP_PATH"
wget \
--mirror \
--convert-links \
--adjust-extension \
--page-requisites \
--no-parent \
--wait=2 \
--limit-rate=500k \
--random-wait \
--user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36" \
--directory-prefix="$BACKUP_PATH" \
--output-file="$BACKUP_PATH/backup.log" \
"$SITE_URL"
# Note: a mirror run exits non-zero if any single request failed (e.g. one 404),
# so "failed" here can still mean a mostly complete backup.
if [ $? -eq 0 ]; then
echo "Backup completed successfully"
# Create archive
tar -czf "$BACKUP_PATH.tar.gz" -C "$BACKUP_DIR" "${DOMAIN}_$TIMESTAMP"
echo "Archive created: $BACKUP_PATH.tar.gz"
# Generate report before removing the working directory (it contains backup.log)
cat > "$BACKUP_PATH.report" << EOF
Website Backup Report
====================
URL: $SITE_URL
Date: $(date)
Backup File: $BACKUP_PATH.tar.gz
Size: $(du -h "$BACKUP_PATH.tar.gz" | cut -f1)
Download Log Summary:
$(grep "saved\|removed" "$BACKUP_PATH/backup.log" | tail -20)
EOF
rm -rf "$BACKUP_PATH"
echo "Report saved: $BACKUP_PATH.report"
else
echo "Backup failed"
exit 1
fi

Batch Download Script

#!/bin/bash
# batch_download.sh - Batch download with pattern matching
PATTERN="$1"
OUTPUT_DIR="${2:-./downloads}"
if [[ -z "$PATTERN" ]]; then
echo "Usage: $0 <url_pattern> [output_dir]"
echo "Example: $0 'https://example.com/images/img{001..100}.jpg'"
exit 1
fi
mkdir -p "$OUTPUT_DIR"
# Expand pattern using brace expansion (eval runs the pattern as shell code,
# so only pass trusted patterns)
URLS=$(eval echo "$PATTERN")
download_with_progress() {
local total=$(echo "$URLS" | wc -w)
local current=0
for url in $URLS; do
((current++))
filename=$(basename "$url")
echo "[$current/$total] Downloading: $filename"
wget -c \
--quiet \
--show-progress \
--timeout=30 \
--tries=3 \
-P "$OUTPUT_DIR" \
"$url"
if [ $? -eq 0 ]; then
echo "  ✓ Completed"
else
echo "  ✗ Failed"
echo "$url" >> "$OUTPUT_DIR/failed.txt"
fi
done
}
# Start download
echo "Starting batch download of $(echo "$URLS" | wc -w) files"
download_with_progress
# Summary
echo
echo "Download Summary:"
echo "  Location: $OUTPUT_DIR"
ls -lh "$OUTPUT_DIR" | tail -n +2
if [[ -f "$OUTPUT_DIR/failed.txt" ]]; then
echo "  Failed downloads: $(wc -l < "$OUTPUT_DIR/failed.txt")"
fi

13. Rate Limiting and Politeness

Rate Control

# --wait: Wait between downloads
wget --wait=5 -r https://example.com/
# --random-wait: Randomize wait time
wget --random-wait --wait=5 -r https://example.com/
# --limit-rate: Limit bandwidth
wget --limit-rate=200k -r https://example.com/
# --quota: Limit total download size
wget --quota=100m -r https://example.com/

Robots.txt Handling

# Respect robots.txt (the default behavior)
wget -r https://example.com/
# Ignore robots.txt
wget -r -e robots=off https://example.com/
# Custom user agent for robots.txt
wget --user-agent="MyBot/1.0" https://example.com/

14. SSL/TLS Options

Certificate Handling

# --no-check-certificate: Skip certificate validation
wget --no-check-certificate https://example.com
# --certificate: Client certificate
wget --certificate=client.crt --private-key=client.key https://example.com
# --ca-certificate: CA certificate
wget --ca-certificate=ca.crt https://example.com
# --secure-protocol: Specify SSL/TLS protocol
wget --secure-protocol=TLSv1_2 https://example.com

15. Output Formatting

Custom Log Format

# --output-file: Log to file
wget --output-file=download.log https://example.com/file.zip
# --append-output: Append to log file
wget --append-output=download.log https://example.com/file2.zip
# --output-document: Output to file
wget --output-document=output.html https://example.com
# --progress: Progress indicator format
wget --progress=bar:force https://example.com/largefile.zip
wget --progress=dot:giga https://example.com/largefile.zip

16. Integration with Other Commands

Piping and Redirection

# Pipe output to other commands
wget -qO- https://example.com | grep "title"
# Download and process
wget -qO- https://example.com/data.json | jq '.users[].name'
# Download and extract
wget -qO- https://example.com/archive.tar.gz | tar xz
# Download and checksum
wget -qO- https://example.com/file.zip | sha256sum
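
Piping to sha256sum prints a hash but verifies nothing. A small verifier that compares a downloaded file against a published checksum is more useful in scripts; this is a sketch, and the demo hashes a local file so no network is needed:

```shell
#!/bin/bash
# Compare a file's SHA-256 against an expected value.
verify_sha256() {
    local file="$1" expected="$2" actual
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [[ "$actual" == "$expected" ]]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Demo on a local file:
f=$(mktemp)
printf 'hello\n' > "$f"
good=$(sha256sum "$f" | awk '{print $1}')
verify_sha256 "$f" "$good"
verify_sha256 "$f" "deadbeef" 2>/dev/null || echo "caught bad checksum"
```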

With curl Comparison

# wget is better for recursive downloads
wget -r https://example.com/
# curl is better for API interactions
curl -X POST -d '{"key":"value"}' https://api.example.com
# Both can download files
wget https://example.com/file.zip
curl -O https://example.com/file.zip

17. Error Handling

Exit Codes

# 0: Success
# 1: Generic error
# 2: Parse error
# 3: File I/O error
# 4: Network failure
# 5: SSL verification failure
# 6: Authentication failure
# 7: Protocol errors
# 8: Server error
wget https://example.com/file.zip
case $? in
0) echo "Download successful" ;;
4) echo "Network error" ;;
6) echo "Authentication failed" ;;
8) echo "Server error" ;;
*) echo "Other error: $?" ;;
esac
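
The case statement above can be packaged as a reusable lookup so every script reports the same messages; the function name is an assumption, but the code-to-message mapping follows the list above:

```shell
#!/bin/bash
# Map a wget exit status to a human-readable message.
wget_strerror() {
    case "$1" in
        0) echo "success" ;;
        1) echo "generic error" ;;
        2) echo "option parse error" ;;
        3) echo "file I/O error" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        6) echo "authentication failure" ;;
        7) echo "protocol error" ;;
        8) echo "server issued an error response" ;;
        *) echo "unknown status: $1" ;;
    esac
}

# Usage: wget "$url"; echo "wget: $(wget_strerror $?)"
wget_strerror 4   # network failure
```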

Error Recovery

# Retry on error
while ! wget -c https://example.com/file.zip; do
echo "Download failed, retrying in 10 seconds..."
sleep 10
done
# Continue from where it left off
wget -c https://example.com/largefile.zip
# Log errors
wget https://example.com/file.zip 2>> error.log
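
The retry loop above waits a fixed 10 seconds and never gives up. A bounded retry with exponential backoff is gentler on servers and cannot spin forever; this sketch is generic over the command, so it works for wget or anything else:

```shell
#!/bin/bash
# retry MAX DELAY CMD...: run CMD, doubling DELAY between failures, at most MAX tries.
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if (( attempt >= max )); then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# retry 5 2 wget -c https://example.com/largefile.zip
```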

18. Quick Reference Card

Most Common Options

Option                      Description
-O file                     Save as different filename
-P dir                      Save to directory
-c                          Continue partial download
-r                          Recursive download
-l depth                    Recursion depth
-np                         No parent directories
-nd                         No directory structure
-x                          Force directory structure
-N                          Timestamping
-m                          Mirror website
-p                          Download page requisites
-k                          Convert links
-E                          Adjust extensions
-t num                      Number of retries
--limit-rate                Limit download speed
--wait                      Wait between downloads
--random-wait               Randomize wait time
--user                      HTTP username
--password                  HTTP password
--header                    Custom HTTP header
--referer                   Set referer
--user-agent                Set user agent
--no-check-certificate      Skip SSL validation
-q                          Quiet mode
-v                          Verbose mode
--spider                    Check URLs only
-i file                     Read URLs from file

Common Combinations

Command                                  Purpose
wget -c URL                              Resume download
wget -r -np URL                          Download directory recursively
wget -m URL                              Mirror website
wget -p -k URL                           Download page with assets
wget -i urls.txt                         Download multiple URLs
wget --limit-rate=200k URL               Limit speed
wget -t 5 --wait=10 URL                  Retry with wait
wget --user=user --ask-password URL      Authenticated download
wget -qO- URL | command                  Pipe output
wget --spider URL                        Check if URL exists

Conclusion

wget is an indispensable tool for automated downloading and web content retrieval:

Key Points Summary

  1. Non-interactive: Perfect for scripts and automation
  2. Resume capability: Continue interrupted downloads with -c
  3. Recursive downloads: Mirror entire websites with -r and -m
  4. Bandwidth control: Limit speed with --limit-rate
  5. Authentication: Support for HTTP/FTP auth
  6. Proxy support: Works with HTTP proxies
  7. Robust error handling: Retry mechanisms and timeouts

Best Practices

  1. Use -c for large downloads - Resume if interrupted
  2. Implement rate limiting - Be respectful to servers
  3. Use --wait and --random-wait - Avoid overwhelming servers
  4. Set appropriate timeouts - Prevent hanging downloads
  5. Log downloads - Track what was downloaded
  6. Verify SSL certificates - Ensure secure downloads
  7. Use -i for batch downloads - Maintain URL lists
  8. Test with --spider first - Verify URLs before downloading

Quick Reference

Want to…                        Command
Download a file                 wget URL
Resume download                 wget -c URL
Limit speed                     wget --limit-rate=200k URL
Download recursively            wget -r URL
Mirror website                  wget -m URL
Download with authentication    wget --user=name --ask-password URL
Download multiple               wget -i urls.txt
Check URL                       wget --spider URL
Pipe to command                 wget -qO- URL | command
Ignore SSL                      wget --no-check-certificate URL

wget's versatility and robustness make it the go-to tool for automated downloads, website mirroring, and content retrieval in scripts and command-line operations.
