Introduction to du
The du (disk usage) command is a standard Unix utility used to estimate file and directory space usage. It's essential for understanding how disk space is being consumed, identifying large files and directories, and managing storage effectively.
Key Concepts
- Disk Usage: Reports the space used by files and directories
- Recursive: By default, examines directories recursively
- Block Size: Reports in filesystem blocks (GNU du defaults to 1024-byte blocks; POSIX du uses 512-byte blocks)
- Apparent Size: Can report logical vs. actual disk usage
- Hard Links: Handles hard links appropriately (counts only once)
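The apparent-size and hard-link behaviors above can be verified with a short experiment. This is a minimal sketch assuming GNU coreutils (`du --apparent-size`, `truncate`); the temporary directory and file names are illustrative only.

```shell
#!/bin/sh
# Sketch: demonstrate apparent size vs. disk usage, and hard-link counting.
# Assumes GNU coreutils; all paths here are throwaway.
demo=$(mktemp -d)
cd "$demo" || exit 1

# A sparse file has a large logical (apparent) size but occupies few blocks
truncate -s 10M sparse.img
echo "apparent: $(du --apparent-size -h sparse.img | cut -f1)"
echo "on disk:  $(du -h sparse.img | cut -f1)"

# Hard-linked data is counted once: the grand total equals a single copy
echo "some data" > original.txt
ln original.txt hardlink.txt
du -ch original.txt hardlink.txt

cd / && rm -rf "$demo"
```

Running this on most Linux filesystems shows the sparse file's on-disk usage far below its 10M apparent size, and the `-c` total for the two hard-linked names matching the size of one file.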
1. Basic Usage
Simple du Commands
# Basic disk usage of current directory
du

# Disk usage with human-readable sizes
du -h

# Disk usage of a specific directory
du /home/user
du /var/log

# Show only the total of the current directory
du -s
du -sh                     # Total in human-readable format

# Disk usage of multiple directories
du /home /var /tmp

# Examples
$ du -sh Documents
# 2.5G    Documents

$ du -h --max-depth=1 /usr
# 1.2G    /usr/bin
# 500M    /usr/lib
# 200M    /usr/share
# 2.1G    /usr
Common Use Cases
# Check home directory usage
du -sh ~

# Find the largest directories in the current location
du -sh */ | sort -rh

# Check disk usage of hidden files
du -sh .[!.]* 2>/dev/null

# Summarize each directory in /var
sudo du -sh /var/*

# Check specific file sizes
du -sh file.txt
du -sh *.log

# Compare directory sizes in exact bytes
du -sb dir1 dir2
2. Command Options
-h (Human Readable)
# Human-readable sizes (K, M, G)
du -h

# Total in human-readable format
du -sh

# First level with readable sizes
du -h --max-depth=1

# Examples
$ du -sh *
# 4.0K    file.txt
# 2.1M    image.jpg
# 1.2G    video.mp4
# 500M    archive.zip

# Different unit scales
du --si    # Powers of 1000
du -h      # Powers of 1024
-s (Summarize)
# Show only a total for each argument
du -s *                       # Size of each file/dir in the current directory
du -s /home/*                 # Summary of home directories

# With human-readable sizes
du -sh /var/*

# Practical examples
# Find the largest users in /home
du -sh /home/* | sort -rh

# Quick backup size estimate
du -sh /home/user/Documents

# Check a directory's logical (apparent) size
du -s --apparent-size /path
-a (All Files)
# Show all files, not just directories
du -a

# All files with human-readable sizes
du -ah

# Find large files
du -ah /home | sort -rh | head -20

# List entries in the MB/GB range
du -ah --max-depth=1 | grep -E '^[0-9.]+[MG]'

# Example output
$ du -ah --max-depth=1
# 4.0K    ./file1.txt
# 2.1M    ./image.jpg
# 4.0K    ./dir1
# 2.1M    ./dir2
# 2.1M    ./dir3
-c (Total)
# Display a grand total
du -ch *.log                  # Each log file plus a total line
du -csh /home/*               # Total of all home directories

# Example
$ du -ch *.txt
# 4.0K    file1.txt
# 8.0K    file2.txt
# 2.0K    file3.txt
# 14K     total

# Practical use in scripts
total=$(du -ch /var/log/*.log | grep 'total$' | cut -f1)
echo "Total log size: $total"
--max-depth
# Limit directory recursion depth
du -h --max-depth=1 # Show only immediate subdirectories
du -h --max-depth=2 # Show two levels deep
# Examples
# Show only top-level directories
du -h --max-depth=1 /usr
# Show two levels, useful for analysis
du -h --max-depth=2 /var | sort -rh
# Exclude deep nesting in reports
analyze_space() {
local dir="$1"
local depth="${2:-2}"
du -h --max-depth="$depth" "$dir" | sort -rh
}
--exclude and --exclude-from
# Exclude patterns
du -sh --exclude="*.log" /var
du -sh --exclude={*.mp3,*.mp4} ~/Downloads
# Multiple excludes
du -sh --exclude="*.cache" --exclude="*.tmp" ~
# Exclude from file
cat > exclude.txt << EOF
*.log
*.tmp
.cache
node_modules
EOF
du -sh --exclude-from=exclude.txt ~
# Examples
# Backup script exclude common patterns
backup_size() {
du -sh --exclude={*.bak,*.tmp,.git,node_modules} "$1"
}
# Find space used by non-excluded files
real_usage() {
local dir="$1"
shift
du -sh --exclude={.cache,.local,.config} "$dir"
}
--apparent-size
# Show logical size (not actual disk usage)
du --apparent-size -sh file
# Compare with actual disk usage
echo "Apparent size: $(du --apparent-size -sh file)"
echo "Disk usage: $(du -sh file)"
# Useful for sparse files and compression
compare_sizes() {
local file="$1"
local apparent=$(du --apparent-size -B1 "$file" | cut -f1)
local actual=$(du -B1 "$file" | cut -f1)   # -b implies --apparent-size, so use -B1 for real disk usage
local ratio=$(echo "scale=2; $apparent / $actual" | bc)
echo "Space efficiency: $ratio ($apparent vs $actual bytes)"
}
3. Practical Examples
Finding Large Files and Directories
#!/bin/bash
# Find largest directories
find_large_dirs() {
local dir="${1:-.}"
local count="${2:-10}"
echo "Largest $count directories in $dir:"
du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | head -"$count"
}
# Find largest files
find_large_files() {
local dir="${1:-.}"
local size="${2:-100M}"
find "$dir" -type f -size +"$size" -exec du -sh {} \; 2>/dev/null | sort -rh
}
# Find recently modified large files
find_recent_large() {
local dir="${1:-.}"
local days="${2:-7}"
local size="${3:-10M}"
find "$dir" -type f -mtime -"$days" -size +"$size" -exec du -sh {} \; 2>/dev/null | sort -rh
}
# Human-readable size comparison
compare_sizes() {
local file1="$1"
local file2="$2"
size1=$(du -sb "$file1" | cut -f1)
size2=$(du -sb "$file2" | cut -f1)
if [ "$size1" -gt "$size2" ]; then
echo "$file1 is larger"
elif [ "$size2" -gt "$size1" ]; then
echo "$file2 is larger"
else
echo "Files are the same size"
fi
}
# Usage examples
# find_large_dirs /home 15
# find_large_files /var 500M
# find_recent_large /home 30 100M
Disk Usage Analysis
#!/bin/bash
# Comprehensive disk usage report
disk_report() {
local dir="${1:-.}"
local outfile="${2:-disk_report_$(date +%Y%m%d).txt}"
{
echo "Disk Usage Report - $(date)"
echo "================================"
echo
echo "Analyzing: $dir"
echo
echo "Top 20 largest directories:"
echo "--------------------------"
du -h --max-depth=3 "$dir" 2>/dev/null | sort -rh | head -20
echo
echo "Top 20 largest files:"
echo "---------------------"
find "$dir" -type f -exec du -h {} \; 2>/dev/null | sort -rh | head -20
echo
echo "Summary by file type:"
echo "---------------------"
find "$dir" -type f 2>/dev/null | while read -r file; do
ext="${file##*.}"
if [ "$ext" != "$file" ]; then
size=$(du -b "$file" 2>/dev/null | cut -f1)
echo "$size $ext"   # emit size and extension so awk can aggregate by type
fi
done | awk '{sum[$2]+=$1} END {for (ext in sum) printf "%s: %.2f MB\n", ext, sum[ext]/1048576}' | sort -rh -k2
echo
echo "Total size:"
du -sh "$dir"
} > "$outfile"
echo "Report saved to $outfile"
cat "$outfile"
}
# Monitor directory growth
monitor_growth() {
local dir="$1"
local interval="${2:-3600}"
local log="${3:-growth.log}"
while true; do
size=$(du -sb "$dir" | cut -f1)
echo "$(date +%s) $size" >> "$log"
echo "$(date): $(du -sh "$dir" | cut -f1)"
sleep "$interval"
done
}
# Compare two directories
compare_dirs() {
local dir1="$1"
local dir2="$2"
echo "Comparing:"
echo " $dir1"
echo " $dir2"
echo
size1=$(du -sb "$dir1" | cut -f1)
size2=$(du -sb "$dir2" | cut -f1)
echo "Size 1: $(numfmt --to=iec "$size1")"
echo "Size 2: $(numfmt --to=iec "$size2")"
if [ "$size1" -gt "$size2" ]; then
diff=$((size1 - size2))
echo "Directory 1 is larger by $(numfmt --to=iec "$diff")"
elif [ "$size2" -gt "$size1" ]; then
diff=$((size2 - size1))
echo "Directory 2 is larger by $(numfmt --to=iec "$diff")"
else
echo "Directories are the same size"
fi
}
Cleanup Scripts
#!/bin/bash
# Find and optionally remove large old files
cleanup_old_files() {
local dir="$1"
local days="${2:-30}"
local size="${3:-100M}"
local action="${4:-list}" # list, delete, or move
echo "Finding files in $dir older than $days days larger than $size"
find "$dir" -type f -mtime +"$days" -size +"$size" -print0 | while IFS= read -r -d '' file; do
filesize=$(du -sh "$file" | cut -f1)
filedate=$(stat -c %y "$file" | cut -d. -f1)
case "$action" in
list)
echo "$filedate - $filesize - $file"
;;
delete)
echo "Deleting: $file"
rm -f "$file"
;;
move)
mkdir -p /tmp/cleanup
mv -v "$file" /tmp/cleanup/
;;
esac
done
}
# Clean temporary directories
clean_temps() {
local dirs=("/tmp" "/var/tmp" "$HOME/.cache")
for dir in "${dirs[@]}"; do
if [ -d "$dir" ]; then
echo "Cleaning $dir"
du -sh "$dir"
find "$dir" -type f -atime +7 -delete 2>/dev/null
find "$dir" -type d -empty -delete 2>/dev/null
du -sh "$dir"
fi
done
}
# Remove duplicate files (by size first, then content)
find_duplicates() {
local dir="${1:-.}"
# Find files with same size
find "$dir" -type f -exec du -b {} \; 2>/dev/null | sort -n |
awk -F'\t' '{
if ($1 == last) { if (last_file != "") print last_file; print $2; last_file = "" }
else { last = $1; last_file = $2 }
}' |
while IFS= read -r file; do
if [ -f "$file" ]; then
echo "Possible duplicate: $file"
fi
done
}
# Archive and compress old logs
archive_logs() {
local logdir="${1:-/var/log}"
local days="${2:-30}"
find "$logdir" -name "*.log" -mtime +"$days" -print0 | while IFS= read -r -d '' log; do
echo "Archiving: $log"
gzip -9 "$log"
done
}
4. Scripting with du
Disk Usage Functions
#!/bin/bash
# Get size in bytes
get_size_bytes() {
local path="$1"
du -sb "$path" 2>/dev/null | cut -f1
}
# Get size in human-readable format
get_size_human() {
local path="$1"
du -sh "$path" 2>/dev/null | cut -f1
}
# Check if directory exceeds threshold
check_threshold() {
local path="$1"
local threshold="$2" # e.g., "1G"
local size=$(du -sb "$path" 2>/dev/null | cut -f1)
local threshold_bytes=$(numfmt --from=iec "$threshold")
if [ "$size" -gt "$threshold_bytes" ]; then
return 0 # Exceeds threshold
else
return 1 # Below threshold
fi
}
# Get percentage of disk usage
disk_usage_percent() {
local path="$1"
df -P "$path" | awk 'NR==2 {print $5}' | sed 's/%//'   # -P prevents long device names from wrapping
}
# Example usage
if check_threshold /var/log "500M"; then
echo "Warning: /var/log exceeds 500M"
fi
Space Monitoring Script
#!/bin/bash
# Monitor disk space and send alerts
monitor_space() {
local threshold="${1:-90}" # Alert at 90%
local email="${2:-admin@localhost}"
local log="/var/log/space_monitor.log"
df -h | grep -vE '^Filesystem|tmpfs|cdrom' | while read line; do
usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
fs=$(echo "$line" | awk '{print $1}')
mount=$(echo "$line" | awk '{print $6}')
if [ "$usage" -ge "$threshold" ]; then
message="WARNING: Filesystem $fs mounted on $mount is at ${usage}% capacity"
echo "$(date): $message" >> "$log"
# Send alert
echo "$message" | mail -s "Disk Space Alert" "$email"
# Find largest directories
echo "Largest directories:" >> "$log"
du -sh "$mount"/* 2>/dev/null | sort -rh | head -10 >> "$log"
fi
done
}
# Continuous monitoring daemon
space_daemon() {
local check_interval="${1:-300}" # Check every 5 minutes
while true; do
monitor_space
sleep "$check_interval"
done
}
Backup Size Estimation
#!/bin/bash
# Estimate backup size
estimate_backup() {
local source="$1"
local exclude_file="$2"
echo "Estimating backup size for $source"
echo "================================"
# Base size
echo -n "Total size: "
if [ -n "$exclude_file" ]; then
du -sh --exclude-from="$exclude_file" "$source"
else
du -sh "$source"
fi
# Breakdown by type
echo -e "\nBreakdown by file type:"
find "$source" -type f 2>/dev/null | while read file; do
ext="${file##*.}"
echo "$ext"
done | sort | uniq -c | sort -rn | head -10
# Compression estimate
echo -e "\nEstimated compressed sizes:"
echo " tar.gz: $(estimate_compression "$source" gz)"
echo " tar.bz2: $(estimate_compression "$source" bz2)"
echo " zip: $(estimate_compression "$source" zip)"
}
# Rough compression estimate
estimate_compression() {
local dir="$1"
local type="$2"
local total=$(du -sb "$dir" | cut -f1)
case "$type" in
gz) ratio=0.3 ;; # Rough estimate for text
bz2) ratio=0.25 ;;
zip) ratio=0.35 ;;
*) ratio=0.5 ;;
esac
echo "$(numfmt --to=iec $(echo "$total * $ratio" | bc))"
}
5. Advanced Techniques
Sorting and Filtering
#!/bin/bash
# Sort by size (largest first)
du -h | sort -rh
# Sort by size numerically
du -b | sort -n
# Filter by size
du -h | grep -E '^[0-9.]+[MG]' # Only MB and GB
du -h | grep -E '^[0-9.]+G' # Only GB
# Show only directories above threshold
du -h --threshold=1G /home
# Custom sorting function
sort_by_size() {
local dir="${1:-.}"
local type="${2:-dirs}" # dirs, files, or all
case "$type" in
dirs)
du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh
;;
files)
find "$dir" -maxdepth 1 -type f -exec du -h {} \; 2>/dev/null | sort -rh
;;
all)
du -ah --max-depth=1 "$dir" 2>/dev/null | sort -rh
;;
esac
}
Output Formatting
#!/bin/bash
# Format du output as JSON
du_to_json() {
local dir="${1:-.}"
echo "{"
du -b --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -20 | while read size name; do
name=$(basename "$name")
echo " \"$name\": $size,"
done | sed '$s/,//'
echo "}"
}
# Format as CSV
du_to_csv() {
local dir="${1:-.}"
echo "Size,Bytes,Path"
du -b --max-depth=1 "$dir" 2>/dev/null | sort -rn | while read size path; do
human=$(numfmt --to=iec "$size")
echo "$human,$size,$path"
done
}
# Create bar chart
du_bar_chart() {
local dir="${1:-.}"
local max_width="${2:-50}"
# Find max size
max_size=$(du -b --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -1 | cut -f1)
du -h --max-depth=1 "$dir" 2>/dev/null | sort -rh | while read size path; do
bytes=$(du -b "$path" 2>/dev/null | cut -f1)
width=$((bytes * max_width / max_size))
bar=$(printf "%${width}s" | sed 's/ /█/g')   # sed handles the multi-byte █; tr is byte-oriented
printf "%-30s %8s %s\n" "$(basename "$path")" "$size" "$bar"
done
}
Integration with find
#!/bin/bash
# Find large files with detailed info
find_large_with_details() {
local dir="$1"
local size="${2:-10M}"
find "$dir" -type f -size +"$size" -exec sh -c '
for file; do
size=$(du -sh "$file" | cut -f1)
modified=$(stat -c %y "$file" 2>/dev/null | cut -d. -f1)
owner=$(stat -c %U "$file" 2>/dev/null)
echo "$size|$modified|$owner|$file"
done
' _ {} + | sort -rh | column -t -s '|'
}
# Find directories by size range
find_by_size_range() {
local dir="$1"
local min="$2"
local max="$3"
find "$dir" -type d -exec du -b {} \; 2>/dev/null |
awk -v min="$min" -v max="$max" '$1 >= min && $1 <= max' |
sort -n |
while read size path; do
echo "$(numfmt --to=iec "$size") $path"
done
}
# Find sparse files
find_sparse_files() {
local dir="${1:-.}"
find "$dir" -type f -exec sh -c '
apparent=$(du --apparent-size -B1 "$1" | cut -f1)
actual=$(du -B1 "$1" | cut -f1)   # -b implies --apparent-size, so use -B1 for real disk usage
if [ "$apparent" -gt "$actual" ]; then
ratio=$(echo "scale=2; $actual / $apparent" | bc)
echo "$(du -sh "$1" | cut -f1) $1 (sparse: $ratio)"
fi
' _ {} \;
}
6. Performance Optimization
Efficient Scanning
#!/bin/bash
# Time du operations
time_du() {
local dir="${1:-.}"
echo "Timing du on $dir"
echo "=================="
for depth in 1 2 3; do
echo -n "Depth $depth: "
time du --max-depth="$depth" "$dir" > /dev/null 2>&1
done
}
# Compare du vs find performance
compare_performance() {
local dir="${1:-.}"
echo "Performance comparison on $dir"
echo "=============================="
echo -n "du -sh: "
time du -sh "$dir" > /dev/null 2>&1
echo -n "find -exec du: "
time find "$dir" -type f -exec du -b {} \; > /dev/null 2>&1
echo -n "find | xargs du: "
time find "$dir" -type f -print0 | xargs -0 du -bc > /dev/null 2>&1
}
# Optimize for many files
fast_du() {
local dir="$1"
# Use xargs for parallel processing
find "$dir" -mindepth 1 -maxdepth 1 -print0 | xargs -0 -P 4 -I {} du -sb {} > /tmp/du.$$   # top level only, or the sum double-counts
# Aggregate results
awk '{sum+=$1} END {printf "Total: %.2f GB\n", sum/1073741824}' /tmp/du.$$
rm /tmp/du.$$
}
Caching Results
#!/bin/bash
# Cache du results
cache_du() {
local dir="$1"
local cache_file="/tmp/du_cache_$(echo "$dir" | sed 's/\//_/g')"
local max_age="${2:-3600}" # 1 hour default
# Check if cache is valid
if [ -f "$cache_file" ] && [ $(( $(date +%s) - $(stat -c %Y "$cache_file") )) -lt "$max_age" ]; then
echo "Using cached data"
cat "$cache_file"
else
echo "Scanning directory..."
du -h "$dir" > "$cache_file"
cat "$cache_file"
fi
}
# Incremental update of cache
update_cache() {
local dir="$1"
local cache_file="/tmp/du_cache_$(echo "$dir" | sed 's/\//_/g')"
# Get current total
current=$(du -sb "$dir" | cut -f1)
# Update cache if exists
if [ -f "$cache_file" ]; then
echo "Updating cache..."
echo "$(date +%s) $current" >> "$cache_file"
else
cache_du "$dir"
fi
}
7. Visualization
Text-based Visualizations
#!/bin/bash
# Create a simple treemap-like visualization
tree_map() {
local dir="${1:-.}"
local max_width="${2:-60}"
du -h --max-depth=2 "$dir" 2>/dev/null | sort -rh | head -20 | while read size path; do
depth=$(echo "$path" | tr -cd '/' | wc -c)
indent=$((depth * 2))
printf "%${indent}s %-30s %8s\n" "" "$(basename "$path")" "$size"
done
}
# Create a pie chart-like display
pie_chart() {
local dir="${1:-.}"
local max_items="${2:-10}"
# Get top directories
du -b --max-depth=1 "$dir" 2>/dev/null | head -n -1 | sort -rn | head -"$max_items" > /tmp/pie.$$   # drop the parent total (last line) so percentages are not skewed
total=$(awk '{sum+=$1} END {print sum}' /tmp/pie.$$)
echo "Disk Usage Distribution"
echo "======================"
while read size name; do
percent=$(echo "scale=2; $size * 100 / $total" | bc)
bar_length=$(echo "$percent / 2" | bc)
bar=$(printf "%${bar_length}s" | sed 's/ /█/g')   # sed handles the multi-byte █; tr is byte-oriented
printf "%-20s %5.1f%% %s\n" "$(basename "$name")" "$percent" "$bar"
done < /tmp/pie.$$
rm /tmp/pie.$$
}
# Color-coded output
color_du() {
du -h "$@" | while read size path; do
case "$size" in
*G) color='\033[0;31m' ;; # Red for GB
*M) color='\033[0;33m' ;; # Yellow for MB
*K) color='\033[0;32m' ;; # Green for KB
*) color='\033[0m' ;;
esac
echo -e "$color$size\t$path\033[0m"
done
}
Integration with Graphing Tools
#!/bin/bash
# Generate data for gnuplot
du_for_gnuplot() {
local dir="${1:-.}"
local out="${2:-du_data.txt}"
du -b --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -20 > "$out"
cat > plot.gnu << EOF
set terminal png
set output 'du_graph.png'
set title 'Disk Usage'
set style data histogram
set style fill solid
set xlabel 'Directory'
set ylabel 'Size (bytes)'
plot '$out' using 1:xtic(2) notitle
EOF
gnuplot plot.gnu
echo "Graph saved as du_graph.png"
}
# Create HTML report with charts
create_html_report() {
local dir="${1:-.}"
local out="${2:-du_report.html}"
{
cat << EOF
<html>
<head>
<title>Disk Usage Report</title>
<style>
body { font-family: Arial, sans-serif; margin: 20px; }
.bar { background-color: #4CAF50; height: 20px; margin: 2px; }
.container { width: 800px; }
</style>
</head>
<body>
<h1>Disk Usage Report: $dir</h1>
<p>Generated: $(date)</p>
<div class="container">
EOF
# Get top 20 directories; capture the largest size first so each bar scales against it
largest=$(du -b --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -1 | cut -f1)
du -b --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -20 |
while read -r size name; do
human=$(numfmt --to=iec "$size")
width=$((size * 100 / largest))
echo "<div>"
echo "<b>$(basename "$name")</b> - $human"
echo "<div class='bar' style='width: ${width}%'></div>"
echo "</div>"
done
cat << EOF
</div>
</body>
</html>
EOF
} > "$out"
echo "HTML report saved to $out"
}
8. Integration with Other Commands
With sort and uniq
#!/bin/bash
# Find largest files by type
largest_by_type() {
local dir="${1:-.}"
find "$dir" -type f -exec du -b {} \; 2>/dev/null |
awk '{print $2}' |
while read file; do
ext="${file##*.}"
size=$(du -b "$file" 2>/dev/null | cut -f1)
echo "$size $ext"
done |
sort -rn |
awk '{sum[$2]+=$1; count[$2]++} END {
for (ext in sum) {
cmd = "numfmt --to=iec " sum[ext]
cmd | getline human; close(cmd)   # system() returns an exit status, not output
printf "%s: %d files, %s\n", ext, count[ext], human
}
}' | sort -rh
}
# Group by size ranges
size_distribution() {
local dir="${1:-.}"
find "$dir" -type f -exec du -b {} \; 2>/dev/null | cut -f1 | awk '
BEGIN {
ranges[1] = "0-1K"
ranges[2] = "1K-10K"
ranges[3] = "10K-100K"
ranges[4] = "100K-1M"
ranges[5] = "1M-10M"
ranges[6] = "10M-100M"
ranges[7] = "100M-1G"
ranges[8] = ">1G"
}
{
if ($1 < 1024) count[1]++
else if ($1 < 10240) count[2]++
else if ($1 < 102400) count[3]++
else if ($1 < 1048576) count[4]++
else if ($1 < 10485760) count[5]++
else if ($1 < 104857600) count[6]++
else if ($1 < 1073741824) count[7]++
else count[8]++
}
END {
for (i=1; i<=8; i++) {
printf "%-10s %5d files\n", ranges[i], count[i]
}
}
'
}
With grep and awk
#!/bin/bash
# Find directories with many small files
find_small_files_clusters() {
local dir="${1:-.}"
local threshold="${2:-100}"
find "$dir" -type d -exec sh -c '
count=$(find "$1" -type f -size -10k | wc -l)
if [ "$count" -gt "$2" ]; then
echo "$count small files in $1"
fi
' _ {} "$threshold" \;
}
# Calculate average file size
avg_file_size() {
local dir="${1:-.}"
find "$dir" -type f -exec du -b {} \; 2>/dev/null | awk '
{sum+=$1; count++}
END {
if (count == 0) { print "No files found"; exit }
avg = sum / count
printf "Total files: %d\n", count
cmd = "numfmt --to=iec " sum; cmd | getline total_h; close(cmd)
printf "Total size: %s\n", total_h
cmd = "numfmt --to=iec " int(avg); cmd | getline avg_h; close(cmd)
printf "Average: %s\n", avg_h
}
'
}
# Find duplicate sizes
find_duplicate_sizes() {
local dir="${1:-.}"
find "$dir" -type f -exec du -b {} \; 2>/dev/null | cut -f1 | sort -n | uniq -d |
while read -r size; do
echo "Files of size $size bytes:"
find "$dir" -type f -size "${size}c" -exec ls -lh {} \; 2>/dev/null
echo
done
}
With df for complete picture
#!/bin/bash
# Complete disk analysis
complete_analysis() {
local mount_point="${1:-/}"
echo "=== Complete Disk Analysis for $mount_point ==="
echo
# Filesystem overview
echo "Filesystem Overview:"
df -h "$mount_point"
echo
# Top-level directories
echo "Top-level directory usage:"
du -sh --exclude={proc,sys,dev,run} "$mount_point"/* 2>/dev/null | sort -rh | head -10
echo
# Largest directories deep dive
echo "Deep dive - largest directories (depth 2):"
du -h --max-depth=2 --exclude={proc,sys,dev,run} "$mount_point" 2>/dev/null | sort -rh | head -20
echo
# Largest files
echo "Largest files:"
find "$mount_point" -type f -size +100M -exec ls -lh {} \; 2>/dev/null | sort -rh -k5 | head -10
echo
# Summary
echo "Summary by file type:"
find "$mount_point" -type f 2>/dev/null | while read -r file; do
ext="${file##*.}"
if [ "$ext" != "$file" ]; then
size=$(du -b "$file" 2>/dev/null | cut -f1)
echo "$size $ext"   # emit size and extension so awk can aggregate by type
fi
done | awk '{sum[$2]+=$1} END {for (ext in sum) {cmd = "numfmt --to=iec " sum[ext]; cmd | getline h; close(cmd); printf "%s: %s\n", ext, h}}' | sort -rh -k2
}
# Check space and send alert
check_space_and_report() {
local threshold="${1:-90}"
local email="${2:-admin@localhost}"
df -h | grep -vE '^Filesystem|tmpfs' | while read line; do
usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
mount=$(echo "$line" | awk '{print $6}')
if [ "$usage" -gt "$threshold" ]; then
report="/tmp/space_alert_${mount//\//_}.txt"
{
echo "Disk Space Alert - $(date)"
echo "Mount point: $mount"
echo "Usage: ${usage}%"
echo
echo "Largest directories:"
du -sh "$mount"/* 2>/dev/null | sort -rh | head -20
echo
echo "Largest files:"
find "$mount" -type f -size +100M -exec ls -lh {} \; 2>/dev/null | sort -rh -k5 | head -20
} > "$report"
mail -s "Disk Space Alert: $mount at ${usage}%" -a "$report" "$email" < /dev/null
rm "$report"
fi
done
}
9. Error Handling and Edge Cases
Robust Error Handling
#!/bin/bash
# Safe du with error handling
safe_du() {
local path="$1"
shift
if [ ! -e "$path" ]; then
echo "Error: Path does not exist: $path" >&2
return 1
fi
if [ ! -r "$path" ]; then
echo "Error: Permission denied reading: $path" >&2
return 1
fi
# Use subshell to handle permission errors gracefully
(
du "$@" "$path" 2>/dev/null ||
echo "Warning: Some files in $path could not be accessed" >&2
)
}
# Retry on transient errors
retry_du() {
local path="$1"
local max_attempts="${2:-3}"
local attempt=1
while [ $attempt -le $max_attempts ]; do
if du -sh "$path" 2>/dev/null; then
return 0
fi
echo "Attempt $attempt failed, retrying..." >&2
attempt=$((attempt + 1))
sleep 2
done
echo "Failed to get disk usage after $max_attempts attempts" >&2
return 1
}
# Handle broken symlinks
du_with_symlinks() {
local path="$1"
du -shL "$path" 2>/dev/null || {
echo "Warning: Some symlinks could not be resolved" >&2
du -sh "$path" 2>/dev/null
}
}
Edge Cases
#!/bin/bash
# Handle files with special characters
handle_special_chars() {
local dir="$1"
# Use null separator for safety
find "$dir" -type f -print0 | xargs -0 du -b 2>/dev/null
}
# Handle very deep directory structures
deep_du() {
local dir="$1"
# Limit recursion depth
du -h --max-depth=10 "$dir" 2>/dev/null
}
# Handle files with spaces, newlines, etc.
safe_list() {
local dir="$1"
# Use null-terminated output
find "$dir" -type f -print0 | while IFS= read -r -d '' file; do
size=$(du -b "$file" 2>/dev/null | cut -f1)
printf "%s\t%s\n" "$size" "$file"
done
}
# Handle network filesystems
network_du() {
local path="$1"
# Add timeout for network filesystems
timeout 30 du -sh "$path" 2>/dev/null || echo "Timeout: Network filesystem unresponsive"
}
10. Real-World Applications
System Maintenance Script
#!/bin/bash
# Comprehensive system maintenance script
maintain_system() {
local log="/var/log/maintenance_$(date +%Y%m%d).log"
{
echo "System Maintenance - $(date)"
echo "=============================="
echo
# Check disk usage
echo "Disk Usage Overview:"
df -h
echo
# Find large files in home directories
echo "Large files in /home:"
find /home -type f -size +100M -exec ls -lh {} \; 2>/dev/null | sort -rh -k5 | head -20
echo
# Check log sizes
echo "Log file sizes:"
du -sh /var/log/* 2>/dev/null | sort -rh | head -20
echo
# Find old files in /tmp
echo "Old files in /tmp:"
find /tmp -type f -atime +7 -exec du -sh {} \; 2>/dev/null | sort -rh | head -20
echo
# Package manager cache
if command -v apt-get &>/dev/null; then
echo "APT cache size:"
du -sh /var/cache/apt/archives
fi
# Docker cleanup suggestions
if command -v docker &>/dev/null; then
echo "Docker disk usage:"
docker system df
fi
} | tee -a "$log"
echo "Maintenance log saved to $log"
}
# Cleanup suggestions
suggest_cleanup() {
local dir="${1:-/home}"
echo "Cleanup Suggestions for $dir"
echo "============================"
# Suggest large directories to investigate
echo "Large directories to investigate:"
du -h --max-depth=2 "$dir" 2>/dev/null | sort -rh | head -10
echo
# Suggest old files to remove
echo "Old large files (last accessed > 1 year ago):"
find "$dir" -type f -atime +365 -size +10M -exec ls -lh {} \; 2>/dev/null | sort -rh -k5 | head -10
echo
# Suggest duplicate files
echo "Potential duplicate files:"
find_duplicate_sizes "$dir" | head -20
}
User Quota Management
#!/bin/bash
# Check user quotas
check_user_quotas() {
local threshold="${1:-1G}"
local threshold_bytes=$(numfmt --from=iec "$threshold")
echo "User Home Directory Usage"
echo "========================"
for home in /home/*; do
if [ -d "$home" ]; then
user=$(basename "$home")
usage=$(du -sb "$home" 2>/dev/null | cut -f1)
usage_human=$(numfmt --to=iec "$usage")
if [ "$usage" -gt "$threshold_bytes" ]; then
printf "%-15s %10s (EXCEEDS %s)\n" "$user" "$usage_human" "$threshold"
else
printf "%-15s %10s\n" "$user" "$usage_human"
fi
fi
done
}
# Generate quota report for all users
quota_report() {
local report="/tmp/quota_report_$(date +%Y%m%d).txt"
{
echo "User Quota Report - $(date)"
echo "============================"
echo
# Header
printf "%-20s %-15s %-15s %-15s %s\n" "User" "Used" "Quota" "Usage%" "Largest Directory"
echo "--------------------------------------------------------------------------------"
for home in /home/*; do
if [ -d "$home" ]; then
user=$(basename "$home")
used=$(du -sb "$home" 2>/dev/null | cut -f1)
used_human=$(numfmt --to=iec "$used")
# Get largest directory
largest=$(du -h --max-depth=2 "$home" 2>/dev/null | sort -rh | head -1)
printf "%-20s %-15s %-15s %5s%% %s\n" \
"$user" \
"$used_human" \
"10G" \
"$((used * 100 / 10737418240))" \
"$largest"
fi
done
} > "$report"
echo "Report saved to $report"
cat "$report"
}
Backup Verification
#!/bin/bash
# Verify backup integrity and size
verify_backup() {
local backup_path="$1"
local original_path="${2:-/home}"
echo "Backup Verification"
echo "==================="
# Check backup size
backup_size=$(du -sb "$backup_path" | cut -f1)
backup_size_human=$(numfmt --to=iec "$backup_size")
echo "Backup size: $backup_size_human"
# Compare with original
if [ -d "$original_path" ]; then
original_size=$(du -sb "$original_path" | cut -f1)
original_size_human=$(numfmt --to=iec "$original_size")
echo "Original size: $original_size_human"
# Size difference
if [ "$backup_size" -eq "$original_size" ]; then
echo "✓ Backup size matches original"
elif [ "$backup_size" -lt "$original_size" ]; then
diff=$((original_size - backup_size))
echo "⚠ Backup smaller by $(numfmt --to=iec "$diff")"
else
diff=$((backup_size - original_size))
echo "⚠ Backup larger by $(numfmt --to=iec "$diff")"
fi
fi
# Check for empty directories
echo -e "\nChecking for empty directories in backup:"
find "$backup_path" -type d -empty 2>/dev/null | head -10
}
# Compare two backups
compare_backups() {
local backup1="$1"
local backup2="$2"
echo "Comparing Backups"
echo "================="
size1=$(du -sb "$backup1" | cut -f1)
size2=$(du -sb "$backup2" | cut -f1)
echo "Backup 1: $(numfmt --to=iec "$size1")"
echo "Backup 2: $(numfmt --to=iec "$size2")"
if [ "$size1" -eq "$size2" ]; then
echo "✓ Backups are the same size"
else
diff=$((size1 - size2))
if [ "$diff" -gt 0 ]; then
echo "Backup 1 is larger by $(numfmt --to=iec "$diff")"
else
echo "Backup 2 is larger by $(numfmt --to=iec "$(( -diff ))")"
fi
fi
}
11. Best Practices and Tips
Performance Tips
#!/bin/bash
# Tips for faster du operations
# 1. Use --max-depth to limit recursion
time du -sh /home # Deep scan
time du -sh --max-depth=5 /home # Faster, limited depth
# 2. Use -S to separate directory sizes
du -shS /home # Don't include subdirectory sizes in parent
# 3. Use -b for raw bytes (faster, no conversion)
du -sb /home | numfmt --to=iec
# 4. Exclude unnecessary files
du -sh --exclude={.cache,.local,.config} ~
# 5. Use xargs for parallel processing
find /home -type d -print0 | xargs -0 -P 4 du -sb > /tmp/sizes
# 6. Cache results for frequently scanned directories
cache_du() {
local dir="$1"
local cache="/tmp/du_cache_$(echo "$dir" | md5sum | cut -d' ' -f1)"
if [ -f "$cache" ] && [ $(( $(date +%s) - $(stat -c %Y "$cache") )) -lt 3600 ]; then
cat "$cache"
else
du -sh "$dir" | tee "$cache"
fi
}
Accuracy Tips
#!/bin/bash
# Tips for accurate disk usage reporting
# 1. Account for block size
echo "Actual disk usage: $(du -sh file)"
echo "File size: $(ls -lh file | awk '{print $5}')"
# 2. Handle hard links correctly
# du counts hard-linked files only once
ln file.txt hardlink.txt
du -ch file.txt hardlink.txt # The shared data is counted once; the total equals one copy
# 3. Use --apparent-size for logical size
du --apparent-size -sh compressed_file
# 4. Check for mount points
du -x / # Don't cross filesystem boundaries
# 5. Include hidden files
du -sh .[!.]* 2>/dev/null # All hidden files
# 6. Handle permissions
sudo du -sh /root # Use sudo for protected directories
Common Pitfalls
#!/bin/bash
# Pitfall 1: Not accounting for mount points
# Wrong - includes other filesystems
du -sh /
# Correct - stay on one filesystem
du -xsh /
# Pitfall 2: Forgetting about large hidden directories
# Wrong - misses hidden dirs
du -sh /home/*
# Correct - includes all
du -sh /home
# Pitfall 3: Not handling symlinks
# Wrong - follows symlinks (may double count)
du -Lsh /home
# Correct - treat symlinks normally
du -sh /home
# Pitfall 4: Running du on huge directories
# Better to limit depth
du -h --max-depth=3 huge_dir
# Pitfall 5: Not excluding special filesystems
# Better to exclude
du -sh --exclude={proc,sys,dev} /
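The pitfall fixes above can be folded into a single helper. This is a sketch, not a standard tool: `scan_top` is a hypothetical name, and the excluded paths are the usual Linux pseudo-filesystems, which you may need to adjust for your system.

```shell
#!/bin/bash
# Sketch: one scan that avoids the pitfalls above.
# scan_top is a hypothetical helper name; adjust the excludes as needed.
scan_top() {
    local root="${1:-/}"
    # -x stays on one filesystem; pseudo-filesystems are also excluded
    # explicitly in case they appear under the scanned root
    du -xh --max-depth=2 \
       --exclude=/proc --exclude=/sys --exclude=/dev --exclude=/run \
       "$root" 2>/dev/null | sort -rh | head -20
}

# Usage: scan_top /    # top 20 entries under /, staying on one filesystem
```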
12. Command Summary and Cheat Sheet
Quick Reference
# Basic commands
du                    # Disk usage of current directory
du -h                 # Human-readable sizes
du -s                 # Summary only
du -a                 # All files, not just directories

# Size units
du -b                 # Bytes (implies --apparent-size)
du -k                 # Kilobytes
du -m                 # Megabytes
du -BG                # Gigabytes (GNU; BSD du uses -g)

# Output control
du -c                 # Grand total
du -0                 # Null-terminated output
du -l                 # Count hard links multiple times

# Directory traversal
du --max-depth=N      # Limit recursion depth
du -x                 # Stay on one filesystem
du -L                 # Follow symlinks

# Filtering
du --threshold=SIZE   # Show only entries above SIZE
du --exclude=PATTERN  # Exclude matching files
du --time             # Show last modification time

# Sorting (pipe to sort)
du -h | sort -rh      # Sort human-readable sizes
du -b | sort -n       # Sort by bytes
Common One-Liners
# Largest directories
du -h --max-depth=1 | sort -rh
# Largest files
find . -type f -exec du -h {} \; | sort -rh | head -20
# Directory sizes in MB
du -m --max-depth=1 | sort -rn
# Total size of files matching pattern
find . -name "*.log" -exec du -ch {} + | grep total$
# Size of current directory without subdirs
du -shS
# Top 10 largest users in /home
sudo du -sh /home/* | sort -rh | head -10
# Check if directory exceeds threshold
[ $(du -sb dir | cut -f1) -gt 1073741824 ] && echo "Over 1GB"
# Size of all files modified today
find . -type f -mtime 0 -exec du -ch {} + | grep total
# Summary by file extension
find . -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn
Conclusion
The du command is an essential tool for understanding and managing disk usage:
Key Takeaways
- Versatile Reporting: From simple summaries to detailed breakdowns
- Human-Readable: The -h option makes sizes easy to understand
- Depth Control: --max-depth manages recursion depth
- Filtering: Exclude patterns and threshold-based reporting
- Scripting: Easily integrated into automation scripts
- Performance: Various options for optimizing large scans
Best Practices Summary
| Scenario | Recommended Command |
|---|---|
| Quick directory size | du -sh dir |
| Top-level breakdown | du -h --max-depth=1 |
| Find large directories | du -h \| sort -rh |
| Find large files | find . -type f -exec du -h {} \; \| sort -rh |
| Total of all logs | du -ch *.log \| grep total |
| Cross-filesystem | du -xsh / |
| Include all files | du -ah |
| With timestamp | du --time -sh |
The du command's flexibility and power make it indispensable for system administrators, developers, and anyone needing to understand disk space usage on Unix-like systems.