Introduction to sort
The sort command is a powerful utility in Unix/Linux systems that sorts lines of text files. It can organize data alphabetically, numerically, by fields, and in various orders. Understanding sort is essential for data processing, log analysis, and general text manipulation.
Basic Syntax
sort [options] [file...]
If no file is specified, sort reads from standard input.
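Because sort reads standard input when no file is given, it slots naturally at the end of a pipeline. A minimal illustration:

```shell
# With no file argument, sort reads lines from standard input
printf 'pear\napple\nfig\n' | sort
```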
1. Basic Usage
Simple Sorting
# Sort alphabetically (default)
sort file.txt
# Sort in reverse order
sort -r file.txt
# Sort multiple files together
sort file1.txt file2.txt
# Save sorted output to new file
sort file.txt > sorted.txt
# Sort and display with line numbers
sort file.txt | nl
Examples
$ cat fruits.txt
banana
apple
cherry
date
$ sort fruits.txt
apple
banana
cherry
date
$ sort -r fruits.txt
date
cherry
banana
apple
2. Common Options
Output Control
# -o: Write output to file (safe for in-place sorting)
sort -o sorted.txt file.txt
# -u: Unique lines (remove duplicates)
sort -u file.txt
# -c: Check if file is already sorted
sort -c file.txt && echo "File is sorted"
# -C: Check silently (exit code only)
if sort -C file.txt; then
    echo "File is sorted"
fi
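The reason -o is the safe way to sort a file in place: with plain redirection, the shell truncates the output file before sort ever reads it, so `sort file.txt > file.txt` destroys the data. A small sketch using a temporary file:

```shell
tmp=$(mktemp)
printf 'cherry\napple\nbanana\n' > "$tmp"

# Safe: -o allows the output name to be the same as the input
sort -o "$tmp" "$tmp"
cat "$tmp"

rm -f "$tmp"
```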
Sorting Order
# -f: Ignore case (fold lower case to upper case)
sort -f file.txt
# -d: Dictionary order (consider only blanks and alphanumeric characters)
sort -d file.txt
# -i: Ignore non-printable characters
sort -i file.txt
# -M: Month sort (Jan, Feb, etc.)
sort -M dates.txt
# -h: Human numeric sort (2K, 1G, etc.)
sort -h sizes.txt
# -V: Version sort (natural version numbers)
sort -V versions.txt
Examples with Options
$ cat mixed.txt
Apple
banana
CHERRY
date
$ sort -f mixed.txt    # Case-insensitive
Apple
banana
CHERRY
date
$ cat versions.txt
v1.0
v1.10
v1.2
v2.0
$ sort -V versions.txt    # Version sort
v1.0
v1.2
v1.10
v2.0
3. Numeric Sorting
Numeric Options
# -n: Numeric sort
sort -n numbers.txt
# -g: General numeric sort (handles scientific notation)
sort -g scientific.txt
# Sort by numeric value in reverse
sort -nr numbers.txt
# --numeric-sort is the long form of -n
sort --numeric-sort numbers.txt
Examples
$ cat numbers.txt
10
2
33
5
111
$ sort -n numbers.txt
2
5
10
33
111
$ sort -nr numbers.txt    # Reverse numeric
111
33
10
5
2
$ cat mixed_numbers.txt
10 apples
2 bananas
33 cherries
5 dates
$ sort -n mixed_numbers.txt    # Sorts by the leading number
2 bananas
5 dates
10 apples
33 cherries
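The difference between the default lexical order and -n is easy to see on the same input; lexically, "10" sorts before "2" because comparison proceeds character by character:

```shell
printf '10\n2\n33\n' | sort     # lexical: 10, 2, 33
printf '10\n2\n33\n' | sort -n  # numeric: 2, 10, 33
```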
4. Field-Based Sorting
Field Specification
# -k: Sort by key field (1-based); a key runs to end of line unless bounded
sort -k2 file.txt             # Sort by field 2 (through end of line)
# -t: Specify field separator (default: transition from blank to non-blank)
sort -t',' -k2 data.csv
# Multiple keys
sort -k1,1 -k2n file.txt      # Sort by field 1, then numerically by field 2
# Field ranges
sort -k2,3 file.txt           # Sort by fields 2 through 3
Field Options
# n: numeric sort for a specific field
sort -k2n file.txt
# r: reverse for a specific field
sort -k1r -k2n file.txt
# b: ignore leading blanks in the field
sort -k2b file.txt
# f: fold case for the field
sort -k2f file.txt
Complex Field Examples
$ cat data.txt
John,30,NY
Alice,25,CA
Bob,30,TX
Carol,25,NY

# Sort by age (field 2) numerically; ties fall back to whole-line comparison
$ sort -t',' -k2n data.txt
Alice,25,CA
Carol,25,NY
Bob,30,TX
John,30,NY

# Sort by age, then explicitly by name
$ sort -t',' -k2n -k1,1 data.txt
Alice,25,CA
Carol,25,NY
Bob,30,TX
John,30,NY

# Sort by state, then age
$ sort -t',' -k3,3 -k2n data.txt
Alice,25,CA
Carol,25,NY
John,30,NY
Bob,30,TX

# Sort users by UID
$ sort -t':' -k3n /etc/passwd | head -3
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
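A subtlety worth internalizing: -k2 starts the key at field 2 but runs to the end of the line, while -k2,2 restricts the key to field 2 alone. The two can order the same input differently:

```shell
# Field 2 is "b" on both lines; field 3 differs
printf 'x b z\ny b a\n' | sort -k2    # key "b z" vs "b a": the y line sorts first
printf 'x b z\ny b a\n' | sort -k2,2  # keys tie; whole-line fallback: x line first
```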
5. Advanced Sorting Techniques
Stable Sorting
# -s: Stable sort (preserves original order of equal lines)
sort -s file.txt
# Compare with the default (non-stable) behavior
sort -k2 file.txt       # Ties broken by a last-resort whole-line comparison
sort -s -k2 file.txt    # Ties keep their input order
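A concrete illustration of stability: both lines tie on the key, and only -s guarantees they stay in input order (GNU sort otherwise breaks ties with a whole-line comparison):

```shell
# Both lines have "1" in field 2
printf 'b 1\na 1\n' | sort -k2,2      # last-resort comparison puts "a 1" first
printf 'b 1\na 1\n' | sort -s -k2,2   # -s keeps input order: "b 1" stays first
```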
Merge Sorted Files
# -m: Merge already sorted files
sort -m sorted1.txt sorted2.txt > merged.txt
# Verify the inputs are sorted before merging
sort -c sorted1.txt && sort -c sorted2.txt && sort -m sorted1.txt sorted2.txt
Random Sort
# -R: Random sort (shuffles; note that identical lines stay grouped together)
sort -R file.txt
# Reproducible shuffle: point --random-source at a fixed-content file
sort -R --random-source=seed.bin file.txt
Examples
$ cat data.txt
apple,10
banana,5
apple,20
cherry,15
banana,10
$ sort -t',' -k1,1 -s data.txt    # Stable sort by fruit
apple,10
apple,20
banana,5
banana,10
cherry,15
$ sort -R data.txt | head -3      # Random order (different each run)
banana,5
apple,20
cherry,15
6. Practical Examples
Log File Analysis
# Sort log entries by timestamp (first two fields)
sort -k1,2 access.log
# Find most frequent IP addresses
cut -d' ' -f1 access.log | sort | uniq -c | sort -nr | head -10
# Sort errors by count
grep "ERROR" app.log | cut -d']' -f2 | sort | uniq -c | sort -nr
# Sort dates in "10 Mar 2024" format: by year, then month, then day
sort -k3,3n -k2,2M -k1,1n logfile.txt
Data Processing
# Sort CSV by column
sort -t',' -k3n data.csv > sorted_data.csv
# Sort with header preservation
(head -1 data.csv && tail -n +2 data.csv | sort -k2) > sorted.csv
# Remove duplicates after sorting
sort -u file.txt > unique.txt
# Sort by file size
ls -l | sort -k5n
# Sort processes by memory usage
ps aux | sort -k4nr | head -10
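The header-preservation one-liner generalizes into a small helper. This is a sketch, not a standard utility: `sort_with_header` is a hypothetical name, and any sort options are passed straight through to the body sort:

```shell
# Hypothetical helper: emit the header row, then sort the remaining rows
sort_with_header() {
    local file="$1"; shift
    head -n 1 "$file"
    tail -n +2 "$file" | sort "$@"
}

# Usage sketch:
# sort_with_header data.csv -t',' -k2n
```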
Text Processing
# Sort words in a sentence
echo "the quick brown fox" | tr ' ' '\n' | sort | tr '\n' ' '
# Sort paragraphs (blank-line-separated records): join each paragraph onto
# one line, sort, then restore the newlines (uses awk's SUBSEP as a sentinel)
awk -v RS= '{gsub(/\n/, SUBSEP); print}' file.txt | sort |
awk '{gsub(SUBSEP, "\n"); print $0 "\n"}'
# Sort by line length
awk '{print length, $0}' file.txt | sort -n | cut -d' ' -f2-
# Sort and number lines
sort file.txt | nl -w3 -s'. '
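The length-sort pipeline above is a classic decorate–sort–undecorate: awk prefixes each line with its length, sort -n orders by that prefix, and cut strips it again. Run on inline data:

```shell
# Decorate with length, sort numerically, undecorate
printf 'ccc\na\nbb\n' | awk '{print length, $0}' | sort -n | cut -d' ' -f2-
```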
7. Script Examples
Log Analyzer Script
#!/bin/bash
# analyze_logs.sh - Comprehensive log analysis with sort
LOG_FILE="$1"
if [[ ! -f "$LOG_FILE" ]]; then
echo "Usage: $0 <logfile>"
exit 1
fi
echo "=== Log Analysis Report ==="
echo "File: $LOG_FILE"
echo "Date: $(date)"
echo
# Top IP addresses
echo "Top IP Addresses:"
grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" "$LOG_FILE" |
sort | uniq -c | sort -nr | head -10 |
awk '{printf " %-15s %d\n", $2, $1}'
echo
# Top requested URLs
echo "Top Requested URLs:"
grep -oE "\"(GET|POST|PUT|DELETE) [^\"]+\"" "$LOG_FILE" |
cut -d' ' -f2 | sort | uniq -c | sort -nr | head -10 |
awk '{printf " %-40s %d\n", $2, $1}'
echo
# Response codes
echo "Response Code Distribution:"
grep -oE "\" [0-9]{3} " "$LOG_FILE" |
awk '{print $2}' | sort | uniq -c | sort -nr |
awk '{printf " %s: %d\n", $2, $1}'
echo
# Busy hours
echo "Busy Hours:"
grep -oE ":[0-9]{2}:[0-9]{2}:[0-9]{2}" "$LOG_FILE" |
cut -d':' -f2 | sort | uniq -c | sort -nr |
awk '{printf " %02d:00 - %d requests\n", $2, $1}'
CSV Processor Script
#!/bin/bash
# process_csv.sh - Sort and analyze CSV files
CSV_FILE="$1"
SORT_COL="${2:-1}"
DELIMITER="${3:-,}"
if [[ ! -f "$CSV_FILE" ]]; then
echo "Usage: $0 <csv_file> [sort_column] [delimiter]"
exit 1
fi
# Get header
HEADER=$(head -1 "$CSV_FILE")
# Sort data (excluding header)
echo "=== Sorted CSV Data ==="
echo "$HEADER"
tail -n +2 "$CSV_FILE" | sort -t"$DELIMITER" -k"$SORT_COL"n
echo
echo "=== Column Statistics ==="
echo
# Calculate statistics for numeric columns
COLS=$(echo "$HEADER" | awk -F"$DELIMITER" '{print NF}')
for ((i=1; i<=COLS; i++)); do
echo "Column $i Statistics:"
# Capture the column once; separate command substitutions below would
# otherwise compete for the same input stream
col=$(tail -n +2 "$CSV_FILE" | cut -d"$DELIMITER" -f"$i")
# Treat the column as numeric only if every value matches
if ! echo "$col" | grep -qvE '^[0-9]+(\.[0-9]+)?$'; then
echo "  Type: Numeric"
echo "  Min: $(echo "$col" | sort -n | head -1)"
echo "  Max: $(echo "$col" | sort -n | tail -1)"
echo "  Unique: $(echo "$col" | sort -u | wc -l)"
else
echo "  Type: Text"
echo "  Unique values: $(echo "$col" | sort -u | wc -l)"
echo "  Top values:"
echo "$col" | sort | uniq -c | sort -nr | head -3 |
while read -r count value; do
echo "    $value: $count"
done
fi
echo
done
Duplicate Finder Script
#!/bin/bash
# find_duplicates.sh - Find and sort duplicate lines
find_duplicates() {
local file="$1"
local min_count="${2:-2}"
if [[ ! -f "$file" ]]; then
echo "File not found: $file"
return 1
fi
echo "Analyzing duplicates in: $file"
echo "Minimum duplicate count: $min_count"
echo
# Find duplicates with counts
sort "$file" | uniq -c | sort -nr | while read count line; do
if [[ $count -ge $min_count ]]; then
echo "$count: $line"
fi
done
}
# Find duplicate lines across multiple files
find_cross_file_duplicates() {
local pattern="$1"
# $pattern is left unquoted deliberately so the shell expands the glob
cat $pattern | sort | uniq -c | sort -nr | awk '$1 > 1'
}
case "$1" in
single)
find_duplicates "$2" "${3:-2}"
;;
multi)
find_cross_file_duplicates "$2"
;;
*)
echo "Usage: $0 {single|multi} <file/pattern> [min_count]"
;;
esac
8. Sorting with Pipes and Redirections
Common Pipelines
# Sort and count unique lines
sort file.txt | uniq -c | sort -nr
# Sort and page through less
sort largefile.txt | less
# Sort and display with head/tail
sort -nr data.txt | head -20
# Sort multiple files and remove duplicates
sort -u file1.txt file2.txt file3.txt
# Sort and split output
sort hugefile.txt | split -l 10000 - sorted_chunk_
Advanced Pipelines
# Find most common words in a file
tr -cs '[:alpha:]' '\n' < file.txt | sort | uniq -c | sort -nr | head -20
# Sort ls output by month, then day (assumes the default ls time format)
ls -l | sort -k6M -k7n
# Sort processes by CPU usage
ps aux | sort -k3nr
# Sort and join files
sort file1.txt > sorted1.txt
sort file2.txt > sorted2.txt
join sorted1.txt sorted2.txt
9. Performance Considerations
Large File Handling
# Use a specific temporary directory for large sorts
sort -T /tmp/large -o sorted.txt hugefile.txt
# Increase the memory buffer
sort -S 50% hugefile.txt    # Use 50% of main memory
# Parallel sorting (GNU sort)
sort --parallel=4 hugefile.txt
# External sort for very large files: big buffer plus fast temp disk
sort -S 1G -T /fast/disk/tmp hugefile.txt
Optimization Tips
# Use the C locale for faster sorting (plain byte comparisons)
LC_ALL=C sort file.txt
# Avoid unnecessary pipes
cat file.txt | sort    # Bad
sort file.txt          # Good
# Use -u instead of sort | uniq
sort file.txt | uniq   # Bad
sort -u file.txt       # Good
# Sort only necessary fields
cut -f1,2 file.txt | sort > sorted_fields.txt
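LC_ALL=C is faster because comparisons become plain byte comparisons, but it also changes the result: in the C locale every uppercase letter sorts before every lowercase one. A quick contrast (the second line assumes the en_US.UTF-8 locale is installed):

```shell
# C locale: byte order, so 'Z' (0x5A) sorts before 'a' (0x61)
printf 'apple\nZebra\n' | LC_ALL=C sort
# A typical UTF-8 locale collates case-insensitively: apple first
printf 'apple\nZebra\n' | LC_ALL=en_US.UTF-8 sort
```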
10. Sorting Different Data Types
IP Addresses
# Sort IP addresses octet by octet
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
# Version sort also orders dotted quads (GNU sort)
sort -V ips.txt
# Sort IPs by request count
cut -d' ' -f1 access.log | sort | uniq -c | sort -nr
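The octet-by-octet keys matter because a plain lexical sort would put 10.0.0.10 before 10.0.0.2. A quick check on inline data:

```shell
# Numeric sort per octet gives true address order
printf '10.0.0.2\n10.0.0.10\n10.0.0.1\n' |
    sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
```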
Dates and Times
# Sort ISO dates (YYYY-MM-DD); plain lexical sort also works for this format
sort -t'-' -k1n -k2n -k3n dates.txt
# Sort timestamps
sort -k1,2 logfile.txt
# Sort by month in field 2
sort -k2,2M data.txt
# Sort by day of week (GNU date)
while read -r d; do
date -d "$d" +"%u %s $d"
done < dates.txt | sort -n | cut -d' ' -f3-
Human-Readable Sizes
# Sort a file listing by human-readable size
ls -lh | sort -h -k5
# Sort directory sizes with units
du -sh * | sort -h
# Largest directories first
du -sh * | sort -hr | head -10
11. Special Sorting Cases
Version Numbers
# Sort version numbers
sort -V versions.txt
# Complex version sorting
echo -e "1.0.0\n1.0.1\n1.1.0\n2.0.0-beta\n2.0.0" | sort -V
# Sort package versions
dpkg -l | grep ^ii | awk '{print $2"="$3}' | sort -V
Mixed Content
# Sort mixed alphanumeric content (version sort handles embedded numbers)
sort -V mixed.txt
# Ignore leading blanks
sort -b file.txt
Custom Sort Keys
# Sort by last field
awk '{print $NF, $0}' file.txt | sort | cut -d' ' -f2-
# Sort by word count
while read line; do
echo "$(echo "$line" | wc -w):$line"
done < file.txt | sort -n | cut -d':' -f2-
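Both snippets above follow the same decorate–sort–undecorate pattern: prepend a computed key, sort on it, then strip it off. Using a tab as the decoration separator keeps lines containing spaces intact. A sketch sorting by word count with awk instead of a shell loop:

```shell
# Decorate each line with its word count (tab-separated),
# sort numerically on that key, then undecorate with cut
printf 'one two three\nlone\ntwo words\n' |
    awk '{print NF "\t" $0}' | sort -n | cut -f2-
```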
12. Error Handling
Common Errors
# No such file
sort nonexistent.txt
# sort: cannot read: nonexistent.txt: No such file or directory
# Permission denied
sort /etc/shadow
# sort: cannot read: /etc/shadow: Permission denied
# Invalid option (note: -z is a valid GNU option for NUL-terminated lines)
sort --no-such-option file.txt
# sort: unrecognized option '--no-such-option'
Error Handling in Scripts
# Check if sort succeeded
if sort -o output.txt input.txt 2>/dev/null; then
    echo "Sorting successful"
else
    echo "Sorting failed"
fi
# Validate input before sorting
if [[ -r "$file" ]]; then
    sort "$file"
else
    echo "Cannot read $file" >&2
    exit 1
fi
# Only sort if the file is non-empty
if [[ -s "$file" ]]; then
    sort "$file"
else
    echo "File is empty" >&2
fi
13. Comparison with Related Commands
| Command | Purpose | Example |
|---|---|---|
| sort | Sort lines | sort file.txt |
| uniq | Remove duplicates | uniq file.txt |
| shuf | Random permutation | shuf file.txt |
| comm | Compare sorted files | comm file1.txt file2.txt |
| join | Join on common field | join file1.txt file2.txt |
| tsort | Topological sort | tsort deps.txt |
Combined Usage
# Find common lines in sorted files
sort file1.txt > sorted1.txt
sort file2.txt > sorted2.txt
comm -12 sorted1.txt sorted2.txt
# Find lines unique to file1
comm -23 sorted1.txt sorted2.txt
# Join on first field (sort with the same separator join will use)
sort -t',' -k1,1 file1.txt > s1.txt
sort -t',' -k1,1 file2.txt > s2.txt
join -t',' -1 1 -2 1 s1.txt s2.txt
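A self-contained run of the comm pipeline, using process substitution (a bashism) in place of temp files; comm requires both inputs to be sorted already:

```shell
# Lines common to both inputs (-12 suppresses columns 1 and 2)
comm -12 <(printf 'a\nb\nc\n' | sort) <(printf 'b\nc\nd\n' | sort)
# Lines only in the first input
comm -23 <(printf 'a\nb\nc\n' | sort) <(printf 'b\nc\nd\n' | sort)
```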
14. Environment Variables
LC_COLLATE
Controls sort order (alphabetical rules):
# Default locale sort
export LC_COLLATE=en_US.UTF-8
sort file.txt
# C locale (byte/ASCII order)
export LC_COLLATE=C
sort file.txt
# Traditional Spanish collation treated "ll" as its own letter;
# modern es_ES.UTF-8 locales generally no longer do
export LC_COLLATE=es_ES.UTF-8
sort spanish.txt
TMPDIR
Temporary directory for large sorts:
export TMPDIR=/fast/disk/tmp
sort -S 1G hugefile.txt
15. GNU Extensions
GNU sort Additional Features
# --debug: Highlight the part of each line used as the sort key
sort --debug file.txt
# --parallel: Number of sorting threads
sort --parallel=4 hugefile.txt
# --compress-program: Compress temporary files
sort --compress-program=gzip hugefile.txt
# --batch-size: Maximum number of inputs merged at once
sort --batch-size=16 hugefile.txt
# --random-source: Source of random bytes for -R
sort -R --random-source=/dev/urandom file.txt
16. Practical Applications
Database-like Operations
# SELECT DISTINCT column FROM table ORDER BY column
cut -d',' -f2 data.csv | sort -u
# SELECT column, COUNT(*) GROUP BY column ORDER BY count DESC
cut -d',' -f1 data.csv | sort | uniq -c | sort -nr
# ORDER BY multiple columns
sort -t',' -k2,2 -k3n data.csv
# LIMIT 5 OFFSET 9
sort -k3n data.csv | tail -n +10 | head -5
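The GROUP BY analogue end to end, on inline data: the first sort groups equal lines so uniq -c can count them, and the second sort ranks the counts:

```shell
# Count occurrences of each value, most frequent first
printf 'red\nblue\nred\nred\nblue\ngreen\n' |
    sort | uniq -c | sort -nr
```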
Data Cleaning
# Remove duplicates and sort
sort -u messy.txt > clean.txt
# Sort and remove blank lines
grep -v '^$' file.txt | sort > cleaned.txt
# Squeeze repeated spaces and sort
tr -s ' ' < file.txt | sort
# Sort and number lines
sort file.txt | nl -w3 -s'. '
System Administration
# Sort users by UID
cut -d: -f1,3 /etc/passwd | sort -t: -k2n
# Sort running processes by memory
ps aux | sort -k4nr | head -10
# Sort mounted filesystems by size (skip the header line)
df -h | tail -n +2 | sort -k2hr
# Sort network connections by state
netstat -tuna | sort -k6
17. Benchmarking
Performance Testing
# Time sort operations
time sort largefile.txt > /dev/null
# Compare different sort strategies
time sort -S 1G largefile.txt > /dev/null
time sort -T /tmp -S 1G largefile.txt > /dev/null
# Test with different locales
time LC_ALL=C sort largefile.txt > /dev/null
time LC_ALL=en_US.UTF-8 sort largefile.txt > /dev/null
Memory Usage
# Monitor memory during sort (GNU time)
/usr/bin/time -v sort largefile.txt > /dev/null
# Check temporary file usage
lsof -c sort
18. Quick Reference Card
Most Common Options
| Option | Description |
|---|---|
| -n | Numeric sort |
| -r | Reverse order |
| -u | Unique lines |
| -k | Sort by key field |
| -t | Field separator |
| -o | Output file |
| -c | Check if sorted |
| -f | Ignore case |
| -M | Month sort |
| -h | Human numeric |
| -V | Version sort |
| -s | Stable sort |
| -R | Random sort |
| -m | Merge sorted files |
Field Specifications
| Syntax | Meaning |
|---|---|
| -k2 | Sort by field 2 to end of line |
| -k2,2 | Sort by field 2 only |
| -k2,3 | Sort by fields 2 through 3 |
| -k2n | Sort field 2 numerically |
| -k2nr | Sort field 2 numerically in reverse |
| -k2,2 -k3,3n | Sort by field 2, then numerically by field 3 |
Common Combinations
| Command | Purpose |
|---|---|
| sort -n file | Numeric sort |
| sort -nr file | Reverse numeric sort |
| sort -u file | Unique sort |
| sort -k2 file | Sort by second field |
| sort -t',' -k3n file | Sort CSV by third column numerically |
| sort -V versions.txt | Version sort |
| sort -h sizes.txt | Human-readable sizes |
| sort -M dates.txt | Month sort |
| sort -R file | Randomize |
| sort -c file | Check if sorted |
Conclusion
The sort command is an essential tool for data processing and organization:
Key Points Summary
- Basic Sorting:
  - Alphabetical sort (default)
  - Numeric sort with -n
  - Reverse order with -r
  - Unique lines with -u
- Field-Based Sorting:
  - Specify fields with -k
  - Set field separator with -t
  - Multiple sort keys
  - Field-specific options (n, r, b, f)
- Special Sort Types:
  - Version numbers (-V)
  - Human-readable sizes (-h)
  - Months (-M)
  - Random (-R)
- Performance:
  - Use -T for temporary directory
  - Set memory limit with -S
  - Use LC_ALL=C for speed
  - Parallel sorting with --parallel
Best Practices
- Choose the right sort type for your data
- Use field specifications for structured data
- Consider performance for large files
- Combine with other commands for complex processing
- Use -u instead of sort | uniq for efficiency
- Check sort results with -c before merging
- Document complex sort commands in scripts
The sort command's versatility makes it indispensable for data processing, log analysis, and system administration. Mastering sort will significantly enhance your command-line data manipulation capabilities.