Bash sort Command – Complete Guide to Sorting Lines of Text Files

Introduction to sort

The sort command is a powerful utility in Unix/Linux systems that sorts lines of text files. It can organize data alphabetically, numerically, by fields, and in various orders. Understanding sort is essential for data processing, log analysis, and general text manipulation.

Basic Syntax

sort [options] [file...]

If no file is specified, sort reads from standard input.

1. Basic Usage

Simple Sorting

# Sort alphabetically (default)
sort file.txt
# Sort in reverse order
sort -r file.txt
# Sort multiple files together
sort file1.txt file2.txt
# Save sorted output to new file
sort file.txt > sorted.txt
# Sort and display with line numbers
sort file.txt | nl

Examples

$ cat fruits.txt
banana
apple
cherry
date
$ sort fruits.txt
apple
banana
cherry
date
$ sort -r fruits.txt
date
cherry
banana
apple

2. Common Options

Output Control

# -o: Write output to file (safe for in-place sorting)
sort -o sorted.txt file.txt
# -u: Unique lines (remove duplicates)
sort -u file.txt
# -c: Check if file is already sorted
sort -c file.txt && echo "File is sorted"
# -C: Check silently (exit code only)
if sort -C file.txt; then
    echo "File is sorted"
fi
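The quiet `-C` check pairs naturally with `-o` for "sort only if needed" logic. A minimal sketch (the `sort_if_needed` name and the throwaway file are illustrative, not from any standard tool):

```shell
# sort_if_needed: re-sort a file in place only when "sort -C" reports it unsorted
sort_if_needed() {
    local f="$1"
    sort -C "$f" 2>/dev/null || sort -o "$f" "$f"
}

# Demo on a throwaway file
tmp=$(mktemp)
printf 'c\na\nb\n' > "$tmp"
sort_if_needed "$tmp"
cat "$tmp"
rm -f "$tmp"
```

Because `-o` may name the input file itself, the in-place rewrite is safe; redirecting with `>` instead would truncate the input before sort reads it.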

Sorting Order

# -f: Ignore case (fold lower case to upper case)
sort -f file.txt
# -d: Dictionary order (ignore non-alphanumeric)
sort -d file.txt
# -i: Ignore non-printable characters
sort -i file.txt
# -M: Month sort (Jan, Feb, etc.)
sort -M dates.txt
# -h: Human numeric sort (2K, 1G, etc.)
sort -h sizes.txt
# -V: Version sort (natural version numbers)
sort -V versions.txt
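The `-h` and `-M` modes are easy to try on inline data (GNU sort assumed; the values below are made up):

```shell
# -h understands unit suffixes; plain -n would compare only the leading digits
printf '1G\n512K\n2M\n100\n' | sort -h

# -M understands three-letter month names
printf 'Mar\nJan\nDec\n' | sort -M
```

The first pipeline yields 100, 512K, 2M, 1G (smallest to largest by actual size); the second yields Jan, Mar, Dec (calendar order rather than alphabetical).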

Examples with Options

$ cat mixed.txt
Apple
banana
CHERRY
date
$ sort -f mixed.txt  # Case-insensitive
Apple
banana
CHERRY
date
$ cat versions.txt
v1.0
v1.10
v1.2
v2.0
$ sort -V versions.txt  # Version sort
v1.0
v1.2
v1.10
v2.0

3. Numeric Sorting

Numeric Options

# -n: Numeric sort
sort -n numbers.txt
# -g: General numeric sort (handles scientific notation)
sort -g scientific.txt
# Sort by numeric value in reverse
sort -nr numbers.txt
# --numeric-sort is the long form of -n
sort --numeric-sort numbers.txt

Examples

$ cat numbers.txt
10
2
33
5
111
$ sort -n numbers.txt
2
5
10
33
111
$ sort -nr numbers.txt  # Reverse numeric
111
33
10
5
2
$ cat mixed_numbers.txt
10 apples
2 bananas
33 cherries
5 dates
$ sort -n mixed_numbers.txt  # Sorts by first field numerically
2 bananas
5 dates
10 apples
33 cherries
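The difference between `-n` and `-g` shows up as soon as exponents appear. A quick check on inline data (values are made up; `-g` is a GNU extension):

```shell
# -g parses full floating-point syntax, including exponents
printf '1e3\n-5\n2.5\n' | sort -g

# -n stops reading "1e3" at the 'e', treating it as 1
printf '1e3\n-5\n2.5\n' | sort -n
```

With `-g` the order is -5, 2.5, 1e3; with `-n` the line `1e3` lands between -5 and 2.5 because it is compared as 1. Note that `-g` is slower and subject to rounding, so prefer `-n` for plain integers.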

4. Field-Based Sorting

Field Specification

# -k: Sort by key field (1-based)
sort -k2 file.txt  # Sort by field 2
# -t: Specify field separator (default: whitespace)
sort -t',' -k2 data.csv
# Multiple keys
sort -k1,1 -k2n file.txt  # Sort by field1, then numerically by field2
# Field ranges
sort -k2,3 file.txt  # Sort by fields 2 through 3
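The difference between an open-ended key (`-k2`) and a bounded one (`-k2,2`) is easy to miss. In the two-line inline sample below the lines differ only in field 3, so the two forms disagree:

```shell
# -k2 keys on field 2 through end of line ("2 b" vs "2 a")
printf 'x 2 b\ny 2 a\n' | sort -k2

# -k2,2 keys on field 2 alone; the tie falls back to a whole-line compare
printf 'x 2 b\ny 2 a\n' | sort -k2,2
```

The first form puts `y 2 a` first (its key ends in `a`); the second puts `x 2 b` first, because both keys are `2` and the last-resort whole-line comparison decides.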

Field Options

# n: numeric sort for specific field
sort -k2n file.txt
# r: reverse for specific field
sort -k1r -k2n file.txt
# b: ignore leading blanks
sort -k2b file.txt
# f: fold case for field
sort -k2f file.txt

Complex Field Examples

$ cat data.txt
John,30,NY
Alice,25,CA
Bob,30,TX
Carol,25,NY
$ sort -t',' -k2n data.txt   # Sort by age (field 2) numerically
Alice,25,CA
Carol,25,NY
Bob,30,TX
John,30,NY
$ sort -t',' -k2n -k1 data.txt   # Sort by age then name
Alice,25,CA
Carol,25,NY
Bob,30,TX
John,30,NY
$ sort -t',' -k3,3 -k2n data.txt   # Sort by state then age
Alice,25,CA
Carol,25,NY
John,30,NY
Bob,30,TX
$ sort -t':' -k3n /etc/passwd | head -3  # Sort users by UID
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin

5. Advanced Sorting Techniques

Stable Sorting

# -s: Stable sort (preserves original order of equal lines)
sort -s file.txt
# Compare with unstable sort
sort -k2 file.txt        # May reorder equal fields
sort -s -k2 file.txt     # Preserves order of equal fields

Merge Sorted Files

# -m: Merge already sorted files
sort -m sorted1.txt sorted2.txt > merged.txt
# Check if merge is valid
sort -c sorted1.txt && sort -c sorted2.txt && sort -m sorted1.txt sorted2.txt
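The payoff of `-m` is that it merges in linear time instead of re-sorting. A sketch using throwaway files created with mktemp (contents are made up):

```shell
# -m merges inputs that are each already sorted, without re-sorting them
a=$(mktemp); b=$(mktemp)
printf 'alpha\ngamma\n' > "$a"
printf 'beta\ndelta\n' > "$b"
sort -m "$a" "$b"
rm -f "$a" "$b"
```

The merged output is alpha, beta, delta, gamma. If either input is unsorted, `-m` silently produces wrong output, which is why the `sort -c` pre-check above is worthwhile.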

Random Sort

# -R: Random sort (shuffle)
sort -R file.txt
# --random-source selects the randomness source; /dev/urandom gives a
# different order on every run, so this is NOT reproducible. Point it at a
# fixed file to get a repeatable shuffle.
sort -R --random-source=/dev/urandom file.txt

Examples

$ cat data.txt
apple,10
banana,5
apple,20
cherry,15
banana,10
$ sort -t',' -k1,1 -s data.txt  # Stable sort by fruit only
apple,10
apple,20
banana,5
banana,10
cherry,15
$ sort -R data.txt | head -3  # Random order (different each time)
banana,5
apple,20
cherry,15

6. Practical Examples

Log File Analysis

# Sort log entries by timestamp
sort -k1,2 access.log
# Find most frequent IP addresses
cut -d' ' -f1 access.log | sort | uniq -c | sort -nr | head -10
# Sort errors by count
grep "ERROR" app.log | cut -d']' -f2 | sort | uniq -c | sort -nr
# Sort by date in custom format
sort -k3,3n -k2,2M -k1,1n logfile.txt
# Sorts by: year (field 3), month name (field 2), day (field 1)
# for dates in the format "10 Mar 2024"
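For day-month-year dates, keying on year, then month name, then day does the job. A quick check on inline, made-up log lines (GNU sort's `-M` assumed):

```shell
# Fields: day (1), month name (2), year (3), then the message
printf '10 Mar 2024 started\n2 Jan 2025 stopped\n28 Feb 2024 restarted\n' |
sort -k3,3n -k2,2M -k1,1n
```

The 2024 entries come out first (Feb before Mar), followed by the 2025 entry.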

Data Processing

# Sort CSV by column
sort -t',' -k3n data.csv > sorted_data.csv
# Sort with header preservation
(head -1 data.csv && tail -n +2 data.csv | sort -t',' -k2,2) > sorted.csv
# Remove duplicates after sorting
sort -u file.txt > unique.txt
# Sort by file size
ls -l | sort -k5n
# Sort processes by memory usage
ps aux | sort -k4nr | head -10

Text Processing

# Sort words in a sentence
echo "the quick brown fox" | tr ' ' '\n' | sort | tr '\n' ' '
# Sort NUL-delimited records with -z (this pattern is equivalent to a plain
# line sort, but -z matters when records may contain embedded newlines)
tr '\n' '\0' < file.txt | sort -z | tr '\0' '\n'
# Sort by line length
awk '{print length, $0}' file.txt | sort -n | cut -d' ' -f2-
# Sort and number lines
sort file.txt | nl -w3 -s'. '
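The length-prefix trick can be verified on inline data (the sample lines are made up):

```shell
# Prefix each line with its length, sort numerically, then strip the prefix
printf 'medium\nhi\na longer line\n' |
awk '{print length, $0}' | sort -n | cut -d' ' -f2-
```

This is a general decorate-sort-undecorate pattern: compute any key you like in awk, sort on it, then cut it away.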

7. Script Examples

Log Analyzer Script

#!/bin/bash
# analyze_logs.sh - Comprehensive log analysis with sort
LOG_FILE="$1"
if [[ ! -f "$LOG_FILE" ]]; then
    echo "Usage: $0 <logfile>"
    exit 1
fi
echo "=== Log Analysis Report ==="
echo "File: $LOG_FILE"
echo "Date: $(date)"
echo
# Top IP addresses
echo "Top IP Addresses:"
grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" "$LOG_FILE" |
sort | uniq -c | sort -nr | head -10 |
awk '{printf "  %-15s %d\n", $2, $1}'
echo
# Top requested URLs
echo "Top Requested URLs:"
grep -oE "\"(GET|POST|PUT|DELETE) [^\"]+\"" "$LOG_FILE" |
cut -d' ' -f2 | sort | uniq -c | sort -nr | head -10 |
awk '{printf "  %-40s %d\n", $2, $1}'
echo
# Response codes
echo "Response Code Distribution:"
grep -oE "\" [0-9]{3} " "$LOG_FILE" |
awk '{print $2}' | sort | uniq -c | sort -nr |
awk '{printf "  %s: %d\n", $2, $1}'
echo
# Busy hours
echo "Busy Hours:"
grep -oE ":[0-9]{2}:[0-9]{2}:[0-9]{2}" "$LOG_FILE" |
cut -d':' -f2 | sort | uniq -c | sort -nr |
awk '{printf "  %02d:00 - %d requests\n", $2, $1}'

CSV Processor Script

#!/bin/bash
# process_csv.sh - Sort and analyze CSV files
CSV_FILE="$1"
SORT_COL="${2:-1}"
DELIMITER="${3:-,}"
if [[ ! -f "$CSV_FILE" ]]; then
    echo "Usage: $0 <csv_file> [sort_column] [delimiter]"
    exit 1
fi
# Get header
HEADER=$(head -1 "$CSV_FILE")
# Sort data (excluding header)
echo "=== Sorted CSV Data ==="
echo "$HEADER"
tail -n +2 "$CSV_FILE" | sort -t"$DELIMITER" -k"$SORT_COL"n
echo
echo "=== Column Statistics ==="
echo
# Calculate statistics for each column
COLS=$(echo "$HEADER" | awk -F"$DELIMITER" '{print NF}')
for ((i=1; i<=COLS; i++)); do
    echo "Column $i Statistics:"
    # Capture the column once; chaining $(sort ...) substitutions off a
    # shared pipe would exhaust stdin after the first one
    col_data=$(tail -n +2 "$CSV_FILE" | cut -d"$DELIMITER" -f"$i")
    # Numeric only if every line looks like a number
    if ! grep -qvE '^[0-9]+(\.[0-9]+)?$' <<< "$col_data"; then
        echo "  Type: Numeric"
        echo "  Min: $(sort -n <<< "$col_data" | head -1)"
        echo "  Max: $(sort -n <<< "$col_data" | tail -1)"
        echo "  Unique: $(sort -u <<< "$col_data" | wc -l)"
    else
        echo "  Type: Text"
        echo "  Unique values: $(sort -u <<< "$col_data" | wc -l)"
        echo "  Top values:"
        sort <<< "$col_data" | uniq -c | sort -nr | head -3 |
        while read count value; do
            echo "    $value: $count"
        done
    fi
    echo
done

Duplicate Finder Script

#!/bin/bash
# find_duplicates.sh - Find and sort duplicate lines
find_duplicates() {
    local file="$1"
    local min_count="${2:-2}"
    if [[ ! -f "$file" ]]; then
        echo "File not found: $file"
        return 1
    fi
    echo "Analyzing duplicates in: $file"
    echo "Minimum duplicate count: $min_count"
    echo
    # Find duplicates with counts
    sort "$file" | uniq -c | sort -nr | while read count line; do
        if [[ $count -ge $min_count ]]; then
            echo "$count: $line"
        fi
    done
}
# Find duplicate lines across multiple files
find_cross_file_duplicates() {
    local pattern="$1"
    # $pattern is deliberately left unquoted so the shell expands globs
    cat $pattern | sort | uniq -c | sort -nr | awk '$1 > 1'
}
case "$1" in
    single)
        find_duplicates "$2" "${3:-2}"
        ;;
    multi)
        find_cross_file_duplicates "$2"
        ;;
    *)
        echo "Usage: $0 {single|multi} <file/pattern> [min_count]"
        ;;
esac

8. Sorting with Pipes and Redirections

Common Pipelines

# Sort and count unique lines
sort file.txt | uniq -c | sort -nr
# Sort and page through less
sort largefile.txt | less
# Sort and display with head/tail
sort -nr data.txt | head -20
# Sort multiple files and remove duplicates
sort -u file1.txt file2.txt file3.txt
# Sort and split output
sort hugefile.txt | split -l 10000 - sorted_chunk_

Advanced Pipelines

# Find most common words in a file
tr -cs '[:alpha:]' '\n' < file.txt | 
sort | uniq -c | sort -nr | head -20
# Sort an ls -l listing by month and day columns (ls -t is usually simpler)
ls -l | sort -k6,6M -k7,7n
# Sort processes by CPU usage (sorting would scramble the ps auxf tree layout)
ps aux | sort -k3nr
# Sort and join files
sort file1.txt > sorted1.txt
sort file2.txt > sorted2.txt
join sorted1.txt sorted2.txt
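The sort-then-join pattern can be sketched end to end on throwaway files (created with mktemp; contents are made up):

```shell
# join matches rows on field 1; both inputs must be sorted on that field
a=$(mktemp); b=$(mktemp)
printf '1 apple\n2 banana\n' > "$a"
printf '1 red\n2 yellow\n' > "$b"
join "$a" "$b"
rm -f "$a" "$b"
```

The output pairs up matching keys: `1 apple red` and `2 banana yellow`. Unsorted input makes join silently skip matches, which is why the preceding sorts matter.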

9. Performance Considerations

Large File Handling

# Use temporary directory for large sorts
sort -T /tmp/large -o sorted.txt hugefile.txt
# Increase buffer size
sort -S 50% hugefile.txt  # Use 50% of memory
# Parallel sorting (with GNU sort)
sort --parallel=4 hugefile.txt
# External sort for very large files
sort -S 1G -T /fast/disk/tmp hugefile.txt

Optimization Tips

# Use LC_ALL for faster sorting
LC_ALL=C sort file.txt
# Avoid unnecessary pipes
# Bad:
cat file.txt | sort
# Good:
sort file.txt
# Use -u instead of sort | uniq
# Bad:
sort file.txt | uniq
# Good:
sort -u file.txt
# Sort only necessary fields
cut -f1,2 file.txt | sort > sorted_fields.txt
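The `LC_ALL=C` effect is not just speed; it also changes ordering, which is easy to observe on inline data:

```shell
# In the C locale, bytes compare by ASCII code: all capitals sort before
# any lowercase letter
printf 'b\nA\na\nB\n' | LC_ALL=C sort
```

This prints A, B, a, b; under a typical UTF-8 locale the same input interleaves case (a, A, b, B). Pin the locale in scripts whose output other tools (comm, join, uniq) will consume.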

10. Sorting Different Data Types

IP Addresses

# Sort IP addresses
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n ips.txt
# Using version sort for IPs
sort -V ips.txt
# Sort IPs with counts
cut -d' ' -f1 access.log | sort | uniq -c | sort -nr
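The per-octet keys matter because a lexical sort puts `10.` before `9.`. A quick check on inline, made-up addresses:

```shell
# Four bounded numeric keys, one per octet, split on '.'
printf '10.0.0.2\n9.0.0.1\n10.0.0.10\n' |
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n
```

The result is 9.0.0.1, 10.0.0.2, 10.0.0.10; plain `sort` would have produced 10.0.0.10, 10.0.0.2, 9.0.0.1.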

Dates and Times

# Sort dates (YYYY-MM-DD); zero-padded ISO dates also sort correctly as plain text
sort -t'-' -k1,1n -k2,2n -k3,3n dates.txt
# Sort timestamps
sort -k1,2 logfile.txt
# Sort by month
sort -M -k2 data.txt
# Sort by day of week
awk '{print $1}' dates.txt |
while read date; do
    date -d "$date" +"%u %s $date"
done | sort -n | cut -d' ' -f3-

Human-Readable Sizes

# Sort file sizes
ls -lh | sort -h -k5
# Sort by size with units
du -sh * | sort -h
# Sort per-file disk usage under a directory, largest first
du -ah /var/log 2>/dev/null | sort -hr | head

11. Special Sorting Cases

Version Numbers

# Sort version numbers
sort -V versions.txt
# Complex version sorting
echo -e "1.0.0\n1.0.1\n1.1.0\n2.0.0-beta\n2.0.0" | sort -V
# Sort package versions
dpkg -l | grep ^ii | awk '{print $2"="$3}' | sort -V

Mixed Content

# Sort mixed alphanumeric
sort -V mixed.txt
# Lines with a numeric prefix sort by that number; lines without one compare as 0
sort -n mixed.txt
# Sort ignoring leading symbols
sort -b file.txt  # Ignore leading blanks

Custom Sort Keys

# Sort by last field
awk '{print $NF, $0}' file.txt | sort | cut -d' ' -f2-
# Sort by word count
while read line; do
    echo "$(echo "$line" | wc -w):$line"
done < file.txt | sort -n | cut -d':' -f2-
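The last-field decorate-sort-undecorate recipe can be tried on inline data (the sample rows are made up):

```shell
# Copy the last field to the front, sort on it, then strip it off
printf 'row one cherry\nrow two apple\nrow three banana\n' |
awk '{print $NF, $0}' | sort | cut -d' ' -f2-
```

The rows come back ordered by their final word: apple, banana, cherry. Append `n` style modifiers to the sort (e.g. `sort -n`) when the last field is numeric.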

12. Error Handling

Common Errors

# No such file
sort nonexistent.txt
# Output: sort: cannot read: nonexistent.txt: No such file or directory
# Permission denied
sort /etc/shadow
# Output: sort: cannot read: /etc/shadow: Permission denied
# Invalid option
sort -Q file.txt
# Output: sort: invalid option -- 'Q'

Error Handling in Scripts

# Check if sort succeeded
if sort -o output.txt input.txt 2>/dev/null; then
    echo "Sorting successful"
else
    echo "Sorting failed"
fi
# Validate input before sorting
if [[ -r "$file" ]]; then
    sort "$file"
else
    echo "Cannot read $file" >&2
    exit 1
fi
# Check if file is empty
if [[ -s "$file" ]]; then
    sort "$file"
else
    echo "File is empty" >&2
fi

13. Comparison with Related Commands

Command   Purpose                      Example
sort      Sort lines                   sort file.txt
uniq      Remove adjacent duplicates   uniq file.txt
shuf      Random permutation           shuf file.txt
comm      Compare sorted files         comm file1.txt file2.txt
join      Join on common field         join file1.txt file2.txt
tsort     Topological sort             tsort deps.txt

Combined Usage

# Find common lines in sorted files
sort file1.txt > sorted1.txt
sort file2.txt > sorted2.txt
comm -12 sorted1.txt sorted2.txt
# Find lines unique to file1
comm -23 sorted1.txt sorted2.txt
# Join on first field
sort -k1 file1.txt > s1.txt
sort -k1 file2.txt > s2.txt
join -t',' -1 1 -2 1 s1.txt s2.txt
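The comm workflow can be sketched on throwaway files (created with mktemp; contents are made up):

```shell
# comm requires sorted input; -12 suppresses the "only in file 1" and
# "only in file 2" columns, leaving just the common lines
a=$(mktemp); b=$(mktemp)
printf 'apple\nbanana\ncherry\n' > "$a"
printf 'banana\ncherry\ndate\n' > "$b"
comm -12 "$a" "$b"
rm -f "$a" "$b"
```

This prints banana and cherry. Swapping `-12` for `-23` would instead print apple, the line unique to the first file.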

14. Environment Variables

LC_COLLATE

Controls sort order (alphabetical rules):

# Default locale sort
export LC_COLLATE=en_US.UTF-8
sort file.txt
# C locale (ASCII order)
export LC_COLLATE=C
sort file.txt
# Spanish locale (traditional Spanish collation treated "ll" as its own letter)
export LC_COLLATE=es_ES.UTF-8
sort spanish.txt

TMPDIR

Temporary directory for large sorts:

export TMPDIR=/fast/disk/tmp
sort -S 1G hugefile.txt

15. GNU Extensions

GNU sort Additional Features

# --debug: Show sorting decisions
sort --debug file.txt
# --parallel: Parallel sorting
sort --parallel=4 hugefile.txt
# --compress-program: Compress temporary files
sort --compress-program=gzip hugefile.txt
# --batch-size: Number of merges
sort --batch-size=16 hugefile.txt
# --random-source: Random source for -R
sort -R --random-source=/dev/urandom file.txt

16. Practical Applications

Database-like Operations

# SELECT DISTINCT column FROM table ORDER BY column
cut -d',' -f2 data.csv | sort -u
# SELECT column, COUNT(*) GROUP BY column ORDER BY count DESC
cut -d',' -f1 data.csv | sort | uniq -c | sort -nr
# ORDER BY multiple columns
sort -t',' -k2,2 -k3n data.csv
# LIMIT with OFFSET
sort -k3n data.csv | tail -n +10 | head -5
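The GROUP BY emulation can be checked end to end on inline data (values are made up; awk normalizes uniq's column padding):

```shell
# Equivalent of: SELECT value, COUNT(*) GROUP BY value ORDER BY count DESC
printf 'red\nblue\nred\ngreen\nred\nblue\n' |
sort | uniq -c | sort -nr | awk '{print $1, $2}'
```

The output is `3 red`, `2 blue`, `1 green`. The first sort is essential: uniq -c only counts adjacent duplicates.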

Data Cleaning

# Remove duplicates and sort
sort -u messy.txt > clean.txt
# Sort and remove blank lines
grep -v '^$' file.txt | sort > cleaned.txt
# Normalize whitespace and sort
tr -s ' ' < file.txt | sort
# Sort and number lines
sort file.txt | nl -w3 -s'. '

System Administration

# Sort users by UID
cut -d: -f1,3 /etc/passwd | sort -t: -k2n
# Sort running processes by memory
ps aux | sort -k4nr | head -10
# Sort mounted filesystems by size
df -h | sort -k2hr
# Sort network connections by state
netstat -tuna | sort -k6

17. Benchmarking

Performance Testing

# Time sort operations
time sort largefile.txt > /dev/null
# Compare different sort strategies
time sort -S 1G largefile.txt > /dev/null
time sort -T /tmp -S 1G largefile.txt > /dev/null
# Test with different locales
time LC_ALL=C sort largefile.txt > /dev/null
time LC_ALL=en_US.UTF-8 sort largefile.txt > /dev/null

Memory Usage

# Monitor memory during sort
/usr/bin/time -v sort largefile.txt > /dev/null
# Check temporary file usage
lsof -c sort

18. Quick Reference Card

Most Common Options

Option   Description
-n       Numeric sort
-r       Reverse order
-u       Unique lines
-k       Sort by key field
-t       Field separator
-o       Output file
-c       Check if sorted
-f       Ignore case
-M       Month sort
-h       Human numeric
-V       Version sort
-s       Stable sort
-R       Random sort
-m       Merge sorted files

Field Specifications

Syntax         Meaning
-k2            Sort by field 2 to end of line
-k2,2          Sort by field 2 only
-k2,3          Sort by fields 2 through 3
-k2n           Sort field 2 (to end of line) numerically
-k2nr          Sort field 2 numerically in reverse
-k2,2 -k3,3n   Sort by field 2, then numerically by field 3

Common Combinations

Command                 Purpose
sort -n file            Numeric sort
sort -nr file           Reverse numeric sort
sort -u file            Unique sort
sort -k2 file           Sort by second field
sort -t',' -k3n file    Sort CSV by third column numerically
sort -V versions.txt    Version sort
sort -h sizes.txt       Human-readable sizes
sort -M dates.txt       Month sort
sort -R file            Randomize
sort -c file            Check if sorted

Conclusion

The sort command is an essential tool for data processing and organization:

Key Points Summary

  1. Basic Sorting:
  • Alphabetical sort (default)
  • Numeric sort with -n
  • Reverse order with -r
  • Unique lines with -u
  2. Field-Based Sorting:
  • Specify fields with -k
  • Set field separator with -t
  • Multiple sort keys
  • Field-specific options (n, r, b, f)
  3. Special Sort Types:
  • Version numbers (-V)
  • Human-readable sizes (-h)
  • Months (-M)
  • Random (-R)
  4. Performance:
  • Use -T for temporary directory
  • Set memory limit with -S
  • Use LC_ALL=C for speed
  • Parallel sorting with --parallel

Best Practices

  1. Choose the right sort type for your data
  2. Use field specifications for structured data
  3. Consider performance for large files
  4. Combine with other commands for complex processing
  5. Use -u instead of sort | uniq for efficiency
  6. Check sort results with -c before merging
  7. Document complex sort commands in scripts

The sort command's versatility makes it indispensable for data processing, log analysis, and system administration. Mastering sort will significantly enhance your command-line data manipulation capabilities.
