Introduction to the cut Command
The cut command is a powerful text processing utility in Unix/Linux that extracts sections from each line of input. It's commonly used for parsing structured data like CSV files, log files, and command output. The command works by selecting columns based on delimiters, character positions, or byte positions.
Key Concepts
- Field-based cutting: Extract based on delimiters (like commas, tabs)
- Character-based cutting: Extract specific character positions
- Byte-based cutting: Extract specific byte positions
- Complement: Select everything EXCEPT the specified sections
- Output delimiter: Specify how to join extracted sections
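Each of these concepts can be exercised on a single sample line (the data here is made up for illustration; `--complement` and `--output-delimiter` are GNU extensions):

```shell
line="John,30,NYC,Engineer"

# Field-based: second comma-delimited field
echo "$line" | cut -d ',' -f 2                          # 30

# Character-based: first four characters
echo "$line" | cut -c 1-4                               # John

# Byte-based: same as -c for pure ASCII input
echo "$line" | cut -b 1-4                               # John

# Complement: everything except field 2
echo "$line" | cut -d ',' --complement -f 2             # John,NYC,Engineer

# Output delimiter: join extracted fields with a colon
echo "$line" | cut -d ',' -f 1,3 --output-delimiter=':' # John:NYC
```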
1. Basic cut Syntax
Command Structure
# Basic syntax
cut OPTION [FILE]
# Common options
cut -f FIELD_LIST              # Select fields (delimited text)
cut -c CHARACTER_LIST          # Select characters
cut -b BYTE_LIST               # Select bytes
cut -d DELIMITER               # Specify delimiter (default: TAB)
cut -s                         # Only print lines containing delimiter
cut --complement               # Select complement of specified fields
cut --output-delimiter=STRING  # Specify output delimiter
Simple Examples
# Cut characters (positions 1-5 from each line)
echo "Hello World" | cut -c 1-5
# Output: Hello
# Cut fields (first field using TAB delimiter)
echo -e "John\t30\tNYC" | cut -f 1
# Output: John
# Cut with custom delimiter
echo "John,30,NYC" | cut -d ',' -f 2
# Output: 30
2. Field-Based Cutting (-f)
Basic Field Selection
# Sample data file (data.txt)
cat data.txt
# John,30,NYC,Engineer
# Alice,25,LA,Designer
# Bob,35,Chicago,Manager
# Select single field
cut -d ',' -f 1 data.txt
# John
# Alice
# Bob
# Select multiple fields
cut -d ',' -f 1,3 data.txt
# John,NYC
# Alice,LA
# Bob,Chicago
# Select range of fields
cut -d ',' -f 2-4 data.txt
# 30,NYC,Engineer
# 25,LA,Designer
# 35,Chicago,Manager
# Select from field to end
cut -d ',' -f 2- data.txt
# 30,NYC,Engineer
# 25,LA,Designer
# 35,Chicago,Manager
# Select up to a field
cut -d ',' -f -3 data.txt
# John,30,NYC
# Alice,25,LA
# Bob,35,Chicago
Complex Field Selection
# Combine ranges and individual fields
cut -d ',' -f 1,3-4 data.txt
# John,NYC,Engineer
# Alice,LA,Designer
# Bob,Chicago,Manager
# Select non-consecutive fields
cut -d ',' -f 1,4 data.txt
# John,Engineer
# Alice,Designer
# Bob,Manager
# Using with command output (squeeze repeated spaces so field numbers are stable)
ps aux | tr -s ' ' | cut -d ' ' -f 1,2 | head -5
Handling TAB-Delimited Files
# TAB is the default delimiter
cat tab_data.txt
# John    30    NYC
# Alice   25    LA
# Bob     35    Chicago
# No need for -d with TAB
cut -f 1,2 tab_data.txt
# John    30
# Alice   25
# Bob     35
# Change output delimiter for TAB
cut -f 1,2 --output-delimiter=',' tab_data.txt
# John,30
# Alice,25
# Bob,35
3. Character-Based Cutting (-c)
Character Position Selection
# Sample file (text.txt)
cat text.txt
# Hello World
# Bash Scripting
# Cut Command
# Select specific characters
cut -c 1 text.txt
# H
# B
# C
# Select range of characters
cut -c 1-5 text.txt
# Hello
# Bash
# Cut C
# Select multiple ranges
cut -c 1-5,7-10 text.txt
# HelloWorl
# Bash crip
# Cut Cmman
# Select from position to end
cut -c 3- text.txt
# llo World
# sh Scripting
# t Command
# Select up to position
cut -c -5 text.txt
# Hello
# Bash
# Cut C
Fixed-Width Data Processing
# Fixed-width file (fixed.txt)
cat fixed.txt
# John      30   NYC
# Alice     25   LA
# Bob       35   Chicago
# Extract based on character positions
cut -c 1-10 fixed.txt   # Name field
# John
# Alice
# Bob
cut -c 11-15 fixed.txt  # Age field
# 30
# 25
# 35
# Combine with trim to drop trailing padding
cut -c 1-10 fixed.txt | sed 's/ *$//'
# John
# Alice
# Bob
4. Byte-Based Cutting (-b)
Byte Position Selection
# Note: -b counts bytes; for pure ASCII input it matches -c
echo "Hello" | cut -b 1-3
# Hel
# With multibyte characters (UTF-8), "café" is 5 bytes
echo "café" | cut -b 1-3
# caf
echo "café" | cut -b 1-5
# café
# Caution: GNU cut currently implements -c the same as -b,
# so -c does NOT handle multibyte characters correctly either
echo "🚀 Rocket" | cut -b 1-3
# (partial character: the rocket emoji alone is 4 bytes)
# For multibyte-safe extraction, use awk (gawk in a UTF-8 locale) instead
echo "🚀 Rocket" | awk '{print substr($0, 1, 4)}'
# 🚀 Ro
5. Working with Delimiters (-d)
Custom Delimiters
# CSV files
echo "John,Doe,30,Engineer" | cut -d ',' -f 2
# Doe
# Colon-delimited (like /etc/passwd)
cut -d ':' -f 1,3 /etc/passwd | head -3
# root:0
# daemon:1
# bin:2
# Space-delimited (but careful with repeated spaces)
echo "John  Doe  30" | cut -d ' ' -f 2
# (empty: each extra space starts a new, empty field)
# Better for spaces: use tr to squeeze them first
echo "John  Doe  30" | tr -s ' ' | cut -d ' ' -f 2
# Doe
# Pipe-delimited
echo "John|Doe|30|Engineer" | cut -d '|' -f 1-3
# John|Doe|30
Multiple Delimiters
# cut doesn't support multiple delimiters directly
# Use tr to convert delimiters first
# File with mixed delimiters
cat mixed.txt
# John,30;NYC|Engineer
# Alice;25,LA|Designer
# Convert all to the same delimiter
tr ',;|' '\t' < mixed.txt | cut -f 1,3
# John    NYC
# Alice   LA
# Using multiple delimiters with tr
echo "John:30,NYC;Engineer" | tr ':,;' '\t' | cut -f 1,3
# John    NYC
6. Complement Selection (--complement)
Inverse Selection
# Select everything EXCEPT specified fields
echo "John,30,NYC,Engineer" | cut -d ',' --complement -f 2
# John,NYC,Engineer
# Remove first field
echo "John,30,NYC" | cut -d ',' --complement -f 1
# 30,NYC
# Remove multiple fields
echo "John,30,NYC,Engineer" | cut -d ',' --complement -f 1,4
# 30,NYC
# Remove range
echo "John,30,NYC,Engineer" | cut -d ',' --complement -f 2-3
# John,Engineer
# Character-based complement
echo "Hello World" | cut --complement -c 1-5
#  World   (characters 6-11, including the leading space)
7. Output Delimiter
Changing Output Format
# Change output delimiter
echo "John,30,NYC" | cut -d ',' -f 1,3 --output-delimiter=':'
# John:NYC
# Multiple fields with custom delimiter
cut -d ':' -f 1,3 /etc/passwd --output-delimiter=' -> ' | head -3
# root -> 0
# daemon -> 1
# bin -> 2
# Create CSV from TAB-delimited
cut -f 1,2 tab_data.txt --output-delimiter=','
# John,30
# Alice,25
# Bob,35
# Using with different delimiters
echo "John 30 NYC" | tr ' ' '\t' | cut -f 1,3 --output-delimiter='|'
# John|NYC
8. Practical Examples
System Administration
# Extract usernames from /etc/passwd
cut -d ':' -f 1 /etc/passwd | sort
# Get UIDs of regular users (UID >= 1000)
cut -d ':' -f 1,3 /etc/passwd | grep ':[0-9]\{4,\}' | cut -d ':' -f 1
# Extract IPv4 addresses from ifconfig (squeeze padding so field numbers are stable)
ifconfig | grep 'inet ' | tr -s ' ' | cut -d ' ' -f 3 | grep -v '127.0.0.1'
# Get list of logged-in users
who | cut -d ' ' -f 1 | sort -u
# Extract process names and arguments (ps pads columns with spaces)
ps aux | tail -n +2 | tr -s ' ' | cut -d ' ' -f 11- | head -5
# Get disk usage by mount point
df -h | tail -n +2 | cut -c 1-50 | head -5
Log File Analysis
# Extract IP addresses from Apache log
cut -d ' ' -f 1 access.log | sort | uniq -c | sort -nr
# Extract timestamps from log
cut -d ' ' -f 1-2 app.log | head -5
# Get HTTP status codes from access log
cut -d '"' -f 3 access.log | cut -d ' ' -f 2 | sort | uniq -c
# Extract specific fields from syslog
grep "sshd" /var/log/auth.log | cut -d ' ' -f 1-3,5- | head -5
# Parse CSV log
cut -d ',' -f 1,3,5 logs.csv --output-delimiter=' | '
Data Processing
# Extract columns from CSV
cut -d ',' -f 2,4 data.csv | sort | uniq -c
# Get first and last fields
echo "a:b:c:d:e" | cut -d ':' -f 1,5
# a:e
# Extract phone numbers from contact list
grep "Phone:" contacts.txt | cut -d ':' -f 2 | sed 's/^ //'
# Parse key-value pairs
echo "key1=value1&key2=value2&key3=value3" | tr '&' '\n' | cut -d '=' -f 2
# Extract domain from email list
cut -d '@' -f 2 emails.txt | sort -u
File Information
# Get file extensions
ls -1 | grep '\.' | cut -d '.' -f 2- | sort -u
# Extract file sizes from ls -l (squeeze the column padding first)
ls -l | tail -n +2 | tr -s ' ' | cut -d ' ' -f 5 | numfmt --to=iec
# Get file permissions
ls -l | tail -n +2 | cut -d ' ' -f 1
# Extract modification dates (field-based is more robust than character positions)
ls -l | tail -n +2 | tr -s ' ' | cut -d ' ' -f 6-8
9. Combining cut with Other Commands
With grep
# Search then cut
grep "ERROR" app.log | cut -d ' ' -f 1-4
# Cut then search
cut -d ',' -f 2 data.csv | grep "pattern"
# Multiple filters
grep -v "^#" config.conf | cut -d '=' -f 1 | grep -v "^$"
With sort and uniq
# Count occurrences
cut -d ',' -f 3 data.csv | sort | uniq -c | sort -nr
# Unique values
cut -d ':' -f 1 /etc/passwd | sort -u
# Top N values
cut -d ' ' -f 1 access.log | sort | uniq -c | sort -nr | head -10
With awk and sed
# Pre-process with sed before cut (collapse runs of spaces to one)
sed -E 's/ +/ /g' file.txt | cut -d ' ' -f 2
# Post-process with awk
cut -d ',' -f 2,4 data.csv | awk -F ',' '{print $2 ":" $1}'
# Complex pipeline
cat data.txt |
grep -v "^#" |
cut -d '|' -f 2,5 |
sed 's/|/,/g' |
sort -u
With xargs
# Extract and use as arguments
cut -d ':' -f 1 /etc/passwd | head -5 | xargs echo "Users:"
# Delete files listed in a file (rm -i cannot prompt through a pipe; use xargs -p)
cut -d ',' -f 1 files.csv | xargs -p rm
# Process each line
cut -d ',' -f 2 data.csv | xargs -I {} echo "Processing: {}"
10. Advanced Techniques
Multi-Character Delimiters
# cut doesn't support multi-char delimiters directly
# Use sed or awk instead
# File with "||" delimiter
echo "John||30||NYC" | sed 's/||/\t/g' | cut -f 2
# 30
# Using awk for multi-char delimiters
echo "John||30||NYC" | awk -F '\\|\\|' '{print $2}'
# 30
# Complex delimiters with perl
echo "John::30::NYC" | perl -F'::' -lane 'print $F[1]'
# 30
Handling Quoted Fields
# CSV with quoted fields (cut has limitations)
echo '"John, Doe",30,"New York"' |
awk -F ',' '{
    for (i = 1; i <= NF; i++) {
        field = $i
        # Re-join the pieces of a quoted field that contained commas
        while (field ~ /^"/ && field !~ /"$/ && i < NF) {
            i++
            field = field "," $i
        }
        print field
    }
}'
# Better: use csvkit or proper CSV parser
# csvcut -c 2 file.csv
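If csvkit is not installed, Python's standard-library csv module (invoked here through `python3 -c`, an approach not shown in the original toolchain) also parses quoted fields correctly:

```shell
# Extract the second column (index 1) from quoted CSV via Python's csv module
echo '"John, Doe",30,"New York"' |
python3 -c 'import csv, sys
for row in csv.reader(sys.stdin):
    print(row[1])'
# 30
```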
Variable Width Fields
# Fixed width with cut
cut -c 1-20,30-40 file.txt
# Variable width with awk (substr(s, start, length); 11 chars covers positions 30-40)
awk '{print substr($0,1,20) substr($0,30,11)}' file.txt
# Combining with expansion
cut -c 1-$(tput cols) /var/log/syslog | head -5
Dynamic Field Selection
# Use variables for field numbers
field=3
cut -d ',' -f "$field" data.csv
# Calculate field positions
start=5
end=10
cut -c ${start}-${end} file.txt
# Programmatic field selection (process substitution keeps rows aligned;
# a loop piped into "paste - - -" would interleave lines from different columns)
paste -d ',' <(cut -d ',' -f 1 data.csv) <(cut -d ',' -f 3 data.csv) <(cut -d ',' -f 5 data.csv)
# Select based on content
grep -n "pattern" file.txt | cut -d ':' -f 1 | xargs -I {} sed -n '{}p' file.txt
11. Error Handling and Edge Cases
Missing Fields
# Lines with missing fields
cat inconsistent.txt
# John,30,NYC
# Alice,25
# Bob
# -s option suppresses lines without the delimiter
cut -d ',' -f 2 -s inconsistent.txt
# 30
# 25
# Without -s, the entire line is printed when the delimiter is missing
cut -d ',' -f 2 inconsistent.txt
# 30
# 25
# Bob   (entire line printed)
Empty Fields
# CSV with empty fields
cat empties.csv
# John,,30,NYC
# Alice,25,,LA
# Empty fields are preserved
cut -d ',' -f 2 empties.csv
# (empty line)
# 25
# Count empty fields
cut -d ',' -f 2 empties.csv | grep -c "^$"
# 1
Leading/Trailing Delimiters
# Line starting with delimiter
echo ",30,NYC" | cut -d ',' -f 2
# 30
# Line ending with delimiter
echo "John,30," | cut -d ',' -f 3
# (empty)
# Multiple consecutive delimiters
echo "John,,30,NYC" | cut -d ',' -f 2
# (empty)
12. Performance Considerations
Large File Processing
# For large files, cut is very efficient
time cut -d ',' -f 2 hugefile.csv > output.txt
# Compare with awk
time awk -F ',' '{print $2}' hugefile.csv > output.txt
# cut is generally faster than awk for simple field extraction
# Use LC_ALL=C for ASCII files (faster)
LC_ALL=C cut -d ',' -f 2 hugefile.csv
# Process in chunks for very large files
split -l 1000000 hugefile.csv chunk_
for f in chunk_*; do
    cut -d ',' -f 2 "$f" > "$f.out" &   # per-chunk files keep the output ordered
done
wait
cat chunk_*.out > output.txt
rm chunk_*
Memory Usage
# cut streams data, so memory usage stays minimal
# Monitor memory
/usr/bin/time -v cut -d ',' -f 2 hugefile.csv > /dev/null
# Piping large data: cut processes the stream without loading the entire input
tar -cf - bigdir/ | cut -c 1-100 | head -5
13. Script Examples
CSV Processor
#!/bin/bash
# Process CSV file with headers
process_csv() {
    local file="$1"
    local field="$2"
    # Get header
    header=$(head -1 "$file" | cut -d ',' -f "$field")
    # Get data
    echo "Processing field: $header"
    tail -n +2 "$file" | cut -d ',' -f "$field" | sort | uniq -c
}

# Extract specific columns with validation
extract_columns() {
    local file="$1"
    local columns="$2"
    local delimiter="${3:-,}"
    # Validate file exists
    if [ ! -f "$file" ]; then
        echo "Error: File not found" >&2
        return 1
    fi
    # Get column count from header
    local num_cols
    num_cols=$(head -1 "$file" | tr -cd "$delimiter" | wc -c)
    num_cols=$((num_cols + 1))
    # Validate columns
    for col in $(echo "$columns" | tr ',' ' '); do
        if [ "$col" -gt "$num_cols" ]; then
            echo "Error: Column $col exceeds file columns ($num_cols)" >&2
            return 1
        fi
    done
    # Extract columns
    cut -d "$delimiter" -f "$columns" "$file"
}

# Usage
# extract_columns data.csv 1,3,5
Log Analyzer
#!/bin/bash
# Apache log analyzer
analyze_apache_log() {
    local logfile="$1"
    echo "=== Apache Log Analysis ==="
    # Top IPs
    echo -e "\nTop IP addresses:"
    cut -d ' ' -f 1 "$logfile" | sort | uniq -c | sort -nr | head -5
    # Top pages
    echo -e "\nTop requested pages:"
    cut -d '"' -f 2 "$logfile" | cut -d ' ' -f 2 | sort | uniq -c | sort -nr | head -5
    # HTTP status codes
    echo -e "\nStatus codes:"
    cut -d '"' -f 3 "$logfile" | cut -d ' ' -f 2 | sort | uniq -c | sort -nr
    # Traffic by hour
    echo -e "\nTraffic by hour:"
    cut -d '[' -f 2 "$logfile" | cut -d ':' -f 2 | sort | uniq -c
}

# Usage
analyze_apache_log /var/log/apache2/access.log
Data Extractor
#!/bin/bash
# Flexible data extraction tool
extract_data() {
    local file="$1"
    local format="$2"   # csv, tsv, fixed, custom
    local spec="$3"     # field numbers, ranges, etc.
    case "$format" in
        csv)
            cut -d ',' -f "$spec" "$file"
            ;;
        tsv)
            cut -f "$spec" "$file"
            ;;
        fixed)
            # spec uses character ranges like "1-10,20-30"
            cut -c "$spec" "$file"
            ;;
        custom)
            local delim="$4"
            cut -d "$delim" -f "$spec" "$file"
            ;;
        *)
            echo "Unknown format: $format" >&2
            return 1
            ;;
    esac
}

# Process multiple files
extract_from_files() {
    local pattern="$1"
    local fields="$2"
    for file in $pattern; do
        if [ -f "$file" ]; then
            echo "=== $file ==="
            cut -d ',' -f "$fields" "$file" | head -3
        fi
    done
}

# Usage
# extract_data data.csv csv 1,3,5
# extract_from_files "*.log" "1,2"
14. Common Use Cases Reference
Quick Reference Table
| Task | Command |
|---|---|
| Get first field from CSV | cut -d ',' -f 1 file.csv |
| Get username from /etc/passwd | cut -d ':' -f 1 /etc/passwd |
| Extract IP from log | cut -d ' ' -f 1 access.log |
| Get first 10 characters | cut -c 1-10 file.txt |
| Remove first field | cut --complement -d ',' -f 1 |
| Get last field | rev file.txt \| cut -d ',' -f 1 \| rev |
| Change delimiter | cut -d ',' -f 1,3 --output-delimiter=':' |
| Skip lines without delimiter | cut -d ',' -s -f 2 file.txt |
| Extract column range | cut -f 2-5 tab_data.txt |
| Multiple ranges | cut -c 1-10,20-30 file.txt |
Real-World Examples
# System information
# Get list of users with shells
cut -d ':' -f 1,7 /etc/passwd | grep -v "nologin\|false" | cut -d ':' -f 1
# Get running services
systemctl list-units --type=service --all | cut -c 1-50 | grep -v "^$"
# Monitor disk space by partition
df -h | tail -n +2 | cut -c 1-30,40-50
# Network connections
netstat -tulpn | tail -n +2 | cut -c 20-70 | head -10
# Process memory usage
ps aux | tail -n +2 | cut -c 1-20,40-60 | head -5
# Extract email domains
grep -E -o "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" emails.txt | cut -d '@' -f 2 | sort -u
15. Limitations and Alternatives
cut Limitations
# 1. No support for multi-character delimiters
echo "John||30||NYC" | cut -d '||' -f 2 # Doesn't work
# Solution: Use awk
echo "John||30||NYC" | awk -F '\\|\\|' '{print $2}'
# 2. No regex support
echo "John123Doe" | cut -c 1-4 # Can't extract based on pattern
# Solution: Use grep -o
echo "John123Doe" | grep -o '[A-Za-z]*'
# 3. Can't reorder fields
echo "a,b,c" | cut -d ',' -f 3,1 # Still prints a,c
# Solution: Use awk
echo "a,b,c" | awk -F ',' '{print $3 "," $1}'
# 4. No conditional extraction
echo "John,30,NYC" | cut -d ',' -f 2 # Can't filter based on value
# Solution: Use awk
echo "John,30,NYC" | awk -F ',' '$2 > 25 {print $0}'
Alternative Tools
# awk - Most flexible
awk -F ',' '{print $1, $3}' data.csv
# sed - Good for simple extractions
sed 's/^\([^,]*\),.*/\1/' data.csv
# perl - Full regex power
perl -F',' -lane 'print $F[0]' data.csv
# grep - Pattern-based extraction
grep -o '^[^,]*' data.csv
# tr + cut combination
tr ' ' '\t' < file.txt | cut -f 2
# column - Format text
column -t -s ',' data.csv
# csvkit - For serious CSV work
csvcut -c 1,3 data.csv
csvgrep -c 2 -m "pattern" data.csv
When to Use What
# Use cut when:
# - Simple field extraction from delimited files
# - Fixed-width character extraction
# - Performance is critical
# - Processing very large files
# Use awk when:
# - Need field reordering
# - Complex conditions
# - Calculations on fields
# - Multi-character delimiters
# Use sed when:
# - Pattern-based substitution
# - Line-by-line transformations
# - Simple text manipulation
# Use perl when:
# - Complex regex operations
# - Need maximum flexibility
# - Processing non-standard formats
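As a concrete comparison, here is the same record handled by both tools (the sample line is illustrative):

```shell
data="Alice,25,LA"

# cut: fields always come out in file order, delimiter unchanged
echo "$data" | cut -d ',' -f 3,1               # Alice,LA

# awk: can reorder, transform, and filter in one pass
echo "$data" | awk -F ',' '{print $3 "-" $1}'  # LA-Alice
echo "$data" | awk -F ',' '$2 > 21 {print $1}' # Alice
```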
Conclusion
The cut command is a fundamental tool for text processing in Unix/Linux:
Key Takeaways
- Three modes: field (-f), character (-c), and byte (-b) cutting
- Default delimiter: TAB for fields
- Custom delimiters: use -d for any single character
- Complement: --complement to exclude sections
- Output formatting: --output-delimiter to change the separator
- Skip lines: -s to ignore lines without the delimiter
Best Practices
| Scenario | Recommendation |
|---|---|
| CSV files | Use -d ',' with proper quoting handling |
| TSV files | Use default TAB delimiter |
| Fixed-width | Use -c with character ranges |
| Large files | cut is fastest option |
| Complex parsing | Consider awk instead |
| Multi-char delimiters | Use awk or perl |
Quick Reference Card
# Field extraction
cut -d',' -f1 file.csv               # First field
cut -d',' -f1,3 file.csv             # First and third
cut -d',' -f2-5 file.csv             # Fields 2 through 5
cut -d',' -f-3 file.csv              # First three fields
cut -d',' -f3- file.csv              # From field 3 to end
# Character extraction
cut -c1-10 file.txt                  # First 10 chars
cut -c5,10,15 file.txt               # Specific positions
cut -c1-10,20-30 file.txt            # Multiple ranges
# Output control
cut -d',' -f1,3 --output-delimiter=':' file.csv
cut -d',' --complement -f2 file.csv  # Drop field 2
cut -d',' -s -f2 file.csv            # Skip lines without comma
The cut command is essential for quick data extraction and processing. Master it for efficient command-line text manipulation!