Complete Guide to Rust Strings

Introduction to Rust Strings

Strings in Rust are a complex topic because the language provides two primary string types: String and &str. This design reflects Rust's commitment to memory safety, explicit ownership, and UTF-8 correctness. Understanding strings is crucial for writing robust Rust code that handles text properly.

Key Concepts

  • UTF-8 Encoded: All strings are valid UTF-8
  • Two Main Types: String (owned) and &str (borrowed)
  • Not Indexable by Integer: s[0] does not compile because UTF-8 characters vary in width (1 to 4 bytes); slice by byte range or iterate instead
  • Heap Allocated: String allocates on the heap
  • Zero-cost: Abstractions compile to efficient code
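
The UTF-8 and indexing points above can be made concrete with a small sketch. The helper name `byte_and_char_len` is hypothetical and exists only for this illustration; it compares the byte length of a string with its character count, and `main` then checks which byte offsets are valid slice boundaries.

```rust
/// Hypothetical helper: returns (byte length, char count) for a string slice.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let ascii = "hello";
    let mixed = "h\u{e9}llo"; // "héllo": 'é' takes two bytes in UTF-8

    println!("{:?}", byte_and_char_len(ascii)); // (5, 5)
    println!("{:?}", byte_and_char_len(mixed)); // (6, 5)

    // Byte offset 1 is where 'é' starts; offset 2 falls inside it.
    println!("{}", mixed.is_char_boundary(1)); // true
    println!("{}", mixed.is_char_boundary(2)); // false
}
```

Because a byte offset can land inside a multi-byte character, an integer index has no single obvious meaning, which is one way to read the decision to leave `s[i]` unimplemented.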

1. String Types Overview

String vs &str

fn main() {
// &str - string slice (borrowed, immutable)
let s1: &str = "hello world";  // String literal
let s2: &str = "hello";        // Also a string literal
// String - owned, heap-allocated, mutable
let s3: String = String::from("hello world");
let s4: String = "hello world".to_string();
let s5 = "hello world".to_owned();
// Converting between types
let str_slice: &str = &s3;  // String -> &str (cheap)
let string: String = s2.to_string();  // &str -> String (allocates)
let string2: String = s2.to_owned();  // Also converts
// Memory layout
println!("Size of &str: {} bytes", std::mem::size_of::<&str>());  // 16 on 64-bit targets (pointer + length)
println!("Size of String: {} bytes", std::mem::size_of::<String>()); // 24 on 64-bit targets (pointer + length + capacity)
}
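
Since String -> &str is cheap, a function that only reads text can usually take `&str` and accept both types. The sketch below assumes a hypothetical helper, `word_count`, to show a literal, a borrowed `String`, and a sub-slice all passing through the same parameter.

```rust
/// Hypothetical helper: counts whitespace-separated words.
/// Taking `&str` lets callers pass literals, `String`s, or slices.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let literal: &str = "hello world";
    let owned: String = String::from("hello brave new world");

    println!("{}", word_count(literal));       // a &str is passed directly
    println!("{}", word_count(&owned));        // &String coerces to &str
    println!("{}", word_count(&owned[6..11])); // a sub-slice ("brave") is also a &str
}
```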

String Literals

fn main() {
// Basic string literals
let s1 = "Hello";  // Type: &'static str
let s2 = "World";
// Multiline strings
let multiline = "Line 1
Line 2
Line 3";
println!("{}", multiline);
// Escape sequences
let escapes = "Tab:\tNewline:\nQuote:\'Double:\"Backslash:\\";
println!("{}", escapes);
// Raw strings (no escaping)
let raw = r"Raw string: \n \t \\ are not escaped";
println!("{}", raw);
// Raw strings with quotes
let raw_quotes = r#"String with "quotes" inside"#;
println!("{}", raw_quotes);
// Byte strings (not UTF-8)
let bytes = b"hello";  // Type: &[u8; 5]
println!("Byte string: {:?}", bytes);
// Raw byte strings
let raw_bytes = br#"raw \n bytes"#;  // Type: &[u8; 12] (the \n stays as two bytes)
}

2. Creating Strings

Various Construction Methods

fn main() {
// Using String::new()
let mut s = String::new();
s.push_str("Hello");
println!("{}", s);
// Using String::from()
let s = String::from("hello");
println!("{}", s);
// Using to_string()
let s = "hello".to_string();
println!("{}", s);
// Using to_owned()
let s = "hello".to_owned();
println!("{}", s);
// Using format! macro
let name = "Alice";
let age = 30;
let s = format!("{} is {} years old", name, age);
println!("{}", s);
// Using collect from iterator
let chars = ['h', 'e', 'l', 'l', 'o'];
let s: String = chars.iter().collect();
println!("{}", s);
// With capacity pre-allocation
let mut s = String::with_capacity(100);
println!("Capacity: {}", s.capacity());
s.push_str("This won't reallocate immediately");
}

From Other Types

fn main() {
// From numbers
let from_i32 = 42.to_string();
let from_f64 = 3.14.to_string();
let from_bool = true.to_string();
println!("i32: {}, f64: {}, bool: {}", from_i32, from_f64, from_bool);
// From characters
let from_char = 'a'.to_string();
let from_chars: String = vec!['a', 'b', 'c'].into_iter().collect();
// From slices
let from_slice = [1, 2, 3].iter().map(|x| x.to_string()).collect::<Vec<_>>().join(", ");
// From arrays
let arr = [1, 2, 3];
let from_arr = format!("{:?}", arr);
// From custom types (implement Display)
struct Point {
x: i32,
y: i32,
}
impl std::fmt::Display for Point {
fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
write!(f, "({}, {})", self.x, self.y)
}
}
let point = Point { x: 10, y: 20 };
let from_point = point.to_string();
println!("Point: {}", from_point);
}

3. String Manipulation

Appending and Inserting

fn main() {
// push_str - append string slice
let mut s = String::from("foo");
s.push_str("bar");
println!("{}", s);  // foobar
// push - append single character
let mut s = String::from("Hello");
s.push(' ');
s.push('W');
s.push('o');
s.push('r');
s.push('l');
s.push('d');
s.push('!');
println!("{}", s);  // Hello World!
// insert - insert character at index
let mut s = String::from("Hello World");
s.insert(5, ',');
println!("{}", s);  // Hello, World
// insert_str - insert string slice at index
let mut s = String::from("Hello World");
s.insert_str(6, "Beautiful ");
println!("{}", s);  // Hello Beautiful World
}

Concatenation

fn main() {
// Using + operator (note: takes ownership of first operand)
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = s1 + &s2;  // s1 is moved here, can't use after
println!("{}", s3);
// println!("{}", s1); // Error: s1 moved
// Using format! macro (doesn't take ownership)
let s1 = String::from("Hello, ");
let s2 = String::from("world!");
let s3 = format!("{}{}", s1, s2);
println!("{}", s3);
println!("s1: {}, s2: {}", s1, s2);  // Both still valid
// Concatenating multiple strings
let parts = vec!["Hello", ", ", "world", "!"];
let result = parts.concat();
println!("{}", result);
// Join with separator
let words = vec!["Hello", "world", "from", "Rust"];
let sentence = words.join(" ");
println!("{}", sentence);
// Chaining multiple operations
let mut s = String::from("Hello");
s = s + " " + "World" + "!";  // Works, but push_str or format! is usually clearer
println!("{}", s);
}

Removing and Truncating

fn main() {
// pop - remove last character
let mut s = String::from("Hello!");
let last = s.pop();
println!("Popped: {:?}, remaining: {}", last, s);  // Popped: Some('!'), remaining: Hello
// remove - remove character at index
let mut s = String::from("Hello");
let removed = s.remove(1);
println!("Removed: {}, remaining: {}", removed, s);  // Removed: e, remaining: Hllo
// truncate - shorten to specified length
let mut s = String::from("Hello World");
s.truncate(5);
println!("Truncated: {}", s);  // Hello
// clear - remove all characters
let mut s = String::from("Hello");
s.clear();
println!("Cleared: '{}', length: {}", s, s.len());  // Cleared: '', length: 0
// drain - remove range and return iterator
let mut s = String::from("Hello World");
let drained: String = s.drain(6..).collect();
println!("Drained: {}, remaining: {}", drained, s);  // Drained: World, remaining: "Hello " (trailing space stays)
// retain - keep only characters satisfying predicate
let mut s = String::from("Hello123World456");
s.retain(|c| !c.is_numeric());
println!("After retain: {}", s);  // HelloWorld
}

Replacing

fn main() {
// replace - replace all matches
let s = String::from("Hello World");
let replaced = s.replace("World", "Rust");
println!("{}", replaced);  // Hello Rust
// replacen - replace N matches
let s = String::from("Hello World World World");
let replaced = s.replacen("World", "Rust", 2);
println!("{}", replaced);  // Hello Rust Rust World
// replace_range - replace range with new string
let mut s = String::from("Hello Beautiful World");
s.replace_range(6..15, "Amazing");
println!("{}", s);  // Hello Amazing World
}

4. String Access and Inspection

Basic Information

fn main() {
let s = String::from("Hello, World!");
// Length (in bytes, not characters)
println!("Length: {} bytes", s.len());  // 13
// Capacity
println!("Capacity: {} bytes", s.capacity());  // Might be > len
// Is empty?
println!("Is empty: {}", s.is_empty());  // false
// Check contains
println!("Contains 'World': {}", s.contains("World"));  // true
// Starts with / ends with
println!("Starts with 'Hello': {}", s.starts_with("Hello"));  // true
println!("Ends with '!': {}", s.ends_with('!'));  // true
// Find
println!("Find 'World': {:?}", s.find("World"));  // Some(7)
println!("Find 'xyz': {:?}", s.find("xyz"));  // None
// Count occurrences
println!("Count 'l': {}", s.matches('l').count());  // 3
}

Character Access (Not Direct Indexing)

fn main() {
let s = String::from("Hello, 世界");
// Can't index directly
// let c = s[0]; // Error: cannot index into String
// Get character by byte position (careful!)
match s.get(0..1) {
Some(slice) => println!("First byte: {:?}", slice),  // "H"
None => println!("Invalid index"),
}
// Get character by char index (need to iterate)
let chars: Vec<char> = s.chars().collect();
println!("First char: {}", chars[0]);  // 'H'
println!("Fifth char: {}", chars[4]);  // 'o'
println!("Sixth char: {}", chars[5]);  // ','
println!("Seventh char: {}", chars[6]);  // ' '
println!("Eighth char: {}", chars[7]);  // '世'
// Iterate over characters
for (i, c) in s.chars().enumerate() {
println!("Char {}: {}", i, c);
}
// Iterate over bytes
for (i, b) in s.bytes().enumerate() {
println!("Byte {}: {}", i, b);
}
// Get nth character safely
fn nth_char(s: &str, n: usize) -> Option<char> {
s.chars().nth(n)
}
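println!("Char at index 5: {:?}", nth_char(&s, 5));  // Some(',')
println!("Char at index 8: {:?}", nth_char(&s, 8));  // Some('界')
println!("Char at index 10: {:?}", nth_char(&s, 10));  // None (the string has only 9 chars)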
}

Slicing

fn main() {
let s = String::from("Hello, World!");
// Slice by byte range (careful with UTF-8)
let hello = &s[0..5];
println!("hello: {}", hello);  // Hello
let world = &s[7..12];
println!("world: {}", world);  // World
// Various slice syntaxes
let slice1 = &s[..5];    // from start to index 5
let slice2 = &s[7..];    // from index 7 to end
let slice3 = &s[..];     // entire string
println!("slice1: {}, slice2: {}, slice3: {}", slice1, slice2, slice3);
// Safe slicing with get
match s.get(7..12) {
Some(substring) => println!("Substring: {}", substring),
None => println!("Invalid range"),
}
// UTF-8 aware slicing
let utf = "Hello 世界";
// This would panic because we cut a character in the middle
// let bad_slice = &utf[7..10]; // Panics!
// Safe way
match utf.get(7..10) {
Some(slice) => println!("Slice: {}", slice),
None => println!("Invalid UTF-8 boundary"),
}
// Split at
let (first, last) = s.split_at(5);
println!("First: {}, Last: {}", first, last);
}
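
When the goal is "the first n characters" rather than "the first n bytes", `char_indices` can translate a character count into a byte offset that is safe to slice at. The following is a minimal sketch; `first_n_chars` is a hypothetical helper, not part of the standard library.

```rust
/// Hypothetical helper: returns the prefix of `s` holding at most `n` chars,
/// as a borrowed slice (no allocation).
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // byte_idx is the start of the (n+1)-th char, so it is a valid boundary.
        Some((byte_idx, _)) => &s[..byte_idx],
        // The string has n chars or fewer: return all of it.
        None => s,
    }
}

fn main() {
    let s = "Hello, 世界";
    println!("{}", first_n_chars(s, 5));  // Hello
    println!("{}", first_n_chars(s, 8));  // Hello, 世
    println!("{}", first_n_chars(s, 20)); // Hello, 世界
}
```

This still walks the string from the start, so it costs O(n); it only avoids the panic that slicing at an arbitrary byte offset can cause.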

5. String Transformation

Case Conversion

fn main() {
let s = String::from("Hello, World!");
// Uppercase
println!("Uppercase: {}", s.to_uppercase());  // HELLO, WORLD!
// Lowercase
println!("Lowercase: {}", s.to_lowercase());  // hello, world!
// Title case
let title = "hello world from rust";
println!("Title case: {}", title.to_title_case());
// Approximate case-insensitive comparison (std has no true Unicode case folding)
let s1 = "Straße";
let s2 = "STRASSE";
println!("Lowercase equal: {}", s1.to_lowercase() == s2.to_lowercase());  // false: "straße" vs "strasse"
println!("Uppercase equal: {}", s1.to_uppercase() == s2.to_uppercase());  // true: 'ß' uppercases to "SS"
}
trait TitleCase {
fn to_title_case(&self) -> String;
}
impl TitleCase for str {
fn to_title_case(&self) -> String {
self.split_whitespace()
.map(|word| {
let mut chars = word.chars();
match chars.next() {
None => String::new(),
Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
}
})
.collect::<Vec<_>>()
.join(" ")
}
}

Trimming

fn main() {
let s = String::from("  \tHello, World!\n  ");
// Trim whitespace from both ends
println!("Trim: '{}'", s.trim());  // 'Hello, World!'
// Trim start
println!("Trim start: '{}'", s.trim_start());  // 'Hello, World!\n  '
// Trim end
println!("Trim end: '{}'", s.trim_end());  // '  \tHello, World!'
// Trim specific characters
let s2 = "---Hello---";
println!("Trim matches: '{}'", s2.trim_matches('-'));  // 'Hello'
// Trim with predicate
let s3 = "123Hello456";
let trimmed: String = s3.trim_matches(|c: char| c.is_numeric()).to_string();
println!("Trim digits: '{}'", trimmed);  // 'Hello'
}
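
The trim methods above remove every repeated match at an end of the string. When exactly one known prefix or suffix should go, `strip_prefix` and `strip_suffix` may be a better fit. A small sketch, using a hypothetical helper `strip_version_prefix`:

```rust
/// Hypothetical helper: removes one leading 'v' from a version tag, if present.
fn strip_version_prefix(tag: &str) -> &str {
    // strip_prefix returns None when the prefix is absent, so fall back to the input.
    tag.strip_prefix('v').unwrap_or(tag)
}

fn main() {
    println!("{}", strip_version_prefix("v1.2.3")); // 1.2.3
    println!("{}", strip_version_prefix("1.2.3"));  // 1.2.3

    // Contrast: strip_prefix removes one match, trim_start_matches removes all.
    println!("{}", strip_version_prefix("vv2"));      // v2
    println!("{}", "vv2".trim_start_matches('v'));    // 2
}
```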

Splitting

fn main() {
let s = String::from("apple,banana,orange,grape");
// Split by delimiter
for fruit in s.split(',') {
println!("Fruit: {}", fruit);
}
// Split and collect
let fruits: Vec<&str> = s.split(',').collect();
println!("Fruits: {:?}", fruits);
// Split by whitespace
let words = "Hello world from Rust";
for word in words.split_whitespace() {
println!("Word: {}", word);
}
// Split by lines
let multiline = "Line 1\nLine 2\r\nLine 3";
for line in multiline.lines() {
println!("Line: {}", line);
}
// Split at most n times
let parts: Vec<&str> = s.splitn(2, ',').collect();
println!("Split into 2: {:?}", parts);  // ["apple", "banana,orange,grape"]
// Split including delimiter
let with_delim: Vec<&str> = s.split_inclusive(',').collect();
println!("With delimiter: {:?}", with_delim);  // ["apple,", "banana,", "orange,", "grape"]
// Split by character class
let mixed = "a1b2c3d4";
for part in mixed.split(char::is_numeric) {
println!("Part: {}", part);  // "a", "b", "c", "d", ""
}
}

Joining

fn main() {
let words = vec!["Hello", "world", "from", "Rust"];
// Join with separator
let sentence = words.join(" ");
println!("{}", sentence);  // Hello world from Rust
// Join with different separators
println!("{}", words.join(", "));  // Hello, world, from, Rust
println!("{}", words.join("-"));   // Hello-world-from-Rust
// Join with empty separator
let chars = vec!["a", "b", "c"];
println!("{}", chars.join(""));  // abc
// Join from iterator
let numbers = [1, 2, 3, 4];
let joined: String = numbers.iter()
.map(|n| n.to_string())
.collect::<Vec<_>>()
.join(", ");
println!("Numbers: {}", joined);  // Numbers: 1, 2, 3, 4
// Concat (no separator)
let parts = vec!["Hello", " ", "World", "!"];
let result: String = parts.concat();
println!("{}", result);  // Hello World!
}

6. String Searching

Finding Patterns

fn main() {
let s = "The quick brown fox jumps over the lazy dog";
// Find first occurrence
println!("Find 'fox': {:?}", s.find("fox"));  // Some(16)
println!("Find 'cat': {:?}", s.find("cat"));  // None
// Find last occurrence
println!("Rfind 'the': {:?}", s.rfind("the"));  // Some(31)
// Find with position
if let Some(pos) = s.find("fox") {
println!("Found 'fox' at position {}", pos);
}
// Find with range
let sub = &s[20..];
println!("Find in substring: {:?}", sub.find("dog"));  // Some(20), relative to the start of sub
// Check if contains
println!("Contains 'jumps': {}", s.contains("jumps"));  // true
println!("Contains 'cat': {}", s.contains("cat"));      // false
// Starts/Ends with
println!("Starts with 'The': {}", s.starts_with("The"));  // true
println!("Ends with 'dog': {}", s.ends_with("dog"));      // true
}

Pattern Matching

fn main() {
let s = "Hello, world! Hello, Rust!";
// Count matches
println!("Count 'Hello': {}", s.matches("Hello").count());  // 2
// Iterate over matches
for mat in s.matches("Hello") {
println!("Found: {}", mat);
}
// Iterate with indices
for (i, mat) in s.match_indices("Hello") {
println!("Found '{}' at index {}", mat, i);
}
// Pattern with char predicate
let digits = "a1b2c3d4";
for digit in digits.matches(char::is_numeric) {
println!("Digit: {}", digit);
}
// Find all positions
let positions: Vec<_> = s.match_indices('o').map(|(i, _)| i).collect();
println!("Positions of 'o': {:?}", positions);  // [4, 8, 18]
// Check if any match exists
println!("Any numbers: {}", digits.chars().any(char::is_numeric));  // true
println!("All letters: {}", digits.chars().all(char::is_alphabetic));  // false
}

7. String Comparison

Equality and Ordering

fn main() {
// Basic equality
let s1 = String::from("hello");
let s2 = String::from("hello");
let s3 = String::from("world");
println!("s1 == s2: {}", s1 == s2);  // true
println!("s1 == s3: {}", s1 == s3);  // false
println!("s1 != s3: {}", s1 != s3);  // true
// Compare with &str
println!("s1 == \"hello\": {}", s1 == "hello");  // true
println!("\"world\" == s3: {}", "world" == s3);  // true
// Ordering
println!("s1 < s3: {}", s1 < s3);   // true (lexicographic)
println!("s1 > s3: {}", s1 > s3);   // false
// Case-sensitive vs insensitive
let s4 = String::from("Hello");
println!("s1 == s4: {}", s1 == s4);  // false
println!("s1.eq_ignore_ascii_case(&s4): {}", 
s1.eq_ignore_ascii_case(&s4));  // true
// Using cmp for detailed ordering
use std::cmp::Ordering;
match s1.cmp(&s3) {
Ordering::Less => println!("s1 < s3"),
Ordering::Equal => println!("s1 == s3"),
Ordering::Greater => println!("s1 > s3"),
}
}

Advanced Comparisons

use std::cmp::Ordering;
fn main() {
let s1 = String::from("apple");
let s2 = String::from("APPLE");
let s3 = String::from("banana");
// Case-insensitive comparison
fn case_insensitive_cmp(a: &str, b: &str) -> Ordering {
a.to_lowercase().cmp(&b.to_lowercase())
}
match case_insensitive_cmp(&s1, &s2) {
Ordering::Equal => println!("Equal ignoring case"),
_ => println!("Not equal"),
}
// Custom ordering (by length then alphabetically)
fn custom_cmp(a: &str, b: &str) -> Ordering {
match a.len().cmp(&b.len()) {
Ordering::Equal => a.cmp(b),
other => other,
}
}
let strings = vec!["a", "bb", "ccc", "dd", "e"];
let mut sorted = strings.clone();
sorted.sort_by(|a, b| custom_cmp(a, b));
println!("Custom sort: {:?}", sorted);
// partial_cmp returns Option<Ordering>; for strings it is always Some
if s1.partial_cmp(&s3).unwrap() == Ordering::Less {
println!("{} comes before {}", s1, s3);
}
}
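
For sorting, a key-based sort can replace a hand-written comparison function. The sketch below assumes a hypothetical helper, `sort_ignoring_case`, and uses `sort_by_cached_key` so that each lowercase key is built once per element.

```rust
/// Hypothetical helper: sorts words alphabetically, ignoring case.
fn sort_ignoring_case(words: &mut [String]) {
    // The lowercase key is computed once per element, not once per comparison.
    words.sort_by_cached_key(|w| w.to_lowercase());
}

fn main() {
    let mut words = vec![
        String::from("banana"),
        String::from("Apple"),
        String::from("cherry"),
    ];
    sort_ignoring_case(&mut words);
    println!("{:?}", words); // ["Apple", "banana", "cherry"]
}
```

A plain `sort()` would put "Apple" first here as well, but only because uppercase ASCII sorts before lowercase; with "apple" and "Banana" the two approaches would disagree.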

8. String Parsing

Parsing to Numbers

fn main() {
// Parse to integer
let num = "42".parse::<i32>().unwrap();
println!("Parsed i32: {}", num);
// Parse with error handling
let num = "42".parse::<i32>();
match num {
Ok(n) => println!("Number: {}", n),
Err(e) => println!("Error: {}", e),
}
// Parse different numeric types
let i: i32 = "100".parse().unwrap();
let u: u32 = "200".parse().unwrap();
let f: f64 = "3.14".parse().unwrap();
let hex = i32::from_str_radix("FF", 16).unwrap();
println!("i32: {}, u32: {}, f64: {}, hex: {}", i, u, f, hex);
// Parse with error handling for multiple values
let inputs = vec!["42", "24", "abc", "12"];
let numbers: Vec<i32> = inputs.iter()
.filter_map(|s| s.parse().ok())
.collect();
println!("Valid numbers: {:?}", numbers);
// Parse with custom error handling
fn parse_number(s: &str) -> Result<i32, String> {
s.parse()
.map_err(|_| format!("Cannot parse '{}' as number", s))
}
match parse_number("42") {
Ok(n) => println!("Parsed: {}", n),
Err(e) => println!("Error: {}", e),
}
}

Parsing to Other Types

use std::str::FromStr;
#[derive(Debug, PartialEq)]
struct Point {
x: i32,
y: i32,
}
impl FromStr for Point {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let coords: Vec<&str> = s.split(',').collect();
if coords.len() != 2 {
return Err("Expected format: x,y".to_string());
}
let x = coords[0].parse()
.map_err(|_| "Invalid x coordinate")?;
let y = coords[1].parse()
.map_err(|_| "Invalid y coordinate")?;
Ok(Point { x, y })
}
}
fn main() {
// Parse custom type
let p = "10,20".parse::<Point>().unwrap();
println!("Parsed point: {:?}", p);
// Parse boolean
let b1 = "true".parse::<bool>().unwrap();
let b2 = "false".parse::<bool>().unwrap();
println!("Bools: {}, {}", b1, b2);
// Parse char
let c = "a".parse::<char>().unwrap();
println!("Char: {}", c);
// Parse with FromStr directly
let p2 = Point::from_str("30,40").unwrap();
println!("FromStr: {:?}", p2);
// Parse multiple values
let points: Vec<Point> = vec!["10,20", "30,40", "50,60"]
.iter()
.filter_map(|s| s.parse().ok())
.collect();
println!("Points: {:?}", points);
}
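
For simple "key=value" input, `split_once` combined with `parse` can be enough without a full `FromStr` implementation. This is a sketch under that assumption; `parse_key_value` is a hypothetical helper, and a real parser would probably report why a line was rejected rather than return `None`.

```rust
/// Hypothetical helper: parses "key = 123" into ("key", 123).
/// Returns None if there is no '=' or the value is not an i32.
fn parse_key_value(line: &str) -> Option<(&str, i32)> {
    // split_once splits at the first '=' only.
    let (key, value) = line.split_once('=')?;
    let value: i32 = value.trim().parse().ok()?;
    Some((key.trim(), value))
}

fn main() {
    println!("{:?}", parse_key_value("port = 8080"));  // Some(("port", 8080))
    println!("{:?}", parse_key_value("no separator")); // None
    println!("{:?}", parse_key_value("port = abc"));   // None
}
```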

9. String Iteration

Character Iteration

fn main() {
let s = String::from("Hello 世界");
// Iterate over characters
println!("Characters:");
for c in s.chars() {
println!("  '{}'", c);
}
// Collect characters into Vec
let chars: Vec<char> = s.chars().collect();
println!("Chars vector: {:?}", chars);
// Iterate with indices
for (i, c) in s.char_indices() {
println!("Character at byte {}: '{}'", i, c);
}
// Count characters
println!("Character count: {}", s.chars().count());
// Get nth character
if let Some(c) = s.chars().nth(7) {
println!("Character at index 7: '{}'", c);  // '界' (nth is zero-based)
}
// Reverse iteration
println!("Reverse chars:");
for c in s.chars().rev() {
println!("  '{}'", c);
}
}

Byte and Grapheme Iteration

fn main() {
let s = String::from("Hello 世界");
// Iterate over bytes
println!("Bytes:");
for (i, b) in s.bytes().enumerate() {
println!("  Byte {}: {}", i, b);
}
// Collect bytes into Vec
let bytes: Vec<u8> = s.bytes().collect();
println!("Bytes vector: {:?}", bytes);
// Grapheme clusters (using unicode-segmentation crate)
// Add to Cargo.toml: unicode-segmentation = "1.10"
use unicode_segmentation::UnicodeSegmentation;
let s2 = "नमस्ते";  // Hindi greeting
println!("Graphemes:");
for g in s2.graphemes(true) {
println!("  '{}'", g);
}
// Split by grapheme boundaries
let graphemes: Vec<&str> = s2.graphemes(true).collect();
println!("Graphemes vector: {:?}", graphemes);
}

10. String Performance

Pre-allocation

fn main() {
// Bad: Repeated reallocation
let start = std::time::Instant::now();
let mut s = String::new();
for _ in 0..10000 {
s.push_str("hello");
}
println!("No pre-allocation: {:?}", start.elapsed());
// Good: Pre-allocate capacity
let start = std::time::Instant::now();
let mut s = String::with_capacity(50000);
for _ in 0..10000 {
s.push_str("hello");
}
println!("With pre-allocation: {:?}", start.elapsed());
// Check capacity
let mut s = String::with_capacity(100);
println!("Capacity: {}", s.capacity());
s.push_str("Hello");
println!("After push: {}", s.capacity());
// Shrink to fit
let mut s = String::with_capacity(100);
s.push_str("Hello");
s.shrink_to_fit();
println!("After shrink: {}", s.capacity());
// Reserve additional capacity
let mut s = String::from("Hello");
s.reserve(100);
println!("After reserve: {}", s.capacity());
}
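
Timings like the ones above vary from run to run. A more direct way to see what pre-allocation does is to check whether the heap buffer stays at the same address. The sketch below uses a hypothetical helper, `buffer_stayed_put`, for that check.

```rust
/// Hypothetical helper: pushes `piece` `times` times into a String created
/// with `cap` bytes of capacity and reports whether the buffer address
/// is unchanged afterwards.
fn buffer_stayed_put(cap: usize, piece: &str, times: usize) -> bool {
    let mut s = String::with_capacity(cap);
    let before = s.as_ptr(); // address of the heap buffer right after allocation
    for _ in 0..times {
        s.push_str(piece);
    }
    // An unchanged address means the buffer was never moved.
    s.as_ptr() == before
}

fn main() {
    // 10 pushes of 5 bytes need exactly 50 bytes, which fits the reserved space.
    println!("{}", buffer_stayed_put(50, "hello", 10)); // true
}
```

The reverse does not hold in general: an allocator may grow a buffer in place, so an unchanged address does not prove that no reallocation call happened.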

Building Strings Efficiently

fn main() {
// Bad: Using + operator repeatedly
let start = std::time::Instant::now();
let mut s = String::new();
for i in 0..1000 {
s = s + &i.to_string() + ",";
}
println!("Using +: {:?}", start.elapsed());
// Good: Using push_str
let start = std::time::Instant::now();
let mut s = String::with_capacity(5000);
for i in 0..1000 {
s.push_str(&i.to_string());
s.push(',');
}
println!("Using push_str: {:?}", start.elapsed());
// Better: Using format! macro (good for small number of concatenations)
let a = "Hello";
let b = "World";
let c = format!("{}, {}!", a, b);
// Best: Using join for collections
let strings: Vec<String> = (0..1000).map(|i| i.to_string()).collect();
let start = std::time::Instant::now();
let result = strings.join(",");
println!("Using join: {:?}", start.elapsed());
// Using extend for iterators
let mut s = String::new();
s.extend((0..1000).map(|i| i.to_string()));
}
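
The push_str loop above still builds a temporary String for every number. One possible refinement is to format directly into the target with `write!`, which works on a `String` once `std::fmt::Write` is in scope. A minimal sketch, with `csv_numbers` as a hypothetical helper:

```rust
use std::fmt::Write; // lets write! target a String

/// Hypothetical helper: builds "0,1,2,..." for the numbers 0..n
/// without allocating a temporary String per number.
fn csv_numbers(n: u32) -> String {
    let mut out = String::new();
    for i in 0..n {
        if i > 0 {
            out.push(',');
        }
        // Writing into a String cannot fail, so unwrapping the Result is safe here.
        write!(out, "{}", i).unwrap();
    }
    out
}

fn main() {
    println!("{}", csv_numbers(5)); // 0,1,2,3,4
}
```

Whether this is measurably faster depends on the workload, so it is worth profiling before preferring it over the simpler versions.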

Memory Usage

fn main() {
// Memory size of different string types
println!("Size of &str: {} bytes", std::mem::size_of::<&str>());
println!("Size of String: {} bytes", std::mem::size_of::<String>());
println!("Size of Box<str>: {} bytes", std::mem::size_of::<Box<str>>());
// String vs &str memory layout
let s1 = "hello";  // &str - points into the program's static data
let s2 = String::from("hello");  // String - points to a heap buffer
// as_ptr() gives the address of the text itself, not of the local variable
println!("&str data pointer: {:p}", s1.as_ptr());
println!("String data pointer: {:p}", s2.as_ptr());
// Cloning never shares a buffer: each String owns its own allocation
let s3 = String::from("hello");
let s4 = s3.clone();  // Creates new allocation
println!("s3 ptr: {:p}, s4 ptr: {:p}", s3.as_ptr(), s4.as_ptr());
// String vs &str in collections
// Use &str for temporary access, String for ownership
let string_vec: Vec<String> = vec!["a".to_string(), "b".to_string()];
let str_vec: Vec<&str> = string_vec.iter().map(|s| s.as_str()).collect();
}
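
When a function only sometimes needs to modify its input, `std::borrow::Cow` lets it return the borrowed input unchanged and allocate only when a change is required. The sketch below assumes a hypothetical helper, `expand_tabs`, to illustrate the idea.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces tabs with four spaces, allocating only
/// when the input actually contains a tab.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', "    "))
    } else {
        Cow::Borrowed(input)
    }
}

fn main() {
    let clean = expand_tabs("no tabs here");
    let fixed = expand_tabs("a\tb");

    // Cow<str> derefs to &str, so it prints and compares like a string.
    println!("{} (borrowed: {})", clean, matches!(clean, Cow::Borrowed(_)));
    println!("{} (borrowed: {})", fixed, matches!(fixed, Cow::Borrowed(_)));
}
```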

11. Unicode and Internationalization

UTF-8 Handling

fn main() {
// UTF-8 encoding
let s = "Hello 世界";
// Length in bytes vs characters
println!("Byte length: {}", s.len());      // 12 (6 ASCII bytes + 2 × 3 bytes)
println!("Char count: {}", s.chars().count());  // 8
// Check if valid UTF-8
println!("Is valid UTF-8: {}", std::str::from_utf8(s.as_bytes()).is_ok());
// Handling invalid UTF-8
let bytes = vec![0x48, 0x65, 0x6C, 0x6C, 0x6F, 0xFF];  // Invalid byte
match std::str::from_utf8(&bytes) {
Ok(s) => println!("Valid: {}", s),
Err(e) => println!("Invalid UTF-8: {}", e),
}
// Lossy conversion
let lossy = String::from_utf8_lossy(&bytes);
println!("Lossy: {}", lossy);  // Hello�
// Unicode properties
for c in "A1世".chars() {
println!("Char: '{}'", c);
println!("  Alphabetic: {}", c.is_alphabetic());
println!("  Numeric: {}", c.is_numeric());
println!("  Alphanumeric: {}", c.is_alphanumeric());
println!("  Whitespace: {}", c.is_whitespace());
println!("  Control: {}", c.is_control());
}
}
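
Besides lossy conversion, the error returned by `from_utf8` reports how many leading bytes were valid, which allows recovering the good prefix. A minimal sketch, with `decode_valid_prefix` as a hypothetical helper:

```rust
/// Hypothetical helper: decodes bytes as UTF-8, falling back to the longest
/// valid prefix when the input contains an invalid sequence.
fn decode_valid_prefix(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => {
            // valid_up_to() is the byte index where decoding first failed.
            let end = e.valid_up_to();
            // Everything before `end` was already validated, so this cannot fail.
            std::str::from_utf8(&bytes[..end]).unwrap()
        }
    }
}

fn main() {
    let bytes = [0x48, 0x65, 0x6C, 0x6C, 0x6F, 0xFF]; // "Hello" + one invalid byte
    println!("{}", decode_valid_prefix(&bytes)); // Hello
}
```

Unlike `from_utf8_lossy`, this drops everything from the first invalid byte onward instead of substituting a replacement character, so which one fits depends on the use case.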

Normalization

use unicode_normalization::UnicodeNormalization;
fn main() {
// Unicode normalization (add to Cargo.toml: unicode-normalization = "0.1")
// Different representations of "é"
let nfd = "e\u{0301}";  // e + combining acute accent (NFD)
let nfc = "\u{00E9}";    // precomposed é (NFC), written as an escape so the source encoding cannot change it
println!("NFD: {}", nfd);
println!("NFC: {}", nfc);
// They look the same but are different strings
println!("Length NFD: {}, NFC: {}", nfd.len(), nfc.len());
println!("Bytes NFD: {:?}, NFC: {:?}", nfd.as_bytes(), nfc.as_bytes());
println!("Equal: {}", nfd == nfc);  // false
// Normalize for comparison
let nfd_nfc: String = nfd.nfc().collect();
println!("NFD normalized to NFC: {}", nfd_nfc);
println!("Normalized equal: {}", nfd_nfc == nfc);  // true
// Different normalization forms
let s = "Å";
println!("Original: {}", s);
println!("NFC: {:?}", s.nfc().collect::<String>());
println!("NFD: {:?}", s.nfd().collect::<String>());
println!("NFKC: {:?}", s.nfkc().collect::<String>());
println!("NFKD: {:?}", s.nfkd().collect::<String>());
}

12. Common Patterns and Best Practices

String Building Patterns

fn main() {
// Pattern 1: Builder pattern for complex strings
// (illustrative only: real SQL should use parameterized queries, not string formatting)
struct QueryBuilder {
parts: Vec<String>,
}
impl QueryBuilder {
fn new() -> Self {
QueryBuilder { parts: Vec::new() }
}
fn add_condition(mut self, field: &str, value: &str) -> Self {
self.parts.push(format!("{}='{}'", field, value));
self
}
fn build(self) -> String {
if self.parts.is_empty() {
String::new()
} else {
format!("WHERE {}", self.parts.join(" AND "))
}
}
}
let query = QueryBuilder::new()
.add_condition("name", "Alice")
.add_condition("age", "30")
.build();
println!("Query: {}", query);
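// Pattern 2: String interning for repeated strings
// (each distinct string is leaked once so it can be handed out as &'static str)
use std::collections::HashMap;
struct StringCache {
cache: HashMap<String, &'static str>,
}
impl StringCache {
fn new() -> Self {
StringCache { cache: HashMap::new() }
}
fn intern(&mut self, s: &str) -> &'static str {
if let Some(&interned) = self.cache.get(s) {
return interned;
}
let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
self.cache.insert(s.to_owned(), leaked);
leaked
}
}
let mut cache = StringCache::new();
let (a, b) = (cache.intern("hello"), cache.intern("hello"));
println!("Same allocation: {}", std::ptr::eq(a, b));  // true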
// Pattern 3: Lazy static strings (add to Cargo.toml: lazy_static = "1.4")
use lazy_static::lazy_static;
lazy_static! {
static ref GREETING: String = {
let mut s = String::new();
s.push_str("Hello, ");
s.push_str("World");
s
};
}
println!("Lazy static: {}", *GREETING);
}
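
The lazily initialized string from Pattern 3 can also be written with the standard library alone. The sketch below assumes Rust 1.70 or newer, where `std::sync::OnceLock` is available; the accessor name `greeting` is hypothetical.

```rust
use std::sync::OnceLock;

/// Hypothetical accessor: builds the greeting on first use, then reuses it.
fn greeting() -> &'static str {
    static GREETING: OnceLock<String> = OnceLock::new();
    GREETING
        .get_or_init(|| {
            // Runs at most once, even if several threads call greeting().
            let mut s = String::new();
            s.push_str("Hello, ");
            s.push_str("World");
            s
        })
        .as_str()
}

fn main() {
    println!("Lazy: {}", greeting()); // Lazy: Hello, World
    // Both calls return the same stored string.
    println!("{}", std::ptr::eq(greeting(), greeting())); // true
}
```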

Error Handling with Strings

fn main() -> Result<(), Box<dyn std::error::Error>> {
// Using ? operator with string parsing
let num = "42".parse::<i32>()?;
println!("Parsed: {}", num);
// Custom error messages
fn validate_name(name: &str) -> Result<&str, String> {
if name.is_empty() {
Err("Name cannot be empty".to_string())
} else if name.len() > 50 {
Err("Name too long".to_string())
} else {
Ok(name)
}
}
match validate_name("Alice") {
Ok(name) => println!("Valid name: {}", name),
Err(e) => println!("Error: {}", e),
}
// Using anyhow for complex error handling
// (add to Cargo.toml: anyhow = "1.0")
use anyhow::{anyhow, Context};
fn parse_config(s: &str) -> anyhow::Result<Vec<i32>> {
s.split(',')
.map(|part| {
part.parse::<i32>()
.with_context(|| format!("Failed to parse '{}' as number", part))
})
.collect::<Result<Vec<_>, _>>()
.map_err(|e| anyhow!("Config parsing failed: {}", e))
}
match parse_config("1,2,three,4") {
Ok(nums) => println!("Config: {:?}", nums),
Err(e) => println!("Error: {}", e),
}
Ok(())
}

Performance Tips

fn main() {
// 1. Pre-allocate when you know approximate size
let mut s = String::with_capacity(1000);
// 2. Use push_str instead of + for repeated concatenation
for _ in 0..100 {
s.push_str("hello");
}
// 3. Use collect for building from iterators
let items = vec!["a", "b", "c"];
let result: String = items.iter()
.map(|&s| s.to_uppercase())
.collect();
// 4. Avoid unnecessary allocations
fn process_str(s: &str) -> bool {
s.contains("pattern")
}
let owned = String::from("test pattern");
// Good: borrow instead of creating new String
process_str(&owned);
// 5. Use `as_str()` when you need &str from String
let s = String::from("hello");
let slice = s.as_str();  // No allocation
// 6. For many small strings, consider using `SmallString` optimization
// from the `smallstr` crate
// 7. Profile your code to find real bottlenecks
use std::time::Instant;
let start = Instant::now();
// ... your string operations ...
println!("Time: {:?}", start.elapsed());
}

Conclusion

Rust's string handling is designed to be both safe and efficient:

Key Takeaways

  1. Two String Types: String (owned, heap-allocated) and &str (borrowed slice)
  2. UTF-8 Guaranteed: All strings are valid UTF-8
  3. No Integer Indexing: s[i] is rejected because UTF-8 characters vary in width; slice by byte range or iterate with chars()
  4. Rich API: Extensive methods for manipulation
  5. Zero-Cost: Abstractions compile to efficient code

Best Practices

Situation                   Recommended Approach
String literals             Use &str
Need to modify              Use String
Function parameter          Use &str
Return owned string         Return String
Concatenate many strings    Use push_str or join
Format complex strings      Use format! macro
Parse strings               Use parse() method
Case-insensitive compare    Normalize first

Common Operations Cheat Sheet

// Creation
let (a, b) = ("hello", " ");  // sample inputs for format!
let s = String::from("hello");
let s = "hello".to_string();
let mut s = format!("{}{}", a, b);  // mut: the calls below modify s
// Inspection
let len = s.len();
let is_empty = s.is_empty();
let chars = s.chars().count();
// Modification
s.push_str("world");
s.push('!');
s.insert(5, ',');
s.pop();
s.remove(5);
s.clear();
// Transformation
let upper = s.to_uppercase();
let lower = s.to_lowercase();
let trimmed = s.trim();
let replaced = s.replace("old", "new");
// Conversion
let slice = s.as_str();
let bytes = s.clone().into_bytes();  // into_bytes consumes its String, so clone to keep s usable
let from_bytes = String::from_utf8(bytes).unwrap();
// Iteration
for c in s.chars() { }
for b in s.bytes() { }
for (i, c) in s.char_indices() { }

Rust's string types provide a powerful foundation for text processing, balancing safety, expressiveness, and performance. The learning curve is worth it for the guarantees and efficiency they provide.
