Memory alignment is one of those topics that lurks in the shadows of C programming—often ignored until it causes mysterious bugs, performance degradation, or crashes. Understanding alignment is crucial for systems programming, embedded development, performance optimization, and working with hardware interfaces. This guide will demystify memory alignment, explaining what it is, why it matters, and how to manage it in C.
What Is Memory Alignment?
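Memory alignment refers to placing data at addresses that are multiples of a power of two called the type's alignment requirement. For scalar types this is usually, but not always, the size of the type (on 32-bit x86 Linux, for example, double is 8 bytes but only 4-byte aligned inside structures). Most modern CPUs either require aligned accesses or perform them more efficiently.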
Simple Explanation:
- A 4-byte integer should ideally be stored at an address divisible by 4
- An 8-byte double should be at an address divisible by 8
- A 2-byte short should be at an address divisible by 2
#include <stdio.h>
#include <stdint.h>
int main() {
int x = 42;
char c = 'A';
double d = 3.14159;
printf("Address of int (should be multiple of 4): %p\n", (void*)&x);
printf("Address of char (can be any address): %p\n", (void*)&c);
printf("Address of double (should be multiple of 8): %p\n", (void*)&d);
// Check alignment
printf("int alignment: %d\n", (int)((uintptr_t)&x % 4 == 0));
printf("double alignment: %d\n", (int)((uintptr_t)&d % 8 == 0));
return 0;
}
Why Alignment Matters
1. Performance
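Misaligned access is never faster and is sometimes much slower. On modern x86 processors, a misaligned load or store that stays inside one cache line usually costs little or nothing extra, but one that straddles a cache-line boundary is noticeably slower, and one that crosses a page boundary slower still. Other architectures may raise an alignment fault (SPARC, many microcontrollers, and some ARM instructions or configurations), or the operating system may trap and emulate the access at a very large cost. Independently of the hardware, dereferencing a misaligned pointer is undefined behavior in C, so the compiler is free to generate code that assumes alignment.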
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#define SIZE 1000000
#define ITERATIONS 100
// Storing each sum here keeps the compiler from deleting the timed loops.
static volatile long long sink;
int main(void) {
    // Extra bytes: up to 7 to reach an 8-byte boundary, plus the 1-byte offset
    char *buffer = malloc(SIZE * sizeof(int) + 8);
    if (buffer == NULL) {
        return 1;
    }
    int *aligned_ptr;
    int *unaligned_ptr;
    // Create aligned and unaligned pointers
    aligned_ptr = (int*)(((uintptr_t)buffer + 7) & ~(uintptr_t)7); // Aligned to 8 bytes
    // Misaligned by one byte. Creating and dereferencing this pointer is
    // undefined behavior in ISO C: run it only on x86 and build without
    // auto-vectorization (e.g. GCC's -fno-tree-vectorize).
    unaligned_ptr = (int*)((uintptr_t)aligned_ptr + 1);
    // Initialize
    for (int i = 0; i < SIZE; i++) {
        aligned_ptr[i] = i;
    }
    clock_t start, end;
    // Benchmark aligned access
    start = clock();
    for (int iter = 0; iter < ITERATIONS; iter++) {
        long long sum = 0; // long long: the total overflows int
        for (int i = 0; i < SIZE; i++) {
            sum += aligned_ptr[i];
        }
        sink = sum;
    }
    end = clock();
    double aligned_time = ((double)(end - start)) / CLOCKS_PER_SEC;
    // Copy to unaligned location
    for (int i = 0; i < SIZE; i++) {
        unaligned_ptr[i] = i;
    }
    // Benchmark unaligned access
    start = clock();
    for (int iter = 0; iter < ITERATIONS; iter++) {
        long long sum = 0;
        for (int i = 0; i < SIZE; i++) {
            sum += unaligned_ptr[i];
        }
        sink = sum;
    }
    end = clock();
    double unaligned_time = ((double)(end - start)) / CLOCKS_PER_SEC;
    printf("Aligned access time: %.3f seconds\n", aligned_time);
    printf("Unaligned access time: %.3f seconds\n", unaligned_time);
    printf("Unaligned is %.2fx slower\n", unaligned_time / aligned_time);
    free(buffer);
    return 0;
}
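On recent x86 cores, most of the measurable cost of misalignment comes from accesses that straddle a cache-line boundary, since the CPU must then touch two lines. The check is simple arithmetic; here is a sketch assuming 64-byte lines (LINE_SIZE and crosses_cache_line are illustrative names, and the real line size is CPU-specific):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64 // assumed cache-line size; query the CPU for the real value

// Returns 1 if an access of `size` bytes starting at `addr` touches two
// cache lines, 0 if it stays inside one. Requires 0 < size <= LINE_SIZE.
static int crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0 && size <= LINE_SIZE);
    return (addr % LINE_SIZE) + size > LINE_SIZE;
}
```

A naturally aligned access of 2, 4 or 8 bytes can never split a 64-byte line, because the line size is a multiple of the access size. A misaligned one splits only when it sits near the end of a line, which is one reason a sequential benchmark like the one above often shows a smaller slowdown than expected.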
2. Hardware Requirements
Some architectures simply cannot perform unaligned accesses:
// This might crash on some architectures
char buffer[6];
int *ptr = (int*)&buffer[1]; // Misaligned pointer
*ptr = 0x12345678; // May cause bus error on ARM, SPARC, etc.
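The portable way to read or write a value at an arbitrary byte offset is to copy it with memcpy. Compilers typically turn a small fixed-size memcpy into a single load or store on targets that allow misaligned access, and into byte accesses on targets that do not. A sketch with illustrative helper names load_u32 and store_u32:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Read a uint32_t from any address, aligned or not (host byte order).
static uint32_t load_u32(const void *p)
{
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

// Write a uint32_t to any address, aligned or not (host byte order).
static void store_u32(void *p, uint32_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

With these helpers, the crashing store above becomes store_u32(&buffer[1], 0x12345678), which is well defined on every architecture.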
3. Atomic Operations
Many atomic operations require aligned addresses:
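#include <stdatomic.h>
_Atomic int aligned_int; // Compiler ensures natural alignment
// Misaligned atomics usually come from packed structures or pointer casts:
struct __attribute__((packed)) Message {
char tag;
int counter; // offset 1: misaligned
};
// Using &msg.counter as an _Atomic int * is undefined behavior. On x86 a
// locked operation that splits a cache line is extremely slow, and many
// other architectures fault on misaligned atomic accesses.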
Alignment Requirements by Type
#include <stdio.h>
#include <stdalign.h>
int main() {
printf("Type alignment requirements:\n");
printf("char: %zu bytes\n", alignof(char));
printf("short: %zu bytes\n", alignof(short));
printf("int: %zu bytes\n", alignof(int));
printf("long: %zu bytes\n", alignof(long));
printf("float: %zu bytes\n", alignof(float));
printf("double: %zu bytes\n", alignof(double));
printf("pointer: %zu bytes\n", alignof(void*));
return 0;
}
Typical Alignment Values (x86-64 Linux and macOS; on 64-bit Windows, long is 4 bytes with 4-byte alignment):
| Type | Size (bytes) | Alignment (bytes) |
|---|---|---|
| char | 1 | 1 |
| short | 2 | 2 |
| int | 4 | 4 |
| long | 8 | 8 |
| float | 4 | 4 |
| double | 8 | 8 |
| pointer | 8 | 8 |
Structure Alignment and Padding
The compiler automatically inserts padding between structure members, and after the last member, so that every member is properly aligned:
#include <stdio.h>
#include <stddef.h>
// Without padding consideration
struct BadlyPacked {
char c; // 1 byte
int i; // 4 bytes (needs 4-byte alignment)
short s; // 2 bytes
};
// Better ordering (but still may have padding)
struct BetterOrdered {
int i; // 4 bytes
short s; // 2 bytes
char c; // 1 byte
};
// Packed (no padding)
#pragma pack(push, 1)
struct Packed {
char c;
int i;
short s;
};
#pragma pack(pop)
int main() {
printf("Size of BadlyPacked: %zu bytes\n", sizeof(struct BadlyPacked));
printf(" offsetof(c): %zu\n", offsetof(struct BadlyPacked, c));
printf(" offsetof(i): %zu\n", offsetof(struct BadlyPacked, i));
printf(" offsetof(s): %zu\n\n", offsetof(struct BadlyPacked, s));
printf("Size of BetterOrdered: %zu bytes\n", sizeof(struct BetterOrdered));
printf(" offsetof(i): %zu\n", offsetof(struct BetterOrdered, i));
printf(" offsetof(s): %zu\n", offsetof(struct BetterOrdered, s));
printf(" offsetof(c): %zu\n\n", offsetof(struct BetterOrdered, c));
printf("Size of Packed: %zu bytes\n", sizeof(struct Packed));
printf(" offsetof(c): %zu\n", offsetof(struct Packed, c));
printf(" offsetof(i): %zu\n", offsetof(struct Packed, i));
printf(" offsetof(s): %zu\n\n", offsetof(struct Packed, s));
return 0;
}
Output (typical):
Size of BadlyPacked: 12 bytes
 offsetof(c): 0
 offsetof(i): 4
 offsetof(s): 8

Size of BetterOrdered: 8 bytes
 offsetof(i): 0
 offsetof(s): 4
 offsetof(c): 6

Size of Packed: 7 bytes
 offsetof(c): 0
 offsetof(i): 1
 offsetof(s): 5
Why Padding Exists
The compiler adds padding to satisfy each member's alignment requirements:
struct Example {
char a; // offset 0
// 3 bytes padding (to make int aligned to 4)
int b; // offset 4
short c; // offset 8
// 2 bytes padding (to make total size multiple of largest alignment)
};
// Total size = 12 bytes
/*
Memory layout:
+---+---+---+---+---+---+---+---+---+---+---+---+
| a | P | P | P | b | b | b | b | c | c | P | P |
+---+---+---+---+---+---+---+---+---+---+---+---+
  0   1   2   3   4   5   6   7   8   9  10  11
*/
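The rule the compiler follows can be written down directly: place each member at the next offset that is a multiple of its alignment, then round the total size up to the largest member alignment. A sketch that replays this rule for a list of (size, alignment) pairs; MemberInfo and struct_layout are illustrative names, and real ABIs add special cases (bit-fields, flexible array members) that this ignores:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

// Size and alignment of one member, as sizeof/alignof report them.
typedef struct {
    size_t size;
    size_t align; // a power of two
} MemberInfo;

// Round n up to the next multiple of align (a power of two).
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

// Store each member's offset in offsets[] and return the padded struct size.
static size_t struct_layout(const MemberInfo *m, size_t count, size_t *offsets)
{
    size_t offset = 0, max_align = 1;
    for (size_t i = 0; i < count; i++) {
        offset = round_up(offset, m[i].align); // padding before member i
        offsets[i] = offset;
        offset += m[i].size;
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    return round_up(offset, max_align); // trailing padding
}

// The structure from the diagram, for comparison with the computed layout.
struct Example {
    char a;
    int b;
    short c;
};
```

On x86-64 this reproduces the diagram: offsets 0, 4 and 8, and a total of 12 bytes.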
Controlling Alignment
1. alignas Specifier (C11)
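#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>
// Specify alignment for variables
alignas(16) int aligned_array[100]; // Aligned to 16-byte boundary
alignas(4096) char page_buffer[4096]; // Page-aligned for DMA (support for such large alignments is implementation-defined)
// Specify alignment for structure members. alignas applies to every
// declarator in its declaration, so "alignas(16) float x, y, z, w;" would
// align each float to 16 bytes and make the struct 64 bytes long.
struct Vector {
    alignas(16) float x; // 16-byte aligned for SIMD; the struct inherits it
    float y, z, w;
};
// Specify alignment for structures: C has no "struct alignas(64)" syntax
// (that is C++), but aligning a member raises the whole struct's alignment.
struct CacheLine {
    alignas(64) int data[16]; // Entire structure aligned to cache line
};
int main(void) {
    printf("aligned_array address: %p\n", (void*)aligned_array);
    printf("Is 16-byte aligned? %d\n",
           (int)((uintptr_t)aligned_array % 16 == 0));
    printf("sizeof(struct Vector): %zu\n", sizeof(struct Vector)); // 16
    return 0;
}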
2. GCC/Clang Attributes
#include <stdio.h>
#include <stdint.h>
// Variable alignment
int aligned_var __attribute__((aligned(16))) = 42;
// Structure alignment
struct __attribute__((aligned(64))) CacheLine {
int data[16];
};
// Packed structures (no padding)
struct __attribute__((packed)) NetworkPacket {
uint8_t header;
uint32_t length;
uint16_t flags;
};
// Structure alignment, with the attribute after the closing brace
struct SIMDVector {
float x, y, z, w;
} __attribute__((aligned(16)));
int main() {
struct NetworkPacket pkt;
printf("Packed struct size: %zu\n", sizeof(pkt)); // 1+4+2 = 7
return 0;
}
3. MSVC Equivalents
#include <stdint.h>
// MSVC alignment
__declspec(align(16)) int aligned_array[100];
// Packed structures
#pragma pack(push, 1)
struct NetworkPacket {
uint8_t header;
uint32_t length;
uint16_t flags;
};
#pragma pack(pop)
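Code that must build with both GCC/Clang and MSVC sometimes hides the two spellings behind one macro. A sketch assuming only those compiler families matter; ALIGNED and is_aligned_to are illustrative names, and in C11 and later plain alignas is usually the simpler portable choice:

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#define ALIGNED(n) __declspec(align(n))
#else // GCC and Clang
#define ALIGNED(n) __attribute__((aligned(n)))
#endif

// Both spellings are accepted among the declaration specifiers.
static ALIGNED(32) int lookup_table[8];

// Runtime check that the requested alignment took effect.
static int is_aligned_to(const void *p, uintptr_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0); // n must be a power of two
    return ((uintptr_t)p & (n - 1)) == 0;
}
```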
Alignment and Dynamic Memory
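Standard malloc returns memory suitably aligned for any type with a fundamental alignment, that is, to at least alignof(max_align_t) (typically 16 bytes on x86-64). Larger alignments, such as cache-line or page alignment, need a dedicated allocator: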
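#define _POSIX_C_SOURCE 200112L // Exposes posix_memalign under strict -std modes
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>
#include <stdalign.h>
int main(void) {
    printf("alignof(max_align_t): %zu\n", alignof(max_align_t));
    void *ptr = malloc(100);
    if (ptr == NULL) {
        return 1;
    }
    printf("malloc address: %p\n", ptr);
    printf("Is 8-byte aligned? %d\n", (int)((uintptr_t)ptr % 8 == 0));
    printf("Is 16-byte aligned? %d\n", (int)((uintptr_t)ptr % 16 == 0));
    free(ptr);
    // For special alignment requirements (POSIX; returns 0 on success)
    void *aligned_ptr;
    if (posix_memalign(&aligned_ptr, 64, 1024) == 0) {
        printf("posix_memalign (64-byte): %p\n", aligned_ptr);
        free(aligned_ptr);
    }
    // C11 aligned_alloc; the size should be a multiple of the alignment.
    // MSVC does not provide it (it has _aligned_malloc/_aligned_free).
    void *c11_ptr = aligned_alloc(64, 1024);
    if (c11_ptr != NULL) {
        printf("aligned_alloc (64-byte): %p\n", c11_ptr);
        free(c11_ptr);
    }
    return 0;
}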
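Where neither posix_memalign nor aligned_alloc is available (MSVC offers _aligned_malloc instead), a common fallback is to over-allocate with malloc, round the pointer up, and stash the original pointer just before the aligned block so it can be freed later. A sketch of that technique with hypothetical names my_aligned_alloc and my_aligned_free:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Return `size` bytes aligned to `align` (a power of two), or NULL.
// Release the result only with my_aligned_free.
static void *my_aligned_alloc(size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    // Room for the saved pointer plus up to align - 1 bytes of rounding.
    size_t extra = sizeof(void *) + align - 1;
    if (size > SIZE_MAX - extra)
        return NULL;
    unsigned char *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;
    unsigned char *start = raw + sizeof(void *);
    size_t pad = (align - ((uintptr_t)start & (align - 1))) & (align - 1);
    unsigned char *result = start + pad;
    // Save malloc's pointer in the bytes just before the aligned block.
    memcpy(result - sizeof(void *), &raw, sizeof raw);
    return result;
}

// Free a block returned by my_aligned_alloc (NULL is ignored).
static void my_aligned_free(void *p)
{
    if (p == NULL)
        return;
    unsigned char *raw;
    memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
    free(raw);
}
```

Storing the original pointer in front of the block is what lets my_aligned_free recover it; the price is up to align - 1 + sizeof(void *) extra bytes per allocation.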
Alignment and Arrays
Arrays contain no padding between elements, so once the first element is aligned, every element is:
#include <stdio.h>
#include <stdalign.h>
int main() {
int arr[10];
double darr[10];
printf("int array: base=%p, element size=%zu, alignment=%zu\n",
(void*)arr, sizeof(arr[0]), alignof(int));
printf(" arr[0]: %p\n", (void*)&arr[0]);
printf(" arr[1]: %p (offset %td)\n",
(void*)&arr[1], (char*)&arr[1] - (char*)&arr[0]);
printf("\ndouble array: base=%p, element size=%zu, alignment=%zu\n",
(void*)darr, sizeof(darr[0]), alignof(double));
return 0;
}
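The property that keeps every array element aligned is that sizeof(T) is always a multiple of alignof(T). For structures, trailing padding is what enforces it: struct Tail below is typically 8 bytes (4 for the int, 1 for the char, 3 of padding), so table[1].i lands on a 4-byte boundary again. A small sketch with illustrative names:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

struct Tail {
    int i;  // needs int alignment
    char c; // followed by trailing padding
};

// Guaranteed for every complete object type; checked here for illustration.
_Static_assert(sizeof(struct Tail) % alignof(struct Tail) == 0,
               "sizeof must be a multiple of alignof");

static struct Tail table[4];

// Abort if any element's int member is misaligned (it never should be).
static void check_table_alignment(void)
{
    for (size_t k = 0; k < sizeof table / sizeof table[0]; k++)
        assert((uintptr_t)&table[k].i % alignof(int) == 0);
}
```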
Alignment and Type Punning
Type punning through pointer casts can violate alignment requirements as well as C's strict-aliasing rules:
#include <stdio.h>
#include <string.h>
int main() {
unsigned char buffer[8] = {0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0};
// DANGEROUS: buffer need not be int-aligned, and reading a char array
// through an int lvalue also breaks the strict-aliasing rule
int *int_ptr = (int*)buffer;
int value = *int_ptr; // Undefined behavior; may crash on strict-alignment CPUs
// SAFE: memcpy has no alignment or aliasing requirements
int safe_value;
memcpy(&safe_value, buffer, sizeof(int));
// On a little-endian machine the memcpy result prints as 78563412
printf("Direct access: %X\n", (unsigned)value);
printf("Memcpy access: %X\n", (unsigned)safe_value);
return 0;
}
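C, unlike C++, also permits type punning through a union: reading a member other than the one last stored reinterprets the stored bytes (C11, 6.5.2.3). A union is aligned for its most strictly aligned member, so this route has no alignment hazard either. A sketch assuming float is a 32-bit IEEE 754 type, which is nearly universal but not required by the standard (the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Assumes float is a 32-bit IEEE 754 value.
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

// Reinterpret a float's bits through a union (valid in C, not in C++).
static uint32_t float_bits_union(float f)
{
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = f;
    return pun.u;
}

// The memcpy version is valid in both C and C++.
static uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```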
Cache Line Alignment
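Modern CPUs move data between cache and memory in fixed-size cache lines (typically 64 bytes). Keeping hot data inside a single line, and keeping data written by different threads on different lines to avoid false sharing, can improve performance: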
#include <stdio.h>
#include <stdlib.h>
#include <stdalign.h>
#define CACHE_LINE_SIZE 64 // Common value, but not guaranteed on every CPU
// Structure that fits in one cache line. C has no "struct alignas(N)" syntax
// (that is C++), so the alignment goes on the first member.
struct HotData {
alignas(CACHE_LINE_SIZE) int counter;
int flags;
// 56 bytes padding to fill cache line (with a 4-byte int)
char padding[CACHE_LINE_SIZE - 2 * sizeof(int)];
};
// Multiple structures that might share cache lines (false sharing)
struct Counter {
int value;
};
struct Counter counters[4]; // May share cache lines
// Better: Separate each counter onto its own cache line
struct PaddedCounter {
alignas(CACHE_LINE_SIZE) int value;
char padding[CACHE_LINE_SIZE - sizeof(int)];
};
struct PaddedCounter padded_counters[4]; // Each on different cache line
int main() {
printf("HotData size: %zu bytes (should be %d)\n",
sizeof(struct HotData), CACHE_LINE_SIZE);
printf("Regular counters - potential false sharing:\n");
for (int i = 0; i < 4; i++) {
printf(" counter[%d] at %p\n", i, (void*)&counters[i]);
}
printf("\nPadded counters - each on separate cache line:\n");
for (int i = 0; i < 4; i++) {
printf(" padded[%d] at %p\n", i, (void*)&padded_counters[i]);
}
return 0;
}
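Whether two objects can falsely share comes down to whether their addresses fall in the same line, that is, give the same result when divided by the line size. A sketch that computes this, again assuming 64-byte lines; line_of, same_line and struct Slot are illustrative names:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define LINE 64 // assumed cache-line size

// Index of the cache line that contains the byte at p.
static uintptr_t line_of(const void *p)
{
    return (uintptr_t)p / LINE;
}

// 1 if the objects starting at a and b begin in the same cache line.
static int same_line(const void *a, const void *b)
{
    return line_of(a) == line_of(b);
}

// One counter per line: aligning the first member to LINE also pads
// sizeof(struct Slot) up to LINE.
struct Slot {
    alignas(LINE) long value;
};

static struct Slot slots[4];
static long plain[4];
```

Applied to the arrays above, the same check shows that neighboring counters elements usually share a line while padded_counters elements never do.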
SIMD Alignment
Aligned SIMD loads and stores require 16-byte (SSE) or 32-byte (AVX) alignment and fault on a misaligned address:
#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>
#ifdef __SSE__
#include <xmmintrin.h>
// 16-byte aligned for SSE. C needs the typedef to use the bare type name;
// "struct alignas(16) SSEVector" is C++ syntax.
typedef struct {
    alignas(16) float data[4]; // 4 floats = 16 bytes
} SSEVector;
void add_vectors_sse(const SSEVector *a, const SSEVector *b, SSEVector *result) {
    __m128 va = _mm_load_ps(a->data); // Requires 16-byte alignment
    __m128 vb = _mm_load_ps(b->data);
    __m128 vr = _mm_add_ps(va, vb);
    _mm_store_ps(result->data, vr); // Requires 16-byte alignment
}
#endif
#ifdef __AVX__
#include <immintrin.h>
// 32-byte aligned for AVX
typedef struct {
    alignas(32) double data[4]; // 4 doubles = 32 bytes
} AVXVector;
#endif
int main(void) {
#ifdef __SSE__
    SSEVector a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    SSEVector b = {{5.0f, 6.0f, 7.0f, 8.0f}};
    SSEVector result;
    printf("SSEVector address: %p\n", (void*)&a);
    printf("Is 16-byte aligned? %d\n", (int)((uintptr_t)&a % 16 == 0));
    add_vectors_sse(&a, &b, &result);
    printf("Result[0]: %f\n", result.data[0]); // 6.000000
#endif
    return 0;
}
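When a float array cannot be guaranteed 16-byte aligned (say, a pointer handed in by a caller), the unaligned intrinsic _mm_loadu_ps accepts any float-aligned address; on most recent x86 cores it is as fast as _mm_load_ps when the data happens to be aligned. A sketch of an array sum built on it, with a scalar path for leftover elements and for builds without SSE (sum_floats is an illustrative name):

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>
#endif

// Sum n floats starting at p; p needs only ordinary float alignment.
static float sum_floats(const float *p, size_t n)
{
    assert(p != NULL || n == 0);
    size_t i = 0;
    float total = 0.0f;
#ifdef __SSE__
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i)); // no 16-byte requirement
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    for (; i < n; i++) // leftover elements (or everything without SSE)
        total += p[i];
    return total;
}
```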
Alignment and Atomic Operations
#include <stdio.h>
#include <stdatomic.h>
#include <stdalign.h>
// Atomic operations often require natural alignment
struct Data {
_Atomic int counter; // Compiler ensures proper alignment
int regular; // Regular int
};
// For lock-free operations, alignment is critical
_Atomic long long ll_counter; // Must be 8-byte aligned for lock-free ops
int main() {
printf("_Atomic int alignment: %zu\n", alignof(_Atomic int));
printf("_Atomic long long alignment: %zu\n", alignof(_Atomic long long));
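// Check whether operations on these particular objects are lock-free
_Atomic int int_obj = 0;
printf("int is lock-free: %d\n", (int)atomic_is_lock_free(&int_obj));
printf("long long is lock-free: %d\n",
(int)atomic_is_lock_free(&ll_counter));
// Compile-time summary: 0 = never, 1 = sometimes, 2 = always lock-free
printf("ATOMIC_LLONG_LOCK_FREE: %d\n", ATOMIC_LLONG_LOCK_FREE);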
return 0;
}
Alignment in Network Protocols
Network protocols often use packed structures to match wire format:
#include <stdio.h>
#include <stdint.h>
// Network packet headers are typically packed (no padding)
#pragma pack(push, 1)
struct EthernetHeader {
uint8_t dest_mac[6];
uint8_t src_mac[6];
uint16_t ethertype;
};
struct IPv4Header {
uint8_t version_ihl; // Version (4 bits) + IHL (4 bits)
uint8_t dscp_ecn; // DSCP + ECN
uint16_t total_length;
uint16_t identification;
uint16_t flags_fragment; // Flags + Fragment Offset
uint8_t ttl;
uint8_t protocol;
uint16_t checksum;
uint32_t src_ip;
uint32_t dst_ip;
};
struct TCPHeader {
uint16_t src_port;
uint16_t dst_port;
uint32_t seq_num;
uint32_t ack_num;
uint8_t data_offset;
uint8_t flags;
uint16_t window;
uint16_t checksum;
uint16_t urgent_ptr;
};
#pragma pack(pop)
int main() {
printf("Ethernet header size: %zu bytes\n", sizeof(struct EthernetHeader));
printf("IPv4 header size: %zu bytes\n", sizeof(struct IPv4Header));
printf("TCP header size: %zu bytes\n", sizeof(struct TCPHeader));
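// These headers are designed so every field is naturally aligned, so on
// common ABIs they are 14, 20 and 20 bytes with or without packing.
// Packing guarantees that layout on every compiler and drops the struct's
// alignment to 1, so a header may sit at any offset in a buffer (the IPv4
// header starts at byte 14 of an Ethernet frame). Multi-byte fields are
// big-endian on the wire; convert them with ntohs()/ntohl().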
return 0;
}
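Packing fixes the layout but not the byte order: multi-byte header fields are transmitted big-endian. An alternative to overlaying a packed struct on a receive buffer is to decode each field byte by byte, which works at any alignment and on any host byte order. A sketch with illustrative names; the offsets follow the Ethernet and IPv4 layouts above (ethertype at byte 12, IPv4 header from byte 14, total length at IPv4 offset 2, source address at offset 12):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Read big-endian integers from any byte address.
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

// Extract the IPv4 total length and source address from an Ethernet frame.
// Returns 0 on success, -1 if the frame is too short or not IPv4.
static int parse_ipv4(const uint8_t *frame, size_t len,
                      uint16_t *total_length, uint32_t *src_ip)
{
    assert(frame != NULL);
    if (len < 14 + 20)
        return -1;
    if (read_be16(frame + 12) != 0x0800) // ethertype 0x0800: IPv4
        return -1;
    const uint8_t *ip = frame + 14; // not 4-byte aligned even if frame is
    *total_length = read_be16(ip + 2);
    *src_ip = read_be32(ip + 12);
    return 0;
}
```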
Alignment and Memory-Mapped I/O
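Memory-mapped hardware registers usually have to be accessed at their natural width and alignment, and a structure that overlays them must match the documented offsets exactly: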
#include <stdio.h>
#include <stdint.h>
#include <stddef.h> // offsetof
// Hardware register layout (must match hardware specification)
typedef struct {
volatile uint32_t control; // Control register
volatile uint32_t status; // Status register
volatile uint32_t data; // Data register
volatile uint32_t interrupt; // Interrupt register
} HardwareRegisters;
// Ensure structure matches hardware layout
_Static_assert(sizeof(HardwareRegisters) == 16,
"Hardware register size mismatch");
_Static_assert(offsetof(HardwareRegisters, control) == 0,
"Control register offset mismatch");
_Static_assert(offsetof(HardwareRegisters, status) == 4,
"Status register offset mismatch");
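// Memory-mapped device access. DEVICE_BASE is an example address; on a hosted
// OS write_device would fault, which is why main() never calls it.
#define DEVICE_BASE 0xFFFF0000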
#define DEVICE ((volatile HardwareRegisters*)DEVICE_BASE)
void write_device(uint32_t value) {
DEVICE->data = value;
DEVICE->control = 0x01; // Start transfer
while (!(DEVICE->status & 0x01)) {
// Wait for completion
}
}
int main() {
printf("Hardware register structure:\n");
printf(" control offset: %zu\n", offsetof(HardwareRegisters, control));
printf(" status offset: %zu\n", offsetof(HardwareRegisters, status));
printf(" data offset: %zu\n", offsetof(HardwareRegisters, data));
printf(" interrupt offset: %zu\n", offsetof(HardwareRegisters, interrupt));
return 0;
}
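Because the registers are read and written as 32-bit volatile accesses, the base address has to satisfy alignof(HardwareRegisters) (4 bytes on common platforms) before it is cast to a register-block pointer. A sketch of that check; map_registers is a hypothetical helper, and in the test an ordinary array stands in for device memory:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t control;
    volatile uint32_t status;
    volatile uint32_t data;
    volatile uint32_t interrupt;
} HardwareRegisters;

// Turn an address from the device's memory map into a register-block
// pointer, refusing addresses that would make every register misaligned.
static volatile HardwareRegisters *map_registers(uintptr_t base)
{
    if (base == 0 || base % alignof(HardwareRegisters) != 0)
        return NULL;
    return (volatile HardwareRegisters *)base;
}
```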
Alignment Macros and Utilities
#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>
// All macros below assume N is a power of two
// Check if pointer is aligned to N bytes
#define IS_ALIGNED(ptr, N) (((uintptr_t)(ptr) & ((uintptr_t)(N) - 1)) == 0)
// Align up to next N-byte boundary (casting N first keeps ~ from clearing
// the high bits when N is a 32-bit unsigned value)
#define ALIGN_UP(addr, N) (((uintptr_t)(addr) + (N) - 1) & ~((uintptr_t)(N) - 1))
// Align down to previous N-byte boundary
#define ALIGN_DOWN(addr, N) ((uintptr_t)(addr) & ~((uintptr_t)(N) - 1))
// Get padding needed to align to N bytes (0 if already aligned)
#define PADDING(addr, N) (((uintptr_t)(N) - ((uintptr_t)(addr) & ((uintptr_t)(N) - 1))) % (uintptr_t)(N))
// Ensure structure has specific size (useful for arrays)
#define CHECK_SIZE(type, size) _Static_assert(sizeof(type) == (size), \
#type " must be exactly " #size " bytes")
// Ensure structure has specific alignment
#define CHECK_ALIGNMENT(type, align) _Static_assert(alignof(type) == (align), \
#type " must have " #align "-byte alignment")
int main() {
int x;
char buffer[100];
printf("x address: %p\n", (void*)&x);
printf("Is 4-byte aligned? %s\n",
IS_ALIGNED(&x, 4) ? "yes" : "no");
printf("\nBuffer address: %p\n", (void*)buffer);
printf("Is 8-byte aligned? %s\n",
IS_ALIGNED(buffer, 8) ? "yes" : "no");
uintptr_t addr = (uintptr_t)buffer;
printf("ALIGN_UP to 16: %p\n", (void*)ALIGN_UP(addr, 16));
printf("ALIGN_DOWN to 16: %p\n", (void*)ALIGN_DOWN(addr, 16));
printf("Padding needed: %zu bytes\n", (size_t)PADDING(addr, 16));
return 0;
}
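The macros evaluate their arguments more than once and silently misbehave when N is not a power of two. A sketch of the same operations as static inline functions that check that precondition (the function names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

// Nonzero if align is a power of two (all helpers below require it).
static inline int is_pow2(uintptr_t align)
{
    return align != 0 && (align & (align - 1)) == 0;
}

// Round addr up to the next multiple of align.
static inline uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(is_pow2(align));
    return (addr + align - 1) & ~(align - 1);
}

// Round addr down to the previous multiple of align.
static inline uintptr_t align_down(uintptr_t addr, uintptr_t align)
{
    assert(is_pow2(align));
    return addr & ~(align - 1);
}

// Bytes needed to move addr up to the next multiple of align (0 if aligned).
static inline uintptr_t padding_for(uintptr_t addr, uintptr_t align)
{
    return align_up(addr, align) - addr;
}

// Nonzero if p is a multiple of align.
static inline int is_aligned(const void *p, uintptr_t align)
{
    assert(is_pow2(align));
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

Each argument is evaluated exactly once, and a call such as align_up(x, 12) stops at the assertion in debug builds instead of returning a wrong address.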
Performance Impact of Misalignment
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#define ARRAY_SIZE 1000000
#define ITERATIONS 100
// Storing each sum here keeps the compiler from deleting the timed loops.
static volatile long long sink;
// Fill p[0..ARRAY_SIZE) and time ITERATIONS summing passes over it.
// A misaligned int* is undefined behavior in ISO C: run this only on x86
// and build without auto-vectorization (e.g. GCC's -fno-tree-vectorize).
static double time_sum(int *p) {
    for (int i = 0; i < ARRAY_SIZE; i++) {
        p[i] = i;
    }
    clock_t start = clock();
    for (int iter = 0; iter < ITERATIONS; iter++) {
        long long sum = 0; // long long: the total overflows int
        for (int i = 0; i < ARRAY_SIZE; i++) {
            sum += p[i];
        }
        sink = sum;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
int main(void) {
    char *buffer = malloc(ARRAY_SIZE * sizeof(int) + 15);
    if (buffer == NULL) {
        return 1;
    }
    // Try byte offsets 0..7 and time those that yield a misaligned pointer
    for (int offset = 0; offset < 8; offset++) {
        int *misaligned = (int*)(buffer + offset);
        if ((uintptr_t)misaligned % 4 != 0) {
            printf("\nTesting offset %d (misaligned by %zu bytes):\n",
                   offset, (size_t)((uintptr_t)misaligned % 4));
            printf(" Time: %.3f seconds\n", time_sum(misaligned));
        }
    }
    // Aligned access for comparison: round up to an 8-byte boundary
    int *aligned = (int*)(((uintptr_t)buffer + 7) & ~(uintptr_t)7);
    printf("\nAligned access:\n");
    printf(" Time: %.3f seconds\n", time_sum(aligned));
    free(buffer);
    return 0;
}
Common Alignment Pitfalls
// Pitfall 1: Assuming sizeof(struct) equals sum of member sizes
struct Problematic {
char c;
int i;
short s;
};
// sizeof is likely 12, not 1+4+2 = 7
// Pitfall 2: Casting to types with stricter alignment
char buffer[8];
int *bad_ptr = (int*)&buffer[1]; // Misaligned!
*bad_ptr = 42; // May crash
// Pitfall 3: Assuming compiler will pack structures
struct NetworkPacket {
uint8_t type;
uint32_t length; // Will be padded to offset 4
uint16_t flags;
}; // Not suitable for network transmission without packing
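// Pitfall 4: Forgetting that alignment also pads sizeof (and the array stride)
struct Vec3 {
alignas(16) float x; // alignas needs <stdalign.h> before C23
float y, z;
}; // sizeof is 16, not 12: the size is rounded up to a multiple of the alignment
struct Vec3 vectors[10]; // Consecutive elements are 16 bytes apart
// malloc only guarantees alignof(max_align_t); allocate arrays of more
// strictly aligned types with aligned_alloc or posix_memalign
// Pitfall 5: Platform-specific alignment differences
struct Portable {
int i;
long l; // 8 bytes on 64-bit Linux/macOS, 4 on 32-bit systems and 64-bit Windows
}; // Size and the offset of l differ between platforms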
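One way to defuse pitfalls 3 and 5 together is to use fixed-width types, spell out any padding as named reserved fields, and pin the layout with static assertions so that a surprise breaks the build instead of the data. A sketch with an illustrative PortableRecord type and make_record helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct PortableRecord {
    int64_t id;          // offset 0
    int32_t count;       // offset 8
    uint8_t kind;        // offset 12
    uint8_t reserved[3]; // explicit padding, always zeroed
};

_Static_assert(offsetof(struct PortableRecord, count) == 8, "count moved");
_Static_assert(offsetof(struct PortableRecord, kind) == 12, "kind moved");
_Static_assert(sizeof(struct PortableRecord) == 16, "unexpected size");

// Build a record with every byte, including the reserved ones, initialized.
static struct PortableRecord make_record(int64_t id, int32_t count, uint8_t kind)
{
    struct PortableRecord r = {0};
    r.id = id;
    r.count = count;
    r.kind = kind;
    return r;
}
```

Byte order can still differ between hosts, so records exchanged between machines also need an agreed endianness.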
Best Practices
- Order structure members by size (largest to smallest) to minimize padding
- Use offsetof to verify member positions when needed
- Consider using alignas for performance-critical data
- Use memcpy for type punning instead of casting pointers
- Be aware of platform differences in alignment requirements
- Use static assertions to verify assumptions
- Document alignment requirements in your code
- Use packed structures only when necessary (member access can be slower, and a pointer to a packed member may be misaligned)
Conclusion
Memory alignment is a fundamental concept in C programming that affects correctness, performance, and portability. Key takeaways:
- Alignment requirements vary by type and platform
- Compilers add padding to structures to satisfy alignment
- Misaligned access can be slower or crash on some architectures
- Control alignment using alignas, compiler attributes, or pragmas
- Network protocols and hardware interfaces often require packed structures
- SIMD and atomic operations have strict alignment requirements
- Cache line alignment can significantly impact performance in multi-threaded code
Understanding and properly managing memory alignment is essential for:
- Systems programming and embedded development
- High-performance computing
- Hardware interface design
- Network protocol implementation
- Cross-platform development
While the compiler handles most alignment issues automatically, knowing when and how to control alignment gives you the power to write more efficient, portable, and correct C code.