Multi-dimensional Arrays & Row-major Order: A Deep Dive

Let me explain multi-dimensional arrays and row-major ordering from the ground up, covering memory layout, addressing, and low-level implementation details.

1. Understanding Multi-dimensional Arrays

Conceptual vs Physical Reality

Conceptually, when you declare:

int matrix[3][4];

You think of a 2D grid:

[0][0]  [0][1]  [0][2]  [0][3]
[1][0]  [1][1]  [1][2]  [1][3]
[2][0]  [2][1]  [2][2]  [2][3]

Physically, memory is linear (one-dimensional): the CPU addresses it as a single flat sequence of bytes. Multi-dimensional arrays are an abstraction that the compiler maps onto this linear address space.

2. Row-major Order Explained

Row-major order is the strategy for mapping multi-dimensional indices to a linear memory address. Elements are stored row by row.

Memory Layout Example

For int matrix[3][4] (3 rows, 4 columns):

Memory Address:  Value:        Logical Position:
0x1000          matrix[0][0]   
0x1004          matrix[0][1]   ← Row 0 stored contiguously
0x1008          matrix[0][2]
0x100C          matrix[0][3]
0x1010          matrix[1][0]   ← Then Row 1
0x1014          matrix[1][1]
0x1018          matrix[1][2]
0x101C          matrix[1][3]
0x1020          matrix[2][0]   ← Finally Row 2
0x1024          matrix[2][1]
0x1028          matrix[2][2]
0x102C          matrix[2][3]

Notice: Rightmost index changes fastest when traversing memory sequentially.

3. Address Calculation Formula

For 2D Arrays

Given: type array[ROWS][COLS]

To access array[i][j]:

address = base_address + (i * COLS + j) * sizeof(type)

Breakdown:

  • i * COLS: Skip i complete rows (each row has COLS elements)
  • + j: Move j positions within the current row
  • * sizeof(type): Scale by element size (bytes)

Example Calculation

For int matrix[3][4] at base address 0x1000, accessing matrix[2][3]:

address = 0x1000 + (2 * 4 + 3) * 4
        = 0x1000 + (8 + 3) * 4
        = 0x1000 + 11 * 4
        = 0x1000 + 44
        = 0x102C
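
You can check this arithmetic in a few lines of C. A minimal sketch (variable names are mine):

#include <stdio.h>

int main(void) {
    int matrix[3][4];
    char *base = (char *)matrix;
    // Apply the formula by hand: base + (i * COLS + j) * sizeof(type)
    char *computed = base + (2 * 4 + 3) * sizeof(int);
    // The compiler's own indexing should land on the same address.
    printf("computed: %p\n", (void *)computed);
    printf("actual:   %p\n", (void *)&matrix[2][3]);
    return 0;
}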

For 3D Arrays

Given: type array[D1][D2][D3]

To access array[i][j][k]:

address = base + (i * D2 * D3 + j * D3 + k) * sizeof(type)

Logic:

  • i * D2 * D3: Skip i complete “planes” (each plane is D2×D3 elements)
  • j * D3: Skip j complete rows within the current plane
  • k: Position within the current row

For N-dimensional Arrays

General formula for array[i₁][i₂][i₃]...[iₙ] with dimensions [D₁][D₂][D₃]...[Dₙ]:

offset = i₁ * (D₂ * D₃ * ... * Dₙ) +
         i₂ * (D₃ * D₄ * ... * Dₙ) +
         i₃ * (D₄ * D₅ * ... * Dₙ) +
         ...
         iₙ₋₁ * Dₙ +
         iₙ

address = base + offset * sizeof(type)
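
In code, this general formula reduces to a running product, equivalent to Horner's rule. A small sketch (the helper name rm_offset is my own):

#include <stddef.h>

// Row-major offset of an n-dimensional index.
// dims[d] is the size of dimension d; idx[d] is the index into it.
size_t rm_offset(int n, const size_t *dims, const size_t *idx) {
    size_t off = 0;
    for (int d = 0; d < n; d++) {
        off = off * dims[d] + idx[d];   // Horner-style accumulation
    }
    return off;
}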

4. Compiler Implementation Details

How C/C++ Handles Multi-dimensional Arrays

When you write:

int arr[3][4][5];
int x = arr[1][2][3];

The compiler translates this to:

int x = *(&arr[0][0][0] + (1 * 4 * 5 + 2 * 5 + 3));

(Note that arr itself decays to a pointer to a whole 4×5 plane, so the flat offset must be applied to an int *, such as &arr[0][0][0].)
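
A quick sanity check that the two forms agree (illustrative only):

#include <assert.h>

int main(void) {
    int arr[3][4][5] = {0};
    arr[1][2][3] = 42;
    int *flat = &arr[0][0][0];
    assert(flat[1 * 4 * 5 + 2 * 5 + 3] == 42);
    return 0;
}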

Pointer Arithmetic

Multi-dimensional arrays decay to pointers in interesting ways:

int arr[3][4];
  • arr decays to int (*)[4] (pointer to an array of 4 ints)
  • arr[i] is an array of 4 ints, which decays to int * (pointer to int)
  • arr[i][j] has type int (an actual value)

What happens with arr + 1?

  • It advances by the size of one complete row (4 ints = 16 bytes if int is 4 bytes)
  • Points to arr[1][0]

What happens with *(arr + 1) + 2?

  • *(arr + 1) gives you arr[1], an array of 4 ints that decays to a pointer to the first element of row 1
  • Adding 2 advances by 2 ints
  • Points to arr[1][2]
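
Both expressions can be demonstrated directly (a short illustrative snippet):

#include <stdio.h>

int main(void) {
    int arr[3][4] = {{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}};
    int (*row1)[4] = arr + 1;      // pointer to row 1 as a whole
    int *elem = *(arr + 1) + 2;    // pointer to arr[1][2]
    printf("%d\n", (*row1)[0]);    // prints 4 (arr[1][0])
    printf("%d\n", *elem);         // prints 6 (arr[1][2])
    return 0;
}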

Assembly Level (x86-64 Example)

For arr[i][j] where arr is int[ROWS][COLS]:

; Assume: base address in RAX, i in RBX, j in RCX

mov    rdx, rbx           ; rdx = i
imul   rdx, COLS          ; rdx = i * COLS
add    rdx, rcx           ; rdx = i * COLS + j
shl    rdx, 2             ; rdx = (i * COLS + j) * 4 (multiply by sizeof(int))
add    rdx, rax           ; rdx = base + offset
mov    eax, [rdx]         ; load the value

Modern compilers optimize this heavily using LEA (Load Effective Address):

lea    rdx, [rbx + rbx*4] ; rdx = i * 5 (if COLS=5)
lea    rdx, [rdx*4 + rax] ; combine scaling and base
mov    eax, [rdx + rcx*4] ; load arr[i][j]

5. Row-major vs Column-major Order

Row-major (C, C++, NumPy's default; Java's arrays of arrays behave similarly, though each row is a separate object)

For arr[3][4]:
Storage order: [0,0] [0,1] [0,2] [0,3] [1,0] [1,1] ...

Column-major (Fortran, MATLAB, R, Julia)

For arr[3][4]:
Storage order: [0,0] [1,0] [2,0] [0,1] [1,1] [2,1] ...

Column-major formula for array[i][j] in dimensions [ROWS][COLS]:

address = base + (j * ROWS + i) * sizeof(type)
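
This swapped formula is what a C program applies when reading a buffer laid out by column-major code such as Fortran. A minimal sketch (the accessor name get_cm is my own):

// Element (i, j) of a rows x cols matrix stored column-major in a flat buffer.
int get_cm(const int *buf, int rows, int i, int j) {
    return buf[j * rows + i];
}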

Performance Implications

Cache locality matters! Modern CPUs fetch memory in cache lines (typically 64 bytes).

Row-major traversal (optimal in C):

for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLS; j++) {
        process(arr[i][j]);  // Sequential memory access
    }
}

Column-major traversal (suboptimal in C):

for (int j = 0; j < COLS; j++) {
    for (int i = 0; i < ROWS; i++) {
        process(arr[i][j]);  // Strided access, cache misses
    }
}
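
The gap is easy to measure. A rough timing harness (the matrix size and clock() granularity are illustrative; a real benchmark needs more care):

#include <stdio.h>
#include <time.h>

#define ROWS 4096
#define COLS 4096
static int a[ROWS][COLS];

int main(void) {
    long sum = 0;
    clock_t t0 = clock();
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];                // sequential access
    clock_t t1 = clock();
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];                // strided access
    clock_t t2 = clock();
    printf("row-major:    %ld ticks\n", (long)(t1 - t0));
    printf("column-major: %ld ticks\n", (long)(t2 - t1));
    printf("(sum = %ld)\n", sum);          // keeps the loops from being optimized away
    return 0;
}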

6. Memory Layout Deep Dive

Alignment and Padding

Arrays are typically aligned to their element’s alignment requirement:

struct Data {
    char c;      // 1 byte
    int arr[2][3]; // Starts at offset 4 (aligned to 4-byte boundary)
};
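
You can inspect the actual layout with offsetof (results depend on the platform ABI, but these are typical on mainstream 64-bit targets):

#include <stdio.h>
#include <stddef.h>

struct Data {
    char c;          // 1 byte, then 3 bytes of padding
    int arr[2][3];   // 24 bytes
};

int main(void) {
    printf("offsetof arr: %zu\n", offsetof(struct Data, arr)); // typically 4
    printf("sizeof Data:  %zu\n", sizeof(struct Data));        // typically 28
    return 0;
}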

Array of Arrays vs Array of Pointers

True 2D array:

int arr[3][4];  // Contiguous 48 bytes (3*4*4)

Array of pointers (simulated 2D):

int *arr[3];
arr[0] = malloc(4 * sizeof(int));
arr[1] = malloc(4 * sizeof(int));
arr[2] = malloc(4 * sizeof(int));

These look similar but have completely different memory layouts:

  • True 2D: One contiguous block
  • Pointer array: Non-contiguous, each row can be anywhere in memory
  • True 2D: Faster, better cache locality
  • Pointer array: Flexible (rows can have different sizes), fragmented
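
The difference shows up directly in sizeof and in the generated accesses (an illustrative comparison):

int true2d[3][4];   // sizeof(true2d) == 48 with 4-byte int: the whole block
int *ptrs[3];       // sizeof(ptrs) == 24 on a 64-bit target: three pointers

// true2d[i][j]: one address computation, one memory access.
// ptrs[i][j]:   load ptrs[i] first, then a second, dependent access.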

7. Practical Example with Memory Dump

#include <stdio.h>

int main() {
    int arr[2][3] = {
        {1, 2, 3},
        {4, 5, 6}
    };

    printf("Base address: %pn", (void*)arr);

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 3; j++) {
            printf("arr[%d][%d] = %d at address %pn", 
                   i, j, arr[i][j], (void*)&arr[i][j]);
        }
    }

    // Treating as 1D array
    int *flat = (int*)arr;
    printf("nAs flat array:n");
    for (int k = 0; k < 6; k++) {
        printf("flat[%d] = %d at address %pn", 
               k, flat[k], (void*)&flat[k]);
    }

    return 0;
}

Output (example):

Base address: 0x7ffc8b3e1a40
arr[0][0] = 1 at address 0x7ffc8b3e1a40
arr[0][1] = 2 at address 0x7ffc8b3e1a44
arr[0][2] = 3 at address 0x7ffc8b3e1a48
arr[1][0] = 4 at address 0x7ffc8b3e1a4c
arr[1][1] = 5 at address 0x7ffc8b3e1a50
arr[1][2] = 6 at address 0x7ffc8b3e1a54

Notice addresses increment by 4 (sizeof(int)) sequentially in row-major order.

8. Dynamic Multi-dimensional Arrays

Method 1: Single malloc with manual indexing

int rows = 3, cols = 4;
int *arr = malloc(rows * cols * sizeof(int));

// Access arr[i][j]:
int value = arr[i * cols + j];
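
A small macro can keep the manual indexing in one place (the macro name IDX is my own):

#include <stddef.h>

#define IDX(a, cols, i, j) ((a)[(size_t)(i) * (cols) + (j)])

// Usage:
//   IDX(arr, cols, 1, 2) = 7;       // same as arr[1 * cols + 2] = 7
//   int v = IDX(arr, cols, 1, 2);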

Method 2: Array of pointers

int **arr = malloc(rows * sizeof(int*));
for (int i = 0; i < rows; i++) {
    arr[i] = malloc(cols * sizeof(int));
}

// Access naturally: arr[i][j]
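
With this method every row is a separate allocation, so cleanup needs a matching loop (easy to forget):

for (int i = 0; i < rows; i++) {
    free(arr[i]);
}
free(arr);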

Method 3: Contiguous with pointer array (best of both)

int **arr = malloc(rows * sizeof(int*));
arr[0] = malloc(rows * cols * sizeof(int));
for (int i = 1; i < rows; i++) {
    arr[i] = arr[0] + i * cols;
}

// Cleanup:
free(arr[0]);
free(arr);

9. Hardware Considerations

CPU Cache Behavior

Modern CPUs have hierarchical caches (L1, L2, L3). Sequential access patterns maximize cache hit rates.

Cache line example (64 bytes):

If int is 4 bytes, one cache line holds 16 ints.
Accessing arr[0][0] loads the 16 ints sharing its cache line, i.e. the next 15 elements in row-major order (assuming the array starts on a cache-line boundary).

TLB (Translation Lookaside Buffer)

The TLB caches virtual-to-physical address translations. Strided access patterns can cause TLB thrashing.

Prefetching

Modern CPUs detect sequential access patterns and prefetch upcoming cache lines. Row-major traversal in row-major storage enables this optimization.

10. Summary

Key Takeaways:

  1. Multi-dimensional arrays are linearized in memory
  2. Row-major order: rightmost index varies fastest
  3. Address formula: base + (i * COLS + j) * sizeof(type) for 2D
  4. Traversal matters: iterate in storage order for performance
  5. True arrays vs pointer arrays: different memory layouts, different trade-offs
  6. Compiler magic: translates arr[i][j] to pointer arithmetic automatically
  7. Cache locality: sequential access patterns can be 10-100x faster than random access

Understanding these low-level details helps you write faster code, debug memory issues, and interface with hardware or other languages effectively.
