At BUILD 2013 there was a talk called Native Code Performance and Memory: The Elephant in the CPU. I didn’t attend BUILD this year, but I watched the talk on Channel9 and I’d like to recap something from it that I found interesting. I think we often ignore the fact that we need to handle performance optimizations ourselves; we can’t hide behind the CLR and the compiler all the time, now can we? Some say that ignorance is bliss, but in this case ignorance might cause a lot of headaches.

Let’s start by taking an example from Eric Brumer’s talk. Consider the following three two-dimensional arrays, which we fill with random values and then multiply:

```csharp
int N = 1000;
var A = new int[N, N];
var B = new int[N, N];
var C = new int[N, N];

var random = new Random();
for (int x = 0; x < N; x++)
{
    for (int y = 0; y < N; y++)
    {
        A[x, y] = random.Next();
        B[x, y] = random.Next();
        C[x, y] = random.Next();
    }
}

for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i, j] += A[i, k] * B[k, j];
```

**We can look at the memory access!**
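One way to picture that access pattern is to compute which flat positions the innermost loop touches in `B`. The sketch below assumes the row-major layout that C# rectangular arrays use, so element `[row, col]` sits at position `row * N + col`. `FlatIndex` is a hypothetical helper introduced only for this illustration, and the code assumes C# 9 or later so that it can use top-level statements.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not from the talk): flat position of [row, col]
// in a row-major array that has `width` columns.
static int FlatIndex(int row, int col, int width) => row * width + col;

const int n = 1000;

// Original order: k is the innermost loop, so B[k, j] is read with j fixed.
int[] kInnermost = Enumerable.Range(0, 4).Select(k => FlatIndex(k, 0, n)).ToArray();

// Swapped order: j is the innermost loop, so B[k, j] is read with k fixed.
int[] jInnermost = Enumerable.Range(0, 4).Select(j => FlatIndex(0, j, n)).ToArray();

Console.WriteLine("k innermost: " + string.Join(", ", kInnermost)); // k innermost: 0, 1000, 2000, 3000
Console.WriteLine("j innermost: " + string.Join(", ", jInnermost)); // j innermost: 0, 1, 2, 3
```

With 4-byte `int` elements, a stride of 1000 elements is 4000 bytes, which is far more than a typical 64-byte cache line, so the `k`-innermost loop is likely to need a new cache line for nearly every read of `B`.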

**So how do we fix this?**

It’s quite simple and probably obvious. We can just swap the two inner loops so that `j` becomes the innermost loop. That way `B[k, j]` and `C[i, j]` are walked along a row, one neighbouring element after another, before we move on to the next row:

```csharp
for (int i = 0; i < N; i++)
    for (int k = 0; k < N; k++)
        for (int j = 0; j < N; j++)
            C[i, j] += A[i, k] * B[k, j];
```
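Swapping the loops only changes the order in which the same products are added up, so the result should be identical. Here is a minimal sketch that checks this on a small matrix. `MultiplyIjk` and `MultiplyIkj` are hypothetical helper names, the size and seed are arbitrary choices of mine, and unlike the snippets above the result matrix starts out zeroed. It assumes C# 9 or later for top-level statements.

```csharp
using System;

// Hypothetical helper: original loop order (i, j, k).
static int[,] MultiplyIjk(int[,] left, int[,] right, int size)
{
    var result = new int[size, size];
    for (int i = 0; i < size; i++)
        for (int j = 0; j < size; j++)
            for (int k = 0; k < size; k++)
                result[i, j] += left[i, k] * right[k, j];
    return result;
}

// Hypothetical helper: swapped loop order (i, k, j).
static int[,] MultiplyIkj(int[,] left, int[,] right, int size)
{
    var result = new int[size, size];
    for (int i = 0; i < size; i++)
        for (int k = 0; k < size; k++)
            for (int j = 0; j < size; j++)
                result[i, j] += left[i, k] * right[k, j];
    return result;
}

const int n = 64;            // small on purpose so the check runs quickly
var random = new Random(42); // fixed seed so the run is repeatable
var a = new int[n, n];
var b = new int[n, n];
for (int row = 0; row < n; row++)
{
    for (int col = 0; col < n; col++)
    {
        a[row, col] = random.Next(100);
        b[row, col] = random.Next(100);
    }
}

int[,] ijk = MultiplyIjk(a, b, n);
int[,] ikj = MultiplyIkj(a, b, n);

// Compare the two results element by element.
bool same = true;
for (int row = 0; row < n; row++)
    for (int col = 0; col < n; col++)
        if (ijk[row, col] != ikj[row, col])
            same = false;

Console.WriteLine(same ? "Both loop orders agree" : "Mismatch!");
```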

Here is the complete program, which times both loop orders with a `Stopwatch`:

```csharp
using System;
using System.Diagnostics;

int N = 1000;
var A = new int[N, N];
var B = new int[N, N];
var C = new int[N, N];

var random = new Random();
for (int x = 0; x < N; x++)
{
    for (int y = 0; y < N; y++)
    {
        A[x, y] = random.Next();
        B[x, y] = random.Next();
        C[x, y] = random.Next();
    }
}

var watch = new Stopwatch();
watch.Start();

for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i, j] += A[i, k] * B[k, j];

watch.Stop();

Console.WriteLine("Time: {0}ms", watch.ElapsedMilliseconds);

watch.Reset();
watch.Start();

// 60% faster than the loop above!
for (int i = 0; i < N; i++)
    for (int k = 0; k < N; k++)
        for (int j = 0; j < N; j++)
            C[i, j] += A[i, k] * B[k, j];

watch.Stop();

Console.WriteLine("Time: {0}ms", watch.ElapsedMilliseconds);
```
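If you want to repeat the measurement yourself, one possible refinement is to give each loop order a fresh result matrix and an untimed warm-up run, so that JIT compilation is not part of the number. This is only a sketch under my own assumptions: `TimeMs` is a hypothetical helper, the matrix size is reduced to keep it quick, and it assumes C# 9 or later for top-level statements. The timings will vary from machine to machine.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs the action once untimed as a warm-up,
// then returns the elapsed milliseconds of a second, timed run.
static long TimeMs(Action action)
{
    action();
    var watch = Stopwatch.StartNew();
    action();
    watch.Stop();
    return watch.ElapsedMilliseconds;
}

const int n = 200; // assumption: smaller than the article's 1000 to keep the sketch fast
var random = new Random();
var a = new int[n, n];
var b = new int[n, n];
for (int x = 0; x < n; x++)
{
    for (int y = 0; y < n; y++)
    {
        a[x, y] = random.Next(100);
        b[x, y] = random.Next(100);
    }
}

// Original order: k innermost.
long ijkMs = TimeMs(() =>
{
    var c = new int[n, n];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                c[i, j] += a[i, k] * b[k, j];
});

// Swapped order: j innermost.
long ikjMs = TimeMs(() =>
{
    var c = new int[n, n];
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < n; j++)
                c[i, j] += a[i, k] * b[k, j];
});

Console.WriteLine("i-j-k: {0}ms", ijkMs);
Console.WriteLine("i-k-j: {0}ms", ikjMs);
```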

This article was originally posted by Filip Ekberg on blog.filipekberg.se.