Branch Predictors and You

Branch Predictors and You

Order your data structures

First hashnode post but I'll make it nice.

A few months ago in Advent of Code Python 2020 (yes 2020), I realized something: computer architectures prefer ordered data structures in a lot of cases.

import itertools
import timeit
import random

NUM_ITERATIONS = 100000
NUM_ENTRIES_IN_CONTROL = 2 ** 12

print(f"with {NUM_ITERATIONS} iterations")
print(f"with {NUM_ENTRIES_IN_CONTROL} control group size")

# The unsorted control group.
control = [i for i in range(NUM_ENTRIES_IN_CONTROL)]
random.shuffle(control)

# This is kind of slow
print("=== CONTROL GROUP ===")
print(f"takes {timeit.timeit(lambda: itertools.permutations(control), number=NUM_ITERATIONS)} secs")

# A revelation comes when you sort it
revelation = sorted(control)
print("=== MODIFIED GROUP ===")
print(f"takes {timeit.timeit(lambda: itertools.permutations(revelation), number=NUM_ITERATIONS)} secs")

Apparently there is a reason for this: CPUs implement branch prediction and have implemented that technology for a long time, the technology is just used to sorted entries in a list.

The Pentium Silver N5000 in my laptop implements it, and so does my workstation. Maybe your phone implements branch prediction too, which is quite amazing. (note that a branch predictor has been used in small Intel CPUs beginning with Silvermont, for a while)

You can look into this on this StackOverflow entry.

Real proof is in the numbers, let's see how it runs on the laptop:

with 100000 iterations
with 4096 control group size
=== CONTROL GROUP ===
takes 4.334974428998976 secs
=== MODIFIED GROUP ===
takes 3.404549691000284 secs

I like that improvement.

I think people will find use in sorting values when performing heavy calculations.