Cache miss - big CPU eater

by **KG_is_back** » Thu Nov 06, 2014 8:30 pm

To speed up RAM access, your processor has a so-called memory cache (also called Near Memory or CPU memory). While your code runs, the processor tries to predict which memory locations will be used next and loads them from RAM into the cache ahead of time (prefetches them). Reading or writing a variable in cache is far faster than going directly to RAM; the difference can be on the order of a hundred times. Fortunately, variables in FlowStone are automatically aligned in a way that maximizes cache efficiency.

The problem is with arrays. The cache only has room for a few thousand values (samples) at a time. When you use big arrays and wavetables in your schematic, the cache cannot hold them whole, so your processor tries to predict which parts of the array will be used and prefetches those. When you read a value that wasn't prefetched, there is a massive CPU penalty for fetching it from main RAM.
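To get a feel for why that penalty matters, here is a rough back-of-the-envelope sketch in Python. The cycle counts (`HIT_CYCLES`, `MISS_PENALTY`) and the function name are my own illustrative assumptions, not measurements from any particular CPU; the only point is that the average cost of a read grows with the fraction of reads that miss the cache.

```python
# Average cost of one array read = hit cost + miss rate * miss penalty.
# The cycle counts below are illustrative assumptions, not measurements.
HIT_CYCLES = 4        # assumed cost of a read served from cache
MISS_PENALTY = 200    # assumed extra cost of going to main RAM


def average_read_cycles(miss_rate):
    """Expected cycles per read for a given fraction of missed reads."""
    return HIT_CYCLES + miss_rate * MISS_PENALTY


for miss_rate in (0.0, 0.05, 0.5, 1.0):
    cycles = average_read_cycles(miss_rate)
    print(f"miss rate {miss_rate:4.0%}: {cycles:6.1f} cycles per read")
```

With numbers like these, even a small share of missed reads dominates the average, which is why the access pattern can matter more than the amount of code you run.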

This schematic illustrates that. The code component has a part that reads from an array (the array in the example is empty, but your processor doesn't know that). You can switch between two different index calculations: the first is a regular ramp (a very predictable pattern), the second is a random number generator (a very unpredictable pattern). With the ramp, the processor can easily predict and prefetch the right memory segment; with random indexes it fails to do that. The result is that, on my machine for example, the random indexing takes TWICE as much CPU as the regular ramp.
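If you don't have the schematic at hand, the same idea can be sketched as a toy model in Python. This is not how FlowStone or a real CPU works internally: it assumes a simple direct-mapped cache with no prefetcher, and all sizes and names (`LINE_SIZE`, `NUM_LINES`, `count_misses`, `ramp`, `pseudo_random`) are made up for illustration. It only counts how often a read would have to go to main RAM for the two index patterns.

```python
# Toy direct-mapped cache model: counts misses for a stream of array indexes.
# All sizes are illustrative assumptions, not those of a real CPU.
LINE_SIZE = 16         # values per cache line (e.g. 64 bytes / 4-byte float)
NUM_LINES = 512        # cache holds 512 * 16 = 8192 values in total
ARRAY_SIZE = 1 << 20   # 1,048,576-value table, far larger than the cache


def count_misses(indexes):
    """Return how many reads had to go to main memory."""
    cached = [None] * NUM_LINES        # which memory line each slot holds
    misses = 0
    for i in indexes:
        line = i // LINE_SIZE          # memory line containing value i
        slot = line % NUM_LINES        # the only slot that line may occupy
        if cached[slot] != line:       # not in cache: fetch the whole line
            cached[slot] = line
            misses += 1
    return misses


def ramp(n):
    """Regular ramp: 0, 1, 2, ... wrapped to the array size."""
    return [i % ARRAY_SIZE for i in range(n)]


def pseudo_random(n, seed=12345):
    """Simple linear congruential generator producing scattered indexes."""
    out = []
    state = seed
    for _ in range(n):
        state = (state * 1103515245 + 12345) % (1 << 31)
        out.append((state >> 8) % ARRAY_SIZE)
    return out


N = 100_000
ramp_misses = count_misses(ramp(N))
random_misses = count_misses(pseudo_random(N))
print("ramp misses:  ", ramp_misses, "of", N)
print("random misses:", random_misses, "of", N)
```

In this model the ramp misses only once per 16-value line, while the scattered indexes miss on most reads, even though both loops perform exactly the same number of reads. A real processor also prefetches ahead of a ramp, so the gap in cache behaviour can be larger still.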

Note that switching changes nothing in the code itself. Nothing gets bypassed and the code runs in exactly the same way; the only thing that changes is the pattern in which the index moves.
Also, the schematic uses the Code Speed Tester. Have a look at its description to use it correctly.

It is certainly another thing to consider when optimizing your code.
