Wrap around by bit masking

by **HughBanton** » Fri Jan 15, 2021 12:13 pm

Frustration after endless googling, and not finding the answer I need (or maybe I'm just being layzeee ..!)

I have stored waves of length 1024, or 512, or 128, or 64, and I have index counters. If I apply phase modulation to the index counters (aka 'FM') the index can then overrun or underrun, and to date I've been using ASM code derived from the % (modulo) function to sort out the required wrap arounds and restrain the number range. No problem as such, works OK.

But it has recently dawned on me that since my wavelengths are always these neat binary numbers I could probably just use a bit mask to AND with the counters and get the same result, by simply masking the higher bits, no?

However (pseudo code) ...
movaps xmm0, counter;
andps xmm0, waveLength;
... obviously doesn't work, because even I know that 128 in binary is 10000000, so is definitely not the mask I need!

How do I generate a floating point mask to remove bits higher than 128, or bits higher than 512 etc.?

Grateful for any guidance.

H

by **trogluddite** » Fri Jan 15, 2021 2:04 pm

TL/DR: For an integer bitmask, just subtract one from the power-of-two wavelength. But for floats, it isn't so easy!!

When working with integer data types...

Subtract one from the power-of-two wavelength.

It might help to think of the same thing in decimal...
If you have an exact power of ten, e.g. 10000000, and you subtract one, you get all of the lower digits maxed out to nine, 9999999.

Likewise, in binary, if you have an exact power of two, subtracting one maxes out all of the lower digits to one; e.g. 10000000 (128) - 1 = 01111111 (127).

Because code arrays have a first index (ordinal) of zero (not one!), an Array of size (2 ^ n) has indices running from zero to (2 ^ n - 1), so the mask gives you exactly what you want (e.g. a four element array has indices 0, 1, 2, & 3, so you want 4 to wrap to 0).
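As a rough illustration of the integer case, here is a minimal Python sketch. Python is used purely for demonstration (the thread's real code is FlowStone assembly), and the names `wrap_index` and `wave_length` are made up for this example. It checks that masking with "length minus one" gives the same answer as a modulo for non-negative indices, provided the length is an exact power of two.

```python
def wrap_index(index: int, wave_length: int) -> int:
    """Wrap an integer index into 0..wave_length-1 using a bit mask.

    Only valid when wave_length is an exact power of two.
    """
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    if wave_length <= 0 or wave_length & (wave_length - 1) != 0:
        raise ValueError("wave_length must be a power of two")
    mask = wave_length - 1  # e.g. 128 -> 0b01111111 (127)
    return index & mask


# Compare against the modulo for the wave sizes mentioned in the question.
for n in (64, 128, 512, 1024):
    for i in range(0, 4 * n):
        assert wrap_index(i, n) == i % n
```

The power-of-two check is only there to make the restriction explicit; in a tight audio loop the mask would normally be computed once, outside the loop.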

However, for float values it isn't so easy...

Floating point values are encoded in a way that's very similar to "scientific notation" (i.e. sign * mantissa * 2^exponent, where sign, mantissa, and exponent use different chunks of the available bits). This encoding makes it impossible to wrap the value using a bitmask. However, in Assembly (but not DSP code), you can freely convert between the integer form and floating point form. So you can do something like the following example, which will still be much faster than computing a modulus...

Code:
// Wavelength minus one.
int bitmask = 1023;
// Assume float value is in xmm0
// First convert to integer (will round the value)...
cvtps2dq xmm0, xmm0;
// Now wrap...
andps xmm0, bitmask;
// Convert back to floating point...
cvtdq2ps xmm0, xmm0;
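The same idea can be sketched in Python to see why the raw float bits cannot simply be masked, and what the convert, mask, convert-back sequence does instead. This is an illustration under assumptions, not the assembly itself: the helper names are hypothetical, `struct` is used to inspect 32-bit float bit patterns, and Python's `round` stands in for `cvtps2dq` on the assumption that the CPU is in its default rounding mode (round to nearest, ties to even).

```python
import struct


def float_bits(x: float) -> int:
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def naive_float_mask(value: float, wave_length: int) -> float:
    """AND the raw float bit patterns together. This does NOT wrap."""
    mask_bits = float_bits(float(wave_length - 1))
    return bits_to_float(float_bits(value) & mask_bits)


def wrap_via_int(value: float, wave_length: int) -> float:
    """Convert to integer, mask, convert back (fraction is rounded away)."""
    as_int = round(value)                 # stands in for cvtps2dq
    wrapped = as_int & (wave_length - 1)  # the integer bit mask
    return float(wrapped)                 # stands in for cvtdq2ps


print(hex(float_bits(128.0)))        # 0x43000000
print(naive_float_mask(130.0, 128))  # 32.5, not the 2.0 we wanted
print(wrap_via_int(130.0, 128))      # 2.0
print(wrap_via_int(-1.0, 128))       # 127.0
```

Note that the rounding step discards the fractional part, so this version suits cases where only a whole-number index is needed.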

When using the value as an array index, you usually need the integer form when calculating a memory address offset (e.g. for "fld Array[eax]"), so you can often optimise by doing the wrapping at a point in the code where you already have the integer representation. Other ASM opcodes for working with integers can help here - e.g. "paddd" (add SSE integers) and "pslld" (left bit-shift SSE integers).
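A small Python sketch of that optimisation, with all names (`wave`, `read_wave`, `phase_offset`) invented for illustration: the counter and the offset are kept as integers throughout, so the add and the mask happen in integer form right before the table read, with no float conversion in between.

```python
import math

WAVE_LENGTH = 128          # must be a power of two for the mask to work
MASK = WAVE_LENGTH - 1

# A hypothetical stored wave: one cycle of a sine.
wave = [math.sin(2.0 * math.pi * i / WAVE_LENGTH) for i in range(WAVE_LENGTH)]


def read_wave(counter: int, phase_offset: int) -> float:
    """Read the wave at counter + phase_offset, wrapped by bit mask."""
    index = (counter + phase_offset) & MASK  # integer add, then integer AND
    return wave[index]


# An offset of one full wavelength lands on the same sample.
assert read_wave(5, WAVE_LENGTH) == wave[5]
# A negative offset wraps round from the top end of the table.
assert read_wave(5, -10) == wave[123]
```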

by **HughBanton** » Fri Jan 15, 2021 7:13 pm

Hey yeah, I got that to work! Brilliant, thanks a million. At least two million brazillion.

My Wave-Lengths are on a streamin (they're note dependent, low notes are bigger) and I quickly realised I needed to do a cvtps2dq on their values as well otherwise, oh dear ...

As well as saving a whole block of code, this has kicked out a dreaded frontline divps :evil:. Always glad to get rid; I'll see the cpu% fall a bit again no doubt, once I've tidied up.

On 'Guru there used to be a useful list of rough cycle-lengths for each of the common functions, that KG had listed - for addps, mulps, dreaded-divps, cmps etc. etc. But I never made a note of them. You don't know of a good reference document do you?

I'll post something about my wave-reader when I've finished messing (again). Neat interpolation method (courtesy MV I think), Freq mod & Phase mod and all. But it's Alpha only atm, it uses memrefin, as well as some fancy array-read code that MyCo offered up last year.

But .. got to get me album finished man, been saying that since June :!:

....

H

by **trogluddite** » Fri Jan 15, 2021 8:48 pm

HughBanton wrote:You don't know of a good reference document do you?

Modern CPUs use so many clever hardware tricks to wring out extra performance that I haven't used CPU-cycle tables for a long time. Optimising on a "per-opcode" basis is often moot these days unless you are also careful about opcodes which can be executed concurrently ("out of order processing") and optimising CPU cache use (memory read/writes are often the most critical performance bottlenecks). It's far too easy nowadays to write code that saves CPU cycles on some machines but not others, or only in "wind tunnel tests" but not on "scheduled flights", and code optimised for the new-fangled hardware CPU trickery can get very spaghettish!

I kinda miss the old SM "CPU cycle olympics" that we used to have, but anything beyond the obvious rules of thumb ("learned at primary school = fast") needs in-situ "typical project" testing these days.

HughBanton wrote:But .. got to get me album finished man, been saying that since June ....

I'm absolutely convinced that the "P" in "DSP" stands for "procrastination"! :lol:

by **HughBanton** » Fri Jan 15, 2021 9:03 pm

trogluddite wrote:I'm absolutely convinced that the "P" in "DSP" stands for "procrastination"!

Demoralising Serial Procrastination, no doubt about it.

H :cry:

by **HughBanton** » Sun Jan 17, 2021 12:25 pm

Re cpu speed & efficiency. I guess it's not surprising that since I've replaced my previous counter wrap-around method :

counter = counter % waveLength; // for indexes greater than waveLength
counter = counter + (counter<0) & waveLength; // for indexes less than zero

with simply : (NB this one's pseudo code .. don't try this in DSP!)
counter = counter (bit)& waveLength;

.. I'm getting around 17-18% improvement. Significant! The old method equates to 13 assembler instructions, (including a compare, a multiply and a divide), whereas its replacement is basically done with 2 instructions. Big result, well happy.
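For comparison, the two approaches can be sketched side by side in Python. This is only a model of the logic, not the DSP or assembly code: it assumes the `%` in the old method returns a remainder with the same sign as the counter (which is why the second, negative-correcting line was needed), and `math.fmod` is used to imitate that. The function names are made up, and the mask version uses `wave_length - 1` as the mask.

```python
import math


def wrap_modulo(counter: float, wave_length: int) -> float:
    """Old method: remainder, then pull negative results back into range."""
    r = math.fmod(counter, wave_length)  # keeps the sign of counter
    if r < 0:
        r += wave_length
    return r


def wrap_mask(counter: int, wave_length: int) -> int:
    """New method: a single AND with (wave_length - 1)."""
    return counter & (wave_length - 1)


# For whole-number counters the two methods agree, above and below range.
for c in range(-1000, 1000):
    assert wrap_modulo(c, 128) == wrap_mask(c, 128)
```

The speed difference reported above comes from the real instruction counts in assembly; this sketch only shows that the results match.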

The simple AND method works (Trog will correct me if this is wrong ..) thanks to negative numbers being written in 2s Complement, such that e.g. '-1', when shed of its leading bits by being AND'ed with waveLength, will automatically yield 'waveLength - 1', exactly as needed.

So .. (waveLength + 3) AND waveLength ->> 3,
and with negatives .. (- 3) AND waveLength ->> (waveLength -3). Brilliant.

I learnt something last week!

H

by **HughBanton** » Sun Jan 17, 2021 12:28 pm

Oops! .. AND (waveLength-1), in all cases. I'll write it out 100 times. :roll:
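With the mask written as `waveLength - 1`, the behaviour for over-range and negative counters can be checked in a few lines of Python. This is an illustrative sketch only; `to_bits8` is a hypothetical helper that shows the low 8 bits of a value in two's complement form.

```python
WAVE_LENGTH = 128
MASK = WAVE_LENGTH - 1  # 127, i.e. 0b01111111


def to_bits8(n: int) -> str:
    """Show the low 8 bits of n as a two's complement bit string."""
    return format(n & 0xFF, "08b")


print(to_bits8(-3))                 # 11111101
print(to_bits8(MASK))               # 01111111
print((WAVE_LENGTH + 3) & MASK)     # 3
print(-3 & MASK)                    # 125, i.e. WAVE_LENGTH - 3
print(-1 & MASK)                    # 127, i.e. WAVE_LENGTH - 1
```

Python integers are not fixed-width, but `&` on a negative number behaves as if the two's complement sign bits extended indefinitely, so the masked results match what a 32-bit register would give.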

by **trogluddite** » Sun Jan 17, 2021 3:48 pm

HughBanton wrote:It works [...] thanks to negative numbers being written in 2s Complement

Or you could put it the other way around - your example is the perfect demonstration of why negative numbers are represented using 2s complement; because you get the expected equivalence between adding a negative number and subtracting a positive one.
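That equivalence can be demonstrated with a small Python model of 8-bit arithmetic. The width of 8 bits and the helper names `negate` and `add` are assumptions chosen to keep the example short; real SSE integer lanes are 32 bits wide, but the principle is the same.

```python
BITS = 8
WORD_MASK = (1 << BITS) - 1  # 0xFF: keep only the low 8 bits


def negate(x: int) -> int:
    """Two's complement negation: invert all bits, then add one."""
    return (~x + 1) & WORD_MASK


def add(a: int, b: int) -> int:
    """Fixed-width addition: any carry out of the top bit is discarded."""
    return (a + b) & WORD_MASK


print(negate(3))          # 253, the 8-bit pattern 11111101
print(add(5, negate(3)))  # 2, the same as 5 - 3

# Adding the negated value always matches subtraction within the word size.
for a in range(256):
    for b in range(256):
        assert add(a, negate(b)) == (a - b) & WORD_MASK
```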

by **adamszabo** » Mon Jan 18, 2021 6:53 pm

Hugh, would you mind posting a simple example schematic of the difference? I'd love to check it out as well.

by **HughBanton** » Wed Jan 20, 2021 4:12 pm

Hi Adam; tricky to make an actual working example, especially since I've been playing exclusively in the Alpha box recently, and I've been using 'memrefin' for ages. Won't work here!

However I've put together a schematic showing my 'before & after' code for the bit I was concerned with here, with some explanation of what's going on. You'll see how the wrap-around shrinks to 3 lines.

To clarify, for the uninitiated, the whole reason for needing a wrap-around here is because I have a counter (in this example) that reads the wave from 0 to 127, but because I'm adding phase modulation (as per Yamaha DX7 etc.) the counter numbers can swing way way out of bounds, both above 127 and below zero.

Which is no use for reading from a wave with exactly 128 samples! So the wrap-around corrects the counter back into range, so for example 128.123 becomes 0.123, and similarly -64.01 becomes +63.99. All good again.

Hope you don't want me to explain to everyone about the ins & outs of '2s complement' representation, ooo eck!!
Definitely one for Trog, that. :lol:

But it means we only have to AND the integer part, the fraction remains the same.
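One way to picture "AND the integer part, keep the fraction" is the Python sketch below. It is a guess at the general shape rather than the code in the attached schematic: the name `wrap_phase` is invented, and splitting the value with `math.floor` is an assumption made so that negative positions get a positive fraction.

```python
import math


def wrap_phase(position: float, wave_length: int) -> float:
    """Wrap a fractional read position into 0 <= result < wave_length."""
    whole = math.floor(position)  # integer part, rounded towards -infinity
    frac = position - whole       # fraction, always in 0 <= frac < 1
    return (whole & (wave_length - 1)) + frac


print(wrap_phase(129.5, 128))   # 1.5
print(wrap_phase(-0.25, 128))   # 127.75

# -64.01 splits into -65 and roughly 0.99, so it wraps to about 63.99.
assert math.isclose(wrap_phase(-64.01, 128), 63.99, abs_tol=1e-9)
```

The fractional part is what an interpolating wave reader would then use to blend between neighbouring samples.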

wave scan example_2.fsm: NB Alpha 64-bit only, contains unrecognised inputs etc.; (2.94 KiB) Downloaded 1274 times

H
