Convolution Reverb

by **trogluddite** » Thu Oct 24, 2013 10:16 pm

KG_is_back wrote: It seems Trog's array-to-memory is not working.

Well, I did say it was experimental!

But there is something you have missed: the Ruby test example is hard-coded to an array size of 10 items (the @size variable in the Ruby code). My fault, I should have explained better in the previous post. It's a very rough prototype that can't be treated as a "cut and paste" module just yet.
In fact, the code goes to great lengths to ensure that the size of the frame cannot be changed. This is by design: I am trying to eliminate one factor at a time as I work towards proving that it is reliable. Changing the array size from outside the code, by reading the size of the green array, could require the Frame object to be re-constructed, which would change the start address while the audio is still running. At the moment, I am trying to prevent this from happening, because it needs a lot more testing to be certain that it won't make things unstable (i.e. ASM crashes!).

No criticism intended, you could well be right that this technique will never work reliably. But the hard-coded frame size means that your schematic will definitely fail even if my theory is correct. Possibly, I just posted my idea a little too soon. It was just such a coincidence that you posted about the subject on the same evening that I baked my schematic that I couldn't resist!

by **KG_is_back** » Thu Oct 24, 2013 10:39 pm

Yes, I've noticed that the array size must be fixed... never mind, I'll have to go the old way of loading at streamrate...

by **KG_is_back** » Sun Oct 27, 2013 10:26 pm

Gentlemen... finally a version whose output looks similar to what it should be.

http://www.mediafire.com/download/eou8v ... reverb.fsm

by **KG_is_back** » Wed Oct 30, 2013 3:34 pm

I've attempted to do the convolution in a single ASM module. And guess what... it's crashing... However, the algorithm is much simpler than in my previous approach, so it might be easier to debug. Also, if it works, the delay will be only 64 samples. The module should put the first 64 samples on its first output and the second 64 samples on its second output... the second will be fed to a 64-sample delay.

by **KG_is_back** » Tue Nov 05, 2013 9:34 pm

I have successfully rewritten the FFT and iFFT into a single ASM block. Took me centuries to figure out how it deals with input and output. Making the FFT double-size is no problem: simply load the input into the first half of the input array and zeros into the second half at the same time.
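A minimal Python sketch (not FlowStone code) of why the double-size, half-zeroed FFT matters. Multiplying two spectra corresponds to circular convolution, so without padding the tail of the result wraps around onto the start of the block. The function names and test values are made up for illustration, and the circular convolution is computed directly rather than through an FFT to keep the example short.

```python
def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences
    (the time-domain equivalent of multiplying their spectra)."""
    n = len(a)
    return [sum(a[k] * b[(i - k) % n] for k in range(n)) for i in range(n)]


def linear_convolve(a, b):
    """Ordinary (linear) convolution, length len(a) + len(b) - 1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


block = [1.0, 2.0, 3.0, 4.0]       # one block of input samples
impulse = [0.5, 0.25, 0.0, -0.5]   # impulse segment of the same length
n = len(block)

# Same-size transform: the tail wraps around and corrupts the first samples.
wrapped = circular_convolve(block, impulse)

# Double-size transform with the second half zeroed: nothing wraps.
padded = circular_convolve(block + [0.0] * n, impulse + [0.0] * n)
```

With the padding, `padded` holds the full linear convolution (plus one trailing zero), whereas `wrapped` differs from it in the first samples.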

by **trogluddite** » Wed Nov 06, 2013 1:42 am

Nice work KG - another big step of the way done.
CPU load is high - 50% to 80% of one CPU core.

He he - just kidding, that's on my Atom N260 CPU netbook!

So that's a pretty good figure. I'll check it out on my big machine when I get a moment tomorrow.

by **KG_is_back** » Fri Nov 08, 2013 5:39 pm

I have a problem! My complex multiplication algorithm seems to be wrong... Without it, the module works flawlessly as an FFT->iFFT chain, and I know the impulse is loaded correctly. So something must be wrong with the multiplication itself (found roughly in the middle of the code, under "multiplication of the spectra"). When the unit impulse is at the start of the FFT window, the FFT contains only real parts and the output is as expected, so the problem must lie in the imaginary parts. When it is not at the start, the wave shows patterns that resemble those I saw when I was playing with iFFT for an additive osc.

pls, someone with complex-number-math-skills, help me!

by **martinvicanek** » Sat Nov 09, 2013 3:43 pm

The math part is quite simple: the product of two complex numbers, z = x + i*y and Z = X + i*Y, say, is just

z*Z = x*X - y*Y + i*(x*Y + y*X).

When looking at the relevant code block

Code: Select all

    ///////////////////////////////////////////////////////////
    // multiplication of spectra
    ///////////////////////////////////////////////////////////
    mov eax,intFFTSize[0];
    shl eax,3;
    convoloop:
    sub eax,16;
    movaps xmm7,floatZERO;
    movaps xmm0,inArray[eax];
    //movaps xmm6,MemoryArray[eax];
    //movaps MemoryArray[eax],xmm0;
    //shl eax,1;
    movaps xmm1,ImpulseArray[eax];
    ///movaps xmm1,floatZERO;
    movaps xmm2,xmm0;
    shufps xmm0,xmm0,160; ///RRRR
    mulps xmm0,xmm1; //R*r // R*i
    shufps xmm2,xmm2,245; //IIII
    shufps xmm1,xmm1,177; //irir
    mulps xmm2,xmm1; // I*i // I*r
    mulps xmm2,complexplusminus; //-I*i //I*r
    subps xmm0,xmm2; //R*r-I*i // R*i+I*r
    addps xmm7,xmm0; //add to acumulator;
    movaps inArray[eax],xmm7;
    jg convoloop;

I have a few observations:

1. The complexplusminus variable is not declared (dunno if this is necessary)

2. Only two of the four SSE channels are assigned a value (-1):

Code: Select all

    mov eax,floatMINUSONE[0];
    mov complexplusminus[0],eax;
    mov complexplusminus[2],eax;

Should the other two channels 1 and 3 not be assigned the value +1?

3. In this code line

Code: Select all

    subps xmm0,xmm2; //R*r-I*i // R*i+I*r

should it not actually read addps in order to match with the comment? (I admit I am only guessing)
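To make the lane arithmetic easier to check, here is a rough Python emulation of the four SSE channels. It is a sketch under assumptions, not the original module: it assumes the arrays hold interleaved values [re0, im0, re1, im1], that the sign mask is (-1, +1, -1, +1) as the code comments suggest, and that the final step is an add. The function name `complex_mul_lanes` is made up for illustration.

```python
def complex_mul_lanes(a, b, sign=(-1.0, 1.0, -1.0, 1.0)):
    """Multiply two packed complex pairs laid out as [re0, im0, re1, im1].

    Mirrors the shuffle/multiply steps of the posted loop, but takes the
    sign mask as an explicit argument and finishes with an add.
    """
    rr = [a[0], a[0], a[2], a[2]]        # shufps 160: R R R R
    ii = [a[1], a[1], a[3], a[3]]        # shufps 245: I I I I
    swapped = [b[1], b[0], b[3], b[2]]   # shufps 177: i r i r
    first = [x * y for x, y in zip(rr, b)]            # R*r,  R*i
    second = [x * y for x, y in zip(ii, swapped)]     # I*i,  I*r
    second = [x * s for x, s in zip(second, sign)]    # -I*i, +I*r
    return [x + y for x, y in zip(first, second)]     # R*r-I*i, R*i+I*r


x = [1.0, 2.0, 3.0, -1.0]     # (1 + 2i) and (3 - 1i)
h = [0.5, -4.0, 2.0, 0.25]    # (0.5 - 4i) and (2 + 0.25i)
product = complex_mul_lanes(x, h)

# Reference result using Python's built-in complex type.
reference = [complex(1, 2) * complex(0.5, -4), complex(3, -1) * complex(2, 0.25)]
```

Under these assumptions each pair in `product` agrees with the real and imaginary parts of `reference`, which matches the formula z*Z = x*X - y*Y + i*(x*Y + y*X).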

by **KG_is_back** » Sat Nov 09, 2013 5:13 pm


Martin, I can't thank you enough

It is all working now. The complexplusminus variable should have been declared as 1, but I copied the multiplication from one of my previous attempts and forgot to declare it here too. And you were right about the addps too.

Here is the fixed version. I will add a version that supports impulses longer than one FFT chunk later. Basically, I will just add a loop that stores the FFT in memory, multiplies multiple FFT frames by the corresponding parts of the impulse in one loop, adds the products together, and sends the sum to the iFFT. I assume addition in the frequency domain works normally.
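The assumption about addition holds because the DFT is linear: summing spectra and then transforming back gives the same result as transforming back first and summing the waves. A small Python check, using a naive O(N^2) DFT purely for illustration (the helper names and test values are not from the schematic):

```python
import cmath


def dft(x):
    """Naive forward DFT, fine for a handful of samples."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * i * k / n) for k in range(n))
            for i in range(n)]


def idft(spectrum):
    """Naive inverse DFT with 1/N scaling."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * i * k / n)
                for k in range(n)) / n
            for i in range(n)]


a = [1.0, -2.0, 0.5, 3.0]
b = [0.25, 4.0, -1.0, 2.0]

# Add per frequency bin, then do a single inverse transform.
summed_in_frequency = idft([p + q for p, q in zip(dft(a), dft(b))])

# Add sample by sample in the time domain.
summed_in_time = [p + q for p, q in zip(a, b)]
```

The two results agree to within rounding error, so one iFFT per output block is enough no matter how many FFT frames are summed.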

by **KG_is_back** » Sat Nov 09, 2013 11:47 pm

Finally a working prototype!!! I feel like I've climbed Everest... It works like this:
First, it preprocesses the impulse (offline, in one shot). It cuts off the first half-FFT window (the FFT size is twice as big as the window being processed) and performs the FFTs. Then it sends the "processed" impulse to the main FFT unit.
There the stream gets processed. Half of the input buffer gets loaded (leaving the second half zeroes) and the FFT is performed. The FFT gets stored into a memory buffer, and the buffer (containing the current and previous FFTs) gets multiplied by the "processed impulse", which is basically also a sequence of FFTs. Values at the same frequency bins get summed and are sent to the input of the iFFT. The iFFT outputs the convolved wave. The first half of the wave goes directly to the output; the second half is delayed by half the FFT size (because an FFT is performed once every half-FFT-size samples) and is then also sent to the output. The whole process has a latency of half the FFT size, therefore the first samples of the impulse (those initially cut off in the impulse preprocessing) are computed via a time-domain FIR filter, which is CPU-heavy but latency-free.
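The frequency-domain part of this scheme can be sketched in plain Python as follows. This is an offline illustration under assumptions, not the schematic itself: it partitions the whole impulse (the latency-free time-domain FIR for the first segment is left out), uses a simple recursive FFT in place of the ASM one, and all function names are invented for the example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. No 1/N scaling."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spectrum):
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]


def partitioned_convolve(signal, impulse, block):
    """Convolve block by block using FFTs of size 2 * block (block: power of two)."""
    size = 2 * block
    # Offline step: split the impulse into block-sized segments and FFT each one.
    segments = [impulse[i:i + block] for i in range(0, len(impulse), block)]
    impulse_ffts = [fft(seg + [0.0] * (size - len(seg))) for seg in segments]

    n_out = len(signal) + len(impulse) - 1
    n_blocks = -(-n_out // block)      # ceiling division
    history = []                       # input FFTs, newest first
    tail = [0.0] * block               # delayed second half of the previous iFFT
    out = []
    for b in range(n_blocks):
        chunk = signal[b * block:(b + 1) * block]
        chunk = chunk + [0.0] * (size - len(chunk))    # second half stays zero
        history.insert(0, fft(chunk))
        del history[len(impulse_ffts):]                # keep only what is needed
        # Multiply each stored FFT by its impulse segment and sum per bin.
        acc = [0j] * size
        for frame, seg_fft in zip(history, impulse_ffts):
            for k in range(size):
                acc[k] += frame[k] * seg_fft[k]
        y = [v.real for v in ifft(acc)]
        out.extend(y[i] + tail[i] for i in range(block))   # first half + old tail
        tail = y[block:]                                   # second half waits one block
    return out[:n_out]


print(partitioned_convolve([1.0, 0.0, 0.0, 0.0, 0.0], [1.0, 0.5, 0.25], 2))
```

Feeding in a unit impulse, as in the last line, should return the impulse response itself (up to rounding), which is a quick sanity check for the block bookkeeping.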

There is still space for optimization - especially in the Array shuffling.

A working Memin would also be really cool: it would let the impulse (or possibly an array of impulses) be loaded instantly. That would allow real-time, glitch-free impulse changes (currently you have to reset the audio to load a new impulse), dynamic convolution (convolution where the impulse is level-dependent), etc...
Hopefully trog will get that array-to-mem working. I wish him luck.
