Stream FFT and iFFT

by **digitalwhitebyte** » Fri Jun 21, 2013 2:55 am

I measured without the graph and the three de-serializers, with no output connected to an external driver.
ASIO driver, sample rate 96000
i7 920

Values from the internal CPU meter:
v1 ~8.9%
v2 ~8.3%
v3 ~11.7%

Values from the Resource Monitor tool in Windows 7:
Average Cycle
v1 ~2.23
v2 ~2.45
v3 ~2.73

by **MyCo** » Fri Jun 21, 2013 4:03 am

That's really weird. Maybe it's just instruction latency that has more impact on Intels than on AMDs. That means it could be fixed by rearranging the code a little bit, without changing its function. I can't test this, maybe someone else can try it.

I've attached my latest version. It is now fully flexible. The FFT size can be changed while the whole thing is running. That required some minor changes in the code of the FFT (just rearrangement). But I had to change some of the other modules to make this work, too. I completely rebuilt the serializer, so that it uses a double buffer (one is written to, the other is read from). You can't compare the CPU usage with the previous versions because of this, and because there is a different signal source (3 Oscs) for testing.
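For readers who haven't met the double-buffer idea before, here is a minimal sketch in Python. It is not MyCo's serializer (that lives in the attached schematic); the class name `DoubleBuffer` and its methods are made up for illustration. It only shows the principle: one frame is filled sample by sample while the other, already complete, stays available for reading.

```python
class DoubleBuffer:
    """Two fixed-size frames: one is being filled while the other is read."""

    def __init__(self, size):
        self.size = size
        self.frames = [[0.0] * size, [0.0] * size]
        self.write_index = 0  # which frame is currently being filled
        self.pos = 0          # next write position inside that frame

    def push(self, sample):
        """Store one sample; return True when a full frame just became readable."""
        self.frames[self.write_index][self.pos] = sample
        self.pos += 1
        if self.pos == self.size:
            # Swap roles: the frame just filled becomes the read frame.
            self.write_index ^= 1
            self.pos = 0
            return True
        return False

    def read_frame(self):
        """Return the frame that is not being written to."""
        return self.frames[self.write_index ^ 1]


buf = DoubleBuffer(4)
ready = [buf.push(float(n)) for n in range(6)]
print(ready)             # [False, False, False, True, False, False]
print(buf.read_frame())  # [0.0, 1.0, 2.0, 3.0]
```

The reader never sees a half-written frame, at the cost of extra memory and extra delay.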

On my machine I can go up to 32768 points without crashing or lagging, although there is a huge delay (~1.5 seconds @ 44kHz). The 32768 points maximum in the dropdown is also the maximum of the buffers in the code, so don't go beyond that.
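A quick sanity check of that delay figure, as a sketch only. It assumes "44kHz" means 44100 Hz and that the double buffer costs two full frames of delay (one to fill, one to read out); neither is stated in the post, and the function name is hypothetical.

```python
def double_buffer_delay_seconds(fft_size, sample_rate):
    # Assumption: total delay is two frames' worth of samples.
    return 2 * fft_size / sample_rate


delay = double_buffer_delay_seconds(32768, 44100)
print(round(delay, 2))  # 1.49
```

Under those assumptions the result lines up with the ~1.5 seconds quoted above.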

by **tester** » Fri Jun 21, 2013 11:42 am

Great work.

by **MyCo** » Fri Jun 21, 2013 5:44 pm

I found something weird. After changing almost all the code in the project, I had to change the Integer counter (so that it outputs 2 signals). After that change I noticed a huge performance boost, and I can't explain why. My code is even longer than the one from trog, but on my machine it uses only 1/250 as much CPU.

I've attached a comparison, maybe someone finds the reason for the huge difference.

by **tester** » Fri Jun 21, 2013 7:15 pm

B to A ~ 196 to 16 here (C2D).

I'm not an ASM geek, but here is what I would think. Either something more is being calculated in Trog's example (directly or in the background), or there is some queue or value that waits (silently or directly) until some other operations are done. Or some CPU element is used that, in that particular design, creates such a slowdown.

So I would split it into conceptual blocks and test each block on its own; that would tell me whether one of these blocks is responsible, or a combination of them.

Analyzer is fine (I switched outputs connected to it and reset the analyzer - note for those who didn't).

by **trogluddite** » Fri Jun 21, 2013 7:21 pm

Ha ha - because my code is WRONG, yet still WORKS!!

The culprit seems to be...

Code: cvtps2dq xmm1,xmm1; addps xmm1,current;

...so after converting xmm1 to integer, I'm doing a float add - D'oh :oops:

As both numbers are <23 bits in length (as integers), they will have exponent = 0 when treated as floats, and so are denormals, hence the huge CPU load when added as floats. The opcode should, of course, have been "paddd", what a dumb-ass!
But, since both numbers have exactly the same exponent of zero, as will the answer, the integer output still comes out correctly - as there was no 'bug' apparent in this tiny "utility" code, I never looked over the code again, and didn't see the stupid mistake!!
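The "wrong but still works" effect can be reproduced outside of assembly. The sketch below is Python, not the FlowStone code: it reinterprets two small integers as 32-bit floats, adds them as floats, and reads the result back as an integer. The values 1000 and 1 are hypothetical stand-ins for the counter and its increment. Python does the addition in double precision, so this only shows that the bits come out right; it does not show the CPU penalty of a real `addps` on denormals.

```python
import struct


def bits_to_float32(bits):
    """Reinterpret a 32-bit integer bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float32_to_bits(value):
    """Reinterpret a single-precision float as its 32-bit integer bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


SMALLEST_NORMAL = 2.0 ** -126  # positive float32 values below this are denormal

counter, step = 1000, 1  # hypothetical values, both well under 2**23
a = bits_to_float32(counter)
b = bits_to_float32(step)

# Both operands have exponent bits of zero, so they are denormals.
print(0 < a < SMALLEST_NORMAL, 0 < b < SMALLEST_NORMAL)  # True True

# The float sum, read back as an integer, equals the integer sum.
result_bits = float32_to_bits(a + b)
print(result_bits)  # 1001
```

This is why the output looked correct: with the exponent bits at zero, a float add of the mantissas matches an integer add bit for bit.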

Lesson of the story - there is no such thing as a "trivial" code routine!!

by **RJHollins** » Fri Jun 21, 2013 7:41 pm

hehe,

None of your work here has been trivial :lol:

Thanks to TROG and the other esteemed GURUs!

This is a wonderful learning experience for me ! 8-)

by **TheAudiophileDutchman** » Fri Jun 21, 2013 9:13 pm

MyCo wrote: I've attached my latest version. It is now fully flexible. The FFT size can be changed while the whole thing is running.

WOW, MyCo and Trog this is really amazing stuff you guys have going on here! :!:

(just one minor niggle: amplitude plots in this latest version appear to be asymmetrical, while previous versions were okay)

by **MyCo** » Fri Jun 21, 2013 9:18 pm

trogluddite wrote: The culprit seems to be...
Code: cvtps2dq xmm1,xmm1; addps xmm1,current;

Haven't seen that... That explains the performance difference. The int value interpreted as float is just a denormal. That's why the calculation can still output the right value.

by **MyCo** » Fri Jun 21, 2013 9:21 pm

TheAudiophileDutchman wrote: just one minor niggle: amplitude plots in this latest version appear to be asymmetrical, while previous versions were okay

hm... interesting. Maybe it is just the graph display that doesn't interpolate the plot points.
