If you have a problem or need to report a bug please email : support@dsprobotics.com
There are 3 sections to this support area:
DOWNLOADS: access to product manuals, support files and drivers
HELP & INFORMATION: tutorials and example files for learning or finding pre-made modules for your projects
USER FORUMS: meet with other users and exchange ideas, you can also get help and assistance here
NEW REGISTRATIONS - please contact us if you wish to register on the forum
Users are reminded of the forum rules they sign up to which prohibits any activity that violates any laws including posting material covered by copyright
Stream FFT and iFFT
Stream FFT and iFFT
It's just a delicate little baby at the moment, so treat it with care...
All schematics/modules I post are free for all to use - but a credit is always polite!
Don't stagnate, mutate to create!
Don't stagnate, mutate to create!
-
trogluddite - Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: Stream FFT and iFFT
First what I of course did - was to change that value in order to see the crash. But - it did not crashed. It looks that this value is changable when the sound is offline.
Need to take a break? I have something right for you.
Feel free to donate. Thank you for your contribution.
Feel free to donate. Thank you for your contribution.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: Stream FFT and iFFT
Yes, that's right - it's only when the assembly blocks are running that there can be problems.
Because it is such an intensive algorithm, I left out the parts that check whether memory addresses are valid inside the code arrays - if they go out of range, there's an instant crash. Saves a fair chunk of CPU, but means that there are many hard-wired values that need to be right.
Ultimately, I'll probably just release a set of modules, each preset to a different buffer length - there's a big trade off with this job between versatility and CPU load due to the limitations of the FS assembly instruction set. There's still a lot more code to add yet too - FFT -> iFFT on its own is pretty dull; messing with the data in between is where things will get much more interesting.
Because it is such an intensive algorithm, I left out the parts that check whether memory addresses are valid inside the code arrays - if they go out of range, there's an instant crash. Saves a fair chunk of CPU, but means that there are many hard-wired values that need to be right.
Ultimately, I'll probably just release a set of modules, each preset to a different buffer length - there's a big trade off with this job between versatility and CPU load due to the limitations of the FS assembly instruction set. There's still a lot more code to add yet too - FFT -> iFFT on its own is pretty dull; messing with the data in between is where things will get much more interesting.
All schematics/modules I post are free for all to use - but a credit is always polite!
Don't stagnate, mutate to create!
Don't stagnate, mutate to create!
-
trogluddite - Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: Stream FFT and iFFT
Last edited by tester on Wed Jun 19, 2013 10:39 am, edited 2 times in total.
Need to take a break? I have something right for you.
Feel free to donate. Thank you for your contribution.
Feel free to donate. Thank you for your contribution.
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: Stream FFT and iFFT
The problem with realtime FFT and iFFT is, you have to overlap the buffers and apply the correct window to the FFT. And finally you have to find the right spot where you start the overlapping FFT. That's actually a very hard task.
Have you done the conversion to SSE from x86 code? It's very hard to read, but it looks like you don't really use the benefits of SSE.
When I understand it correctly, you buffer the 1024 samples linear into "inArray" and at the end of the buffer you calculate the FFT. Wouldn't it be faster to buffer the incoming samples at the bitshifted positions in an Array? Then you could leave this step (Loop1) out of the main code.
Have you done the conversion to SSE from x86 code? It's very hard to read, but it looks like you don't really use the benefits of SSE.
When I understand it correctly, you buffer the 1024 samples linear into "inArray" and at the end of the buffer you calculate the FFT. Wouldn't it be faster to buffer the incoming samples at the bitshifted positions in an Array? Then you could leave this step (Loop1) out of the main code.
-
MyCo - Posts: 718
- Joined: Tue Jul 13, 2010 12:33 pm
- Location: Germany
Re: Stream FFT and iFFT
I think I've found a bug in the FFT code. On line 125:
xmm3 wasn't set before
- Code: Select all
movaps xmm1,xmm3;
xmm3 wasn't set before
-
MyCo - Posts: 718
- Joined: Tue Jul 13, 2010 12:33 pm
- Location: Germany
Re: Stream FFT and iFFT
Another related bug:
the Jump is always executed because 1-0 is not zero
- Code: Select all
mov eax,1;
cmp eax,0;
jnz JgteMtest;
the Jump is always executed because 1-0 is not zero
-
MyCo - Posts: 718
- Joined: Tue Jul 13, 2010 12:33 pm
- Location: Germany
Re: Stream FFT and iFFT
@Trog:
your implementation seems to use this algorithm.
your implementation seems to use this algorithm.
- Code: Select all
class Array
# DFT and inverse.
#
# Algorithm from
# 'Meyberg, Vachenauer: Hoehere Mathematik II, Springer Berlin, 1991, page 332'
#
# See http://blog.mro.name/2011/04/simple-ruby-fast-fourier-transform/
#
def fft doinverse = true
src = self
# raise ArgumentError.new "Expected array input but was '#{src.class}'" unless src.kind_of? Array
n = src.length
nlog2 = Math.log( n ) / Math.log( 2 )
raise ArgumentError.new "Input array size must be a power of two but was '#{n}'" unless nlog2.floor - nlog2 == 0.0
n2 = n / 2
phi = Math::PI / n2
if doinverse
phi = -phi
else
src.collect!{|c| c /= n.to_f}
end
# build a sine/cosine table
wt = Array.new n2
wt.each_index { |i| wt[i] = Complex(Math.cos(i * phi), Math.sin(i * phi)) }
# bit reordering
n1 = n - 1
j = 0
1.upto(n1) do |i|
m = n2
while j >= m
j -= m
m /= 2
end
j += m
src[i],src[j] = src[j],src[i] if j > i
end
# 1d(N) Ueberlagerungen
mm = 1
begin
m = mm
mm *= 2
0.upto(m - 1) do |k|
w = wt[ k * n2 ]
k.step(n1, mm) do |i|
j = i + m
src[j] = src[i] - (temp = w * src[j])
src[i] += temp
end
end
n2 /= 2
end while mm != n
src
end
end
//usage
real = []
imag = []
@in.fft(false).each {|i| real.push i.real; imag.push i.imag}
output 0, real; output 1, imag
Last edited by digitalwhitebyte on Wed Jun 19, 2013 10:28 am, edited 1 time in total.
-
digitalwhitebyte - Posts: 106
- Joined: Sat Jul 31, 2010 10:20 am
Re: Stream FFT and iFFT
Hi Guys
Saving that for later - I'll look into that once the other coding is done, but I thought I'd keep the other SSE channels free for now to give the option of parallel (e.g. stereo) operation.
Bugs! - nope!
There are a few unconditional jumps (why no 'jmp' opcode!) - they are just used to get the execution order right to switch between testing conditions at loop start/end; so the xmm3 value is actually set later in the code.
Yes, I have tried several methods of doing that - but oddly enough, they've all come out with higher CPU load so far, despite, as you say, less read/writes and way fewer opcodes. Seems that streamlining the memory pipeline is the crucial thing here - a handful less cache misses per iteration makes a huge difference.
I'll keep working on it though - that transfer is where scaling/windowing would be applied, so it needs tightening up!
Yup, that's the one!
MyCo wrote: you don't really use the benefits of SSE.
Saving that for later - I'll look into that once the other coding is done, but I thought I'd keep the other SSE channels free for now to give the option of parallel (e.g. stereo) operation.
Bugs! - nope!
There are a few unconditional jumps (why no 'jmp' opcode!) - they are just used to get the execution order right to switch between testing conditions at loop start/end; so the xmm3 value is actually set later in the code.
MyCo wrote:buffer the incoming samples at the bitshifted positions in an Array? Then you could leave this step (Loop1) out of the main code.
Yes, I have tried several methods of doing that - but oddly enough, they've all come out with higher CPU load so far, despite, as you say, less read/writes and way fewer opcodes. Seems that streamlining the memory pipeline is the crucial thing here - a handful less cache misses per iteration makes a huge difference.
I'll keep working on it though - that transfer is where scaling/windowing would be applied, so it needs tightening up!
digitalwhitebyte wrote:your implementation seems to use this algorithm.
Yup, that's the one!
All schematics/modules I post are free for all to use - but a credit is always polite!
Don't stagnate, mutate to create!
Don't stagnate, mutate to create!
-
trogluddite - Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Who is online
Users browsing this forum: No registered users and 7 guests