Fast Stream Array Access

KG_is_back
Posts: 1196
Joined: Tue Oct 22, 2013 5:43 pm
Location: Slovakia

Re: Fast Stream Array Access

Post by KG_is_back »

Exo wrote:I was going to ask you guys is there any opcodes you really want/need? If you can give clear examples of benefits of certain opcodes I could get on to Malc to add them (I'm usually quite good at getting him to add little things if I give him a clear example and make it simple for him).

Maybe topic for another thread?


No. 1 choice: subtraction for integers, either sub reg,reg/var32 and/or psubd xmm0,xmm1/var. Also a fix for the nasty andnps coloring bug... and a logical not would be appreciated, even in the Code component.
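To make the request concrete, here is a minimal Python sketch of what a packed 32-bit integer subtraction such as psubd computes per lane, plus the two's-complement identity that expresses it with only NOT and ADD. The function names are made up for illustration, and nothing here claims that FS can currently express the second form.

```python
MASK32 = 0xFFFFFFFF  # keep results inside one 32-bit lane


def psubd(a, b):
    """Lane-wise 32-bit subtraction with wraparound, as PSUBD does per doubleword."""
    return [(x - y) & MASK32 for x, y in zip(a, b)]


def psubd_via_add(a, b):
    """Same result using only NOT and ADD: a - b == a + (~b) + 1 (mod 2**32)."""
    return [(x + ((~y) & MASK32) + 1) & MASK32 for x, y in zip(a, b)]
```

Both functions take two lists of four unsigned 32-bit values, mirroring the four lanes of an xmm register. The second form shows why a logical not plus the existing integer add would already cover subtraction.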
MyCo
Posts: 718
Joined: Tue Jul 13, 2010 12:33 pm
Location: Germany

Re: Fast Stream Array Access

Post by MyCo »

wow, Martin has a run :P

I hadn't noticed that FS supports the "movd r/m32, xmm" instruction, good to know... Unfortunately it doesn't support "movd xmm, r/m32", which would give another performance boost.

BTW: Don't trust the cycle counter method, it's pretty inaccurate. On my system, for example, the cycle counter outputs the same for the "Simple Delay" and the "Simple Delay (Stock)", although I know there should be a huge difference. When I need a meaningful comparison, I make hundreds of synchronized copies of a module and put them in parallel into a selector (as mono/packed mono stream). Then I switch between optimized/normal while looking at the CPU usage either in FS or in the Windows Resource Monitor.
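The idea of timing many copies at once rather than one isolated unit can be sketched outside FS as well. The following Python snippet is only an illustration of the methodology: `simple_delay` is a hypothetical stand-in for the module under test, and wall-clock timing in Python says nothing about actual FS performance.

```python
import time


def simple_delay(buffer, index, sample):
    """Hypothetical stand-in for the module under test: a circular-buffer delay."""
    out = buffer[index]
    buffer[index] = sample
    return out, (index + 1) % len(buffer)


def time_copies(copies, samples=2000, repeats=5):
    """Return the best wall-clock time per copy per sample, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        # One independent buffer and write index per copy.
        states = [([0.0] * 64, 0) for _ in range(copies)]
        start = time.perf_counter()
        for n in range(samples):
            for i, (buf, idx) in enumerate(states):
                _, idx = simple_delay(buf, idx, float(n))
                states[i] = (buf, idx)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # the minimum is least disturbed by other load
    return best / (copies * samples)
```

Calling `time_copies(10)` and `time_copies(100)` and comparing the two per-copy figures shows whether the cost scales linearly with the number of copies or shifts with context.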
MyCo
Posts: 718
Joined: Tue Jul 13, 2010 12:33 pm
Location: Germany

Re: Fast Stream Array Access

Post by MyCo »

Here is a test bench schematic that I use for optimizations. I've set it up with the delays.
Attachments
Delay Testbench (MyCo).fsm
(140.93 KiB) Downloaded 1118 times
Tronic
Posts: 539
Joined: Wed Dec 21, 2011 12:59 pm

Re: Fast Stream Array Access

Post by Tronic »

Exo wrote:I was going to ask you guys is there any opcodes you really want/need?


call [ reg ]
so that we can call a function in a DLL through its address pointer, directly from the Assembler, and use the DLL as a plugin.
Or any other way to call a function from a DLL in Code or Assembler.
KG_is_back
Posts: 1196
Joined: Tue Oct 22, 2013 5:43 pm
Location: Slovakia

Re: Fast Stream Array Access

Post by KG_is_back »

MyCo wrote:Here is a test bench schematic that I use for optimizations. I've set it up with the delays.


Very interesting! The stock delays show about 20% and the "optimized" ones show 30-40% on my machine.
martinvicanek
Posts: 1334
Joined: Sat Jun 22, 2013 8:28 pm

Re: Fast Stream Array Access

Post by martinvicanek »

MyCo wrote:When I need a meaningful comparison, I make hundreds of synchronized copies of a module and put them in parallel into a selector (as mono/packed mono stream). Then I switch between optimized/normal while looking at the CPU usage either in FS or in the Windows Resource Monitor.

Hm, very confusing. The mass test does not show a big difference between stock and "optimized" - if any, then the other way round. :? When you go to 10 instead of 100 copies then the proportions change towards the analyzer result. For me this shows that performance is a complex beast, it depends very much on context. Measuring the performance of one isolated unit seems to have little meaning. But then again, is the mass setup with 100 delays in parallel more representative of a real scenario?

I have implemented "fast" lookup table modules but now I hesitate to post them ...
tester
Posts: 1786
Joined: Wed Jan 18, 2012 10:52 pm
Location: Poland, internet

Re: Fast Stream Array Access

Post by tester »

When I play with oscillators, I usually have a few hundred of them on board. So yes, it can be a real scenario, and it has practical uses. On the other hand, even if your oscillators perform better within smaller designs, these designs can be heavy in other parts, so those few percent can be helpful too. I think I may be able to do a quick test of a multi-osc setup, to see what the real-life difference is between a stock and a custom-made part.

In fact, this is why I asked you about the possibility of making "multisine" oscillators. I'm not sure if there is any way to make a single "shape" oscillator that takes a list of random sine frequencies as input (at ca. 0.01 Hz accuracy each).
KG_is_back
Posts: 1196
Joined: Tue Oct 22, 2013 5:43 pm
Location: Slovakia

Re: Fast Stream Array Access

Post by KG_is_back »

Actually, a more relevant test now occurs to me: we can put the module into a poly section and create a module that initiates a given number of voices. Because the poly section can work in parallel independently and runs only when a voice is on, we can avoid selectors.
martinvicanek
Posts: 1334
Joined: Sat Jun 22, 2013 8:28 pm

Opcode Wishlist

Post by martinvicanek »

KG_is_back wrote:
Exo wrote:I was going to ask you guys is there any opcodes you really want/need?

NO.1 choice: subtraction for integers. Either sub reg,reg/var32; or/and psubd xmm0,xmm1/var; and to fix the nasty andnps coloring bug ...and logical not would be appreciated even in Code component.

+1, and the following:

PSRLD xmm1, xmm2/m128
Shift doublewords in xmm1 right by amount specified in xmm2/m128 while shifting in 0s.
(Would be handy for some IEEE 754 trickery in log and exp approximations)
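As a rough illustration of the kind of IEEE 754 trickery meant here, the following Python sketch pulls the exponent field out of a single-precision float with a logical right shift by 23 and builds a crude log2 from it. The function names are hypothetical, the linear mantissa term is just one simple choice of approximation, and the sketch assumes positive normal floats.

```python
import struct


def float_bits(x):
    """Reinterpret a float32 as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def exponent_field(x):
    """Logical right shift by 23 drops the mantissa; the mask removes the sign bit."""
    return (float_bits(x) >> 23) & 0xFF


def fast_log2(x):
    """Crude log2 for positive normal floats: exponent plus linear mantissa term."""
    bits = float_bits(x)
    exponent = ((bits >> 23) & 0xFF) - 127          # remove the exponent bias
    mantissa = (bits & 0x7FFFFF) / float(1 << 23)   # fractional part in [0, 1)
    return exponent + mantissa
```

The result is exact at powers of two and off by less than 0.09 in between; a real approximation would replace the linear mantissa term with a polynomial.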

PMULUDQ xmm1, xmm2/m128
Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in xmm1.
(Useful for a linear congruential random number generator)
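For reference, a linear congruential generator step boils down to one unsigned 32x32 multiply, an add, and keeping the low 32 bits. Here is a minimal Python sketch of a single lane; the constants are the well-known Numerical Recipes pair, picked only for illustration, and the function names are made up.

```python
MASK32 = 0xFFFFFFFF
A = 1664525       # multiplier (Numerical Recipes constants, illustrative choice)
C = 1013904223    # increment


def pmuludq_lane(x, y):
    """One PMULUDQ lane: unsigned 32x32 multiply giving a 64-bit product."""
    return (x & MASK32) * (y & MASK32)


def lcg_next(state):
    """Advance the generator: state = (A * state + C) mod 2**32."""
    product = pmuludq_lane(state, A)
    return (product + C) & MASK32  # keep only the low doubleword
```

Starting from any 32-bit seed, repeated calls to `lcg_next` give the sequence; dividing by 2**32 maps each state to [0, 1).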

PADDD xmm1, xmm2
Add packed doubleword integers from xmm2/m128 and xmm1.
(Current implementation only supports PADDD xmm1, m128)

Exo wrote:Maybe topic for another thread?
Yes, please :)
martinvicanek
Posts: 1334
Joined: Sat Jun 22, 2013 8:28 pm

Re: Fast Stream Array Access

Post by martinvicanek »

martinvicanek wrote:Hm, very confusing. The mass test does not show a big difference between stock and "optimized" - if any, then the other way round. :?

Apparently this paradox has confused others before:
http://synthmaker.co.uk/forum/viewtopic ... =30#p77149