Fast Stream Array Access
Re: Fast Stream Array Access
Exo wrote:I was going to ask you guys is there any opcodes you really want/need? If you can give clear examples of benefits of certain opcodes I could get on to Malc to add them (I'm usually quite good at getting him to add little things if I give him a clear example and make it simple for him).
Maybe topic for another thread?
NO.1 choice: subtraction for integers. Either sub reg,reg/var32; or/and psubd xmm0,xmm1/var; and to fix the nasty andnps coloring bug ...and logical not would be appreciated even in Code component.
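Until an integer subtract is available, the two's-complement identity a - b = a + NOT(b) + 1 (mod 2^32) gives the same result from an add and a bitwise NOT. The sketch below only models the arithmetic on four 32-bit lanes in Python; the helper names are made up for illustration, and whether the matching opcodes are usable in the FS assembler is an assumption that is not checked here.

```python
MASK32 = 0xFFFFFFFF

def paddd(a, b):
    """Lane-wise 32-bit add with wraparound, modelling PADDD."""
    return [(x + y) & MASK32 for x, y in zip(a, b)]

def not32(a):
    """Lane-wise bitwise NOT (what ANDNPS yields when the source is all ones)."""
    return [~x & MASK32 for x in a]

def psubd_emulated(a, b):
    """a - b per lane, built only from add and NOT: a + ~b + 1."""
    ones = [1] * len(a)
    return paddd(paddd(a, not32(b)), ones)

def to_signed(x):
    """Read a 32-bit lane as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

a = [10, 0, 5, 0x80000000]
b = [3, 1, 5, 1]
result = psubd_emulated(a, b)
print([to_signed(v) for v in result])
```

This costs two adds and one logical operation per subtraction, so a native psubd would still be the better option.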
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: Fast Stream Array Access
wow, Martin has a run
Haven't noticed that FS supports "movd r/m32, xmm" instruction, good to know... Unfortunately it doesn't support "movd xmm, r/m32", that would give another performance boost.
BTW: Don't trust the cycle counter method, it's pretty inaccurate. On my system, for example, the cycle counter outputs the same value for the "Simple Delay" and the "Simple Delay (Stock)", although I know there should be a huge difference. When I need a meaningful comparison, I make hundreds of synchronized copies of a module and put them in parallel into a selector (as mono/packed mono stream). Then I switch between optimized/normal while watching the CPU usage either in FS or in the Windows Resource Monitor.
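The same idea carries over outside FS: rather than trusting one measurement of one unit, run many copies and compare the totals. The Python sketch below is only an analogy under that assumption. The two workloads are hypothetical stand-ins for a "stock" and an "optimized" module, not the actual delay modules.

```python
import time

DATA = list(range(256))  # hypothetical input buffer

def delay_stock():
    """Stand-in for the stock module: explicit accumulation loop."""
    total = 0
    for v in DATA:
        total += v
    return total

def delay_optimized():
    """Stand-in for the optimized module: same result via the built-in sum."""
    return sum(DATA)

def time_copies(func, copies, repeats=5):
    """Best-of-`repeats` wall time for running `copies` calls of func."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(copies):
            func()
        best = min(best, time.perf_counter() - start)
    return best

stock_time = time_copies(delay_stock, copies=500)
optimized_time = time_copies(delay_optimized, copies=500)
print(f"stock: {stock_time:.6f} s, optimized: {optimized_time:.6f} s")
```

Taking the best of several repeats reduces the influence of background load, which is one reason a single reading can mislead.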
- MyCo
- Posts: 718
- Joined: Tue Jul 13, 2010 12:33 pm
- Location: Germany
Re: Fast Stream Array Access
Here is a test bench schematic that I use for optimizations. I've set it up with the delays.
- Attachment: Delay Testbench (MyCo).fsm (140.93 KiB)
- MyCo
- Posts: 718
- Joined: Tue Jul 13, 2010 12:33 pm
- Location: Germany
Re: Fast Stream Array Access
Exo wrote:I was going to ask you guys is there any opcodes you really want/need?
call [ reg ]
so we can call a function with address pointer from dll, directly in the Assembler, and use the dll as plugin.
Or any other way to call function from DLL in Code or Assembler.
- Tronic
- Posts: 539
- Joined: Wed Dec 21, 2011 12:59 pm
Re: Fast Stream Array Access
MyCo wrote:Here is a test bench schematic that I use for optimizations. I've set it up with the delays.
Very interesting! The stock delays show about 20% and the "optimized" ones show 30-40% on my machine.
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Re: Fast Stream Array Access
MyCo wrote:When I need a meaningful comparison, I make hundreds of synchronized copies of a module and put them in parallel into a selector (as mono/packed mono stream). Then I switch between optimized/normal while watching the CPU usage either in FS or in the Windows Resource Monitor.
Hm, very confusing. The mass test does not show a big difference between stock and "optimized" - if any, then the other way round. When you go to 10 instead of 100 copies then the proportions change towards the analyzer result. For me this shows that performance is a complex beast, it depends very much on context. Measuring the performance of one isolated unit seems to have little meaning. But then again, is the mass setup with 100 delays in parallel more representative of a real scenario?
I have implemented "fast" lookup table modules but now I hesitate to post them ...
- martinvicanek
- Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm
Re: Fast Stream Array Access
When I play with oscillators, I usually have a few hundred of them on board. So yes, it can be a real scenario, and it has practical uses. On the other hand, even if your oscillators perform better within smaller designs, those designs can be heavy in other parts, so these few percent can be helpful too. I think I may be able to do a quick test of a multi-osc setup, to see what the real-life difference between a stock and a custom-made part is.
In fact, this is why I asked you about the possibility of making "multisine" oscillators. I'm not sure if there is any way to make a single "shape" oscillator that takes a list of random sine frequencies as input (at ca. 0.01 Hz accuracy each).
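To make the "multisine" idea concrete, here is a rough Python sketch of one oscillator that takes a list of frequencies and keeps one phase accumulator per entry. It only illustrates the concept. The function name and parameters are made up, and it says nothing about how this would be built or how it would perform as an FS stream module.

```python
import math

def multisine(freqs_hz, sample_rate, num_samples):
    """Sum of unit-amplitude sines, one phase accumulator per frequency."""
    phases = [0.0] * len(freqs_hz)               # phase in cycles, range [0, 1)
    steps = [f / sample_rate for f in freqs_hz]  # phase increment per sample
    out = []
    for _ in range(num_samples):
        out.append(sum(math.sin(2.0 * math.pi * p) for p in phases))
        phases = [(p + s) % 1.0 for p, s in zip(phases, steps)]
    return out

# Three arbitrary example frequencies at a 44.1 kHz sample rate
samples = multisine([440.0, 440.01, 523.25], 44100.0, 8)
print(samples)
```

Python floats are double precision, so a 0.01 Hz step is easy to represent here. With 32-bit floats, the phase increment for 0.01 Hz at 44.1 kHz is only a few units in the last place near 1.0, so the accuracy would need checking.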
- tester
- Posts: 1786
- Joined: Wed Jan 18, 2012 10:52 pm
- Location: Poland, internet
Re: Fast Stream Array Access
Actually, a more relevant test now occurs to me: we can put the module into a poly section and create a module that initiates a given number of voices. Because poly sections can work in parallel independently and run only when a voice is on, we can avoid selectors.
- KG_is_back
- Posts: 1196
- Joined: Tue Oct 22, 2013 5:43 pm
- Location: Slovakia
Opcode Wishlist
KG_is_back wrote:Exo wrote:I was going to ask you guys is there any opcodes you really want/need?
NO.1 choice: subtraction for integers. Either sub reg,reg/var32; or/and psubd xmm0,xmm1/var; and to fix the nasty andnps coloring bug ...and logical not would be appreciated even in Code component.
+1, and the following:
PSRLD xmm1, xmm2/m128
Shift doublewords in xmm1 right by amount specified in xmm2/m128 while shifting in 0s.
(Would be handy for some IEEE 754 trickery in log and exp approximations)
PMULUDQ xmm1, xmm2/m128
Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in xmm1.
(Useful for a linear congruential random number generator)
PADDD xmm1, xmm2
Add packed doubleword integers from xmm2/m128 and xmm1.
(Current implementation only supports PADDD xmm1, m128)
Exo wrote:Maybe topic for another thread?
Yes, please
- martinvicanek
- Posts: 1328
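For readers who want to see what the first two requests would enable, here is a small Python model of the lane arithmetic. It is a sketch under stated assumptions, not FS code: the helper functions only mimic the documented behaviour of PSRLD and PMULUDQ, and the LCG constants are the well-known Numerical Recipes pair (1664525, 1013904223), chosen purely as an example.

```python
import struct

MASK32 = 0xFFFFFFFF

def float_bits(x):
    """Reinterpret a 32-bit float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def psrld(lanes, count):
    """Logical right shift per 32-bit lane, zeros shifted in (models PSRLD)."""
    return [0 if count > 31 else (v & MASK32) >> count for v in lanes]

def exponent_floor_log2(xs):
    """floor(log2(x)) for positive normal floats, read from the exponent field."""
    bits = [float_bits(x) for x in xs]
    return [e - 127 for e in psrld(bits, 23)]  # drop the 23 mantissa bits, remove the bias

def pmuludq(a, b):
    """Multiply lanes 0 and 2 as unsigned 32-bit values into two 64-bit products."""
    return [(a[0] & MASK32) * (b[0] & MASK32), (a[2] & MASK32) * (b[2] & MASK32)]

def lcg_step(state, mul=1664525, inc=1013904223):
    """Advance two independent LCG streams: (mul * s + inc) mod 2**32."""
    products = pmuludq([state[0], 0, state[1], 0], [mul, 0, mul, 0])
    return [(p + inc) & MASK32 for p in products]

print(exponent_floor_log2([1.0, 8.0, 0.5, 10.0]))
print(lcg_step([0, 1]))
```

The shift yields only the integer part of log2. A practical approximation would add a correction computed from the mantissa bits. PMULUDQ uses only the even lanes, so one instruction advances two streams rather than four.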
- Joined: Sat Jun 22, 2013 8:28 pm
Re: Fast Stream Array Access
martinvicanek wrote:Hm, very confusing. The mass test does not show a big difference between stock and "optimized" - if any, then the other way round.
Apparently this paradox has confused others before:
http://synthmaker.co.uk/forum/viewtopic ... =30#p77149
- martinvicanek
- Posts: 1328
- Joined: Sat Jun 22, 2013 8:28 pm