optimisation - tools (2019)
Hello,
I am trying to optimise some of my modules.
What is the best tool for measuring the differences between 2 modules?
My tools are quite old (from the OSM days), and I suspect they are not accurate.
Let's begin by getting the right tools: is there an analyser/speed tester usable in 2019 from which we can expect reliable results?
Thanks
- payaDSP
- Posts: 27
- Joined: Fri Aug 22, 2014 10:11 am
Re: optimisation - tools (2019)
The old ones still work, but they rely on stream. Stream takes poly and multiple signals into account and lets you use each of them in an ordered manner, which is why nobody ever really made a new one.
Stream is a bit limiting though, admittedly.
- wlangfor@uoguelph.ca
- Posts: 912
- Joined: Tue Apr 03, 2018 5:50 pm
- Location: North Bay, Ontario, Canada
Re: optimisation - tools (2019)
The results are curious.
I have attached the 3 analysers I use.
Is there a best one now?
Which one is the most efficient?
- Attachments
-
- old CPU Cycle Analyser.fsm
- analysers i use
- (36.99 KiB) Downloaded 818 times
- payaDSP
- Posts: 27
- Joined: Fri Aug 22, 2014 10:11 am
Re: optimisation - tools (2019)
Any of those three analysers would still be equally useful today. They all use the same "analyser" component internally, so there is little to choose between them other than which one you find easiest to work with (I prefer the one with the big GUI, but mostly because it's the one that I'm most familiar with).
When I'm close to an optimal design, I always double-check by running a plugin inside a VST host - one with "per-track" CPU readout is usually most useful (e.g. Reaper.) The reason for this is that modern CPUs use many internal optimisations for efficiency; they can execute the instructions in a different order than you coded them in, and the way that memory is optimised by using CPU caches can have significant effects, too. Testing inside a host while the CPU is busy with many other tasks can show CPU/memory load effects that aren't always clear when using an isolated test set up.
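The isolated kind of measurement that the analysers perform can be illustrated outside FlowStone. The following is only a minimal Python sketch, not one of the attached analysers: `invert_mul`, `invert_neg` and `best_time` are hypothetical names invented for the illustration. It times two equivalent inverters several times and keeps the fastest run of each, because single runs are noisy.

```python
import timeit


def invert_mul(samples):
    """Invert a block of samples by multiplying each one by -1."""
    return [s * -1.0 for s in samples]


def invert_neg(samples):
    """Invert a block of samples with unary minus."""
    return [-s for s in samples]


def best_time(func, block, repeat=5, number=200):
    """Return the fastest of `repeat` timings; each timing runs func `number` times."""
    timings = timeit.repeat(lambda: func(block), repeat=repeat, number=number)
    return min(timings)


if __name__ == "__main__":
    block = [0.25, -0.5, 0.75, -1.0] * 64
    # Both versions must agree before their speeds are worth comparing.
    assert invert_mul(block) == invert_neg(block)
    for func in (invert_mul, invert_neg):
        print(f"{func.__name__}: {best_time(func, block):.6f} s")
```

A harness like this only measures the code in isolation, so it does not replace the check inside a busy VST host described above.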
All schematics/modules I post are free for all to use - but a credit is always polite!
Don't stagnate, mutate to create!
- trogluddite
- Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: optimisation - tools (2019)
Thanks,
Here is a simple test:
Start with a simple module: an inverter, that is, STREAM * -1 (INV modules).
Translate it into DSP code: out=in*-1 (INV DSP modules).
Translate it into ASM directly (INV ASM modules):
Code:
streamin in;
streamout out;
float FM1=-1;
// load the input into xmm0
movaps xmm0,in;
// multiply all four SSE lanes by -1
mulps xmm0,FM1;
// store the result to the output
movaps out,xmm0;
Then testing... ZERO differences.
I chained 12 modules to see a difference...
The FSM INVERT is the fastest.
Why is there no gain from simplifying the module?
- Attachments
-
- CPU Cycle Analyser test1 simplex.fsm
- (32.88 KiB) Downloaded 810 times
- payaDSP
- Posts: 27
- Joined: Fri Aug 22, 2014 10:11 am
Re: optimisation - tools (2019)
In the case of DSP vs. ASM in your example, there's no difference because they are running exactly the same code - the DSP code is translated to ASM internally, and in this case it gives exactly the same instructions.
The primitive is lighter because something similar happens for stream primitives - not just the primitives, but also the connections between them, are converted to ASM internally, and this sometimes allows FS to use some optimisations which aren't possible within the ASM/DSP blocks.
The general principle is this. Instructions using only the "xmm" registers are very fast, but anything which uses float variables, streamins, and streamouts requires storing things in memory, which is much slower. The DSP->ASM translator doesn't always optimise very well - it sometimes (but not in this case) uses memory reading/writing where it doesn't really need to, and we can often optimise these better by hand, by keeping values inside "xmm" registers instead of reading/writing memory.
For example, when you look at the ASM output of a DSP code block, you sometimes see something like this...
Code:
movaps xmm0, variable
// Process xmm0
movaps variable, xmm0
movaps xmm0, variable
// Process xmm0
movaps variable, xmm0
In this case, the middle two "movaps" lines are not necessary - it's storing something only to load it straight back into the same place, and the final "movaps" at the end is all we need to make sure that the final result gets stored.
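The hand optimisation described above can be modelled as a tiny "peephole" pass. This is a hedged Python sketch only: it is not how FlowStone's translator works, and `remove_redundant_moves` and its helpers are hypothetical names. It assumes plain `movaps dest, src` lines and treats `//` lines as comments, so it cannot see whether the real processing instructions they stand for read the variable.

```python
import re

# Matches lines such as "movaps xmm0, variable" (a trailing semicolon is allowed).
MOVAPS = re.compile(r"^\s*movaps\s+(\w+)\s*,\s*(\w+)\s*;?\s*$")


def parse_movaps(line):
    """Return (dest, src) for a movaps line, or None for any other line."""
    match = MOVAPS.match(line)
    return match.groups() if match else None


def stored_again_before_read(lines, variable):
    """True if the next instruction that mentions `variable` overwrites it."""
    for line in lines:
        if line.strip().startswith("//"):
            continue  # comments never touch memory
        if re.search(rf"\b{re.escape(variable)}\b", line):
            parsed = parse_movaps(line)
            return parsed is not None and parsed[0] == variable
    return False


def remove_redundant_moves(lines):
    """Drop a reload that directly follows a store of the same register, and
    drop the store too when the variable is overwritten before it is read."""
    out = []
    i = 0
    while i < len(lines):
        store = parse_movaps(lines[i])
        nxt = parse_movaps(lines[i + 1]) if i + 1 < len(lines) else None
        is_pair = (
            store is not None
            and nxt is not None
            and store[1].startswith("xmm")      # store: variable <- register
            and not store[0].startswith("xmm")
            and nxt == (store[1], store[0])     # reload: register <- variable
        )
        if is_pair:
            if not stored_again_before_read(lines[i + 2:], store[0]):
                out.append(lines[i])  # memory copy is still needed: keep the store
            i += 2  # the reload is redundant: the register already holds the value
        else:
            out.append(lines[i])
            i += 1
    return out


if __name__ == "__main__":
    example = [
        "movaps xmm0, variable",
        "// Process xmm0",
        "movaps variable, xmm0",
        "movaps xmm0, variable",
        "// Process xmm0",
        "movaps variable, xmm0",
    ]
    print("\n".join(remove_redundant_moves(example)))
```

Applied to the listing above, the pass keeps the first load, both processing steps and the final store, and removes the middle store/reload pair. When the variable is not written again later, only the reload is removed, so the value still reaches memory.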
- trogluddite
- Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: optimisation - tools (2019)
My knowledge of ASM is rusty (the last program I wrote was for an early 8-bit processor about 45 years ago!), but I understand what you are saying.
I would just remark that FS must be well coded to give such results. I would have expected at least some difference between the DSP and ASM versions, since there is no "translation" in the second one.
When you run an FSM schematic in FS, is the schematic interpreted or compiled?
If it is interpreted, I say BRAVO again for the quality of the coding.
So, some further questions:
(I will test further on more complex modules; 4 operations are too few, I guess.)
Is the GUI of the module (cosmetic GUIs like the ones in my example) a slowing factor?
Is the number of nested modules a slowing factor?
Thanks for your very interesting answers.
- payaDSP
- Posts: 27
- Joined: Fri Aug 22, 2014 10:11 am
Re: optimisation - tools (2019)
payaDSP wrote: I would have expected at least some difference between the DSP and ASM versions, since there is no "translation" in the second one.
The "translation" to ASM only happens once - when you write the code. When the code is running, the DSP module is really the same as an ASM module which has the "translated" code typed into it. This is different to, say, the Ruby code, which really is translated ("interpreted") every single time.
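The difference between translating once and translating on every run can be shown with a loose Python analogy. It says nothing about how FlowStone or its Ruby component is actually implemented, and `translate`, `run_translated` and `run_interpreted` are hypothetical names. The first path turns the expression text into a code object a single time; the second re-parses the same text on every call.

```python
def translate(expression):
    """Translate the expression text once into a reusable code object."""
    return compile(expression, "<dsp>", "eval")


def run_translated(code, sample):
    """Run already-translated code: no parsing happens here."""
    return eval(code, {"__builtins__": {}}, {"x": sample})


def run_interpreted(expression, sample):
    """Parse the text again and run it, on every single call."""
    return eval(expression, {"__builtins__": {}}, {"x": sample})


if __name__ == "__main__":
    source = "x * -1"
    code = translate(source)  # happens once, like typing the code into the module
    for sample in (0.5, -0.25):
        # Both paths give the same result; only the per-call work differs.
        assert run_translated(code, sample) == run_interpreted(source, sample)
        print(sample, "->", run_translated(code, sample))
```

Both paths compute the same values; the translated one simply skips the parsing work at run time, which matches the point that a DSP module behaves like an ASM module once the code has been translated.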
payaDSP wrote: Is the GUI of the module (cosmetic GUIs like the ones in my example) a slowing factor?
It can be, yes - for example, if you use very large, fast animations. FS does not use the power of the graphics card to draw the graphics; the drawing is mostly done by the main CPU, so it does add to the CPU load. However, this cannot be seen on the FlowStone CPU meter, which only shows audio stream processing. But it will show up on the Windows 'task manager' CPU meters (you may see a rise if you move controls very quickly, for example).
payaDSP wrote: Is the number of nested modules a slowing factor?
No. The modules are only a graphical help for the user to organise things - they don't affect the CPU load at all. They may make the FS or VST file a little bit bigger, but only by a very small amount.
payaDSP wrote: Thanks for your very interesting answers.
You're welcome. I started with 8-bit machines (Z80 mostly) myself.
- trogluddite
- Posts: 1730
- Joined: Fri Oct 22, 2010 12:46 am
- Location: Yorkshire, UK
Re: optimisation - tools (2019)
Ah, the ZX Spectrum and the CPC 464...
And a little earlier, the Apple II. I was very proud of my first program (blinking an LED).
- payaDSP
- Posts: 27
- Joined: Fri Aug 22, 2014 10:11 am
Re: optimisation - tools (2019)
trogluddite wrote: payaDSP wrote: Is the number of nested modules a slowing factor?
No. The modules are only a graphical help for the user to organise things - they don't affect the CPU load at all. They may make the FS or VST file a little bit bigger, but only by a very small amount.
This is actually at least debatable. A few years ago, Exo proved through some tests that the number of modules does have an impact on CPU load. One example was a schematic with deeply nested modules, everything wonderfully clear and obvious to work with. The same schematic was then rebuilt in just one module: a mess, and hard to work with. However, it was lighter on the CPU.
"There lies the dog buried" (German saying translated literally)
- tulamide
- Posts: 2714
- Joined: Sat Jun 21, 2014 2:48 pm
- Location: Germany