optimization question - custom selectors

Postby tester » Sun Feb 06, 2022 1:06 pm

I'm looking for an optimized, selector-like switcher for streams, since the native selector doesn't work well when there are multiple copies of it. At the moment I'm using something like this:

Code: Select all
streamin sw;
streamin in1;
streamin in2;
streamin in3;
streamout out1;
float a1,a2,a3,a4;

a1 = in1&(sw==0);
a2 = in2&(sw==1);
a3 = (-1*in2)&(sw==2);
a4 = in3&(sw==3);

out1 = a1+a2+a3+a4;


What would be a faster way?
(mono4 compatible)

Also, I remember from the past that there was some asm hack that lets you "stop" some inputs from processing, like the selectors do, but I don't remember the details now.
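
For readers less used to stream code: a comparison such as (sw==0) produces a per-channel mask, and & either passes the sample or zeroes it. The following is a minimal Python sketch of that per-channel behaviour. It is a model only, not FlowStone code, and the function names are made up for illustration.

```python
def mask_select(value, condition):
    """Model of `value & (condition)`: pass the sample if true, else 0.0."""
    return value if condition else 0.0


def selector(sw, in1, in2, in3):
    """One channel of the selector: mirrors a1..a4 from the stream code."""
    a1 = mask_select(in1, sw == 0)
    a2 = mask_select(in2, sw == 1)
    a3 = mask_select(-1 * in2, sw == 2)  # inverted copy of in2
    a4 = mask_select(in3, sw == 3)
    return a1 + a2 + a3 + a4


def selector_mono4(sw, in1, in2, in3):
    """Apply the model to four independent channels, as mono4 would."""
    return [selector(s, a, b, c) for s, a, b, c in zip(sw, in1, in2, in3)]


# Each channel has its own switch value, so each picks a different input.
print(selector_mono4([0, 1, 2, 3], [0.5] * 4, [0.25] * 4, [0.75] * 4))
```

Because only one condition can be true per channel, the final sum always contains at most one non-zero term.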

Re: optimization question - custom selectors

Postby martinvicanek » Sun Feb 06, 2022 7:02 pm

You can save CPU by not storing a1 through a4 because they are actually not needed further. The following ASM code uses about half the CPU:

Code: Select all
streamin sw;
streamin in1;
streamin in2;
streamin in3;

streamout out1;

float F0=0.0;
float F1=1.0;
float F2=2.0;
float F3=3.0;

movaps xmm0,F0; cmpps xmm0,sw,0; andps xmm0,in1;   // in1&(sw==0)
movaps xmm1,F1; cmpps xmm1,sw,0; andps xmm1,in2;   // sin2&(w==1)
movaps xmm2,F2; cmpps xmm2,sw,0; andps xmm2,in2;   // in2&(sw==2)
movaps xmm3,F3; cmpps xmm3,sw,0; andps xmm3,in3;   // in3&(sw==3)

addps xmm0,xmm1; subps xmm0,xmm2; addps xmm0,xmm3;
movaps out1,xmm0;


Further optimizations might be possible, depending on how often sw changes and if it is Mono4 or has the same value for all 4 channels. In the latter case you might do the (sw==0) comparisons in green. You might also consider hopping, but there is not really much more to gain anyway.
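
To make the bit-level behaviour of the asm above easier to follow, here is a rough Python model of a single lane. It assumes the usual SSE semantics: cmpps with predicate 0 tests equality and writes all ones or all zeros, and andps is a plain bitwise AND on the float's bit pattern. The helper names (f2b, b2f, cmpeq, and_mask, selector_lane) are hypothetical and exist only for this sketch.

```python
import struct

ALL_ONES = 0xFFFFFFFF


def f2b(x):
    """Reinterpret a float as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def b2f(b):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def cmpeq(a, b):
    """One lane of cmpps a,b,0: all ones if equal, otherwise zero."""
    return ALL_ONES if a == b else 0


def and_mask(mask, x):
    """One lane of andps: bitwise AND of a mask with a float's bits."""
    return b2f(mask & f2b(x))


def selector_lane(sw, in1, in2, in3):
    x0 = and_mask(cmpeq(0.0, sw), in1)
    x1 = and_mask(cmpeq(1.0, sw), in2)
    x2 = and_mask(cmpeq(2.0, sw), in2)
    x3 = and_mask(cmpeq(3.0, sw), in3)
    # addps, subps, addps: x2 is subtracted, so sw==2 yields -in2
    return x0 + x1 - x2 + x3


for sw in (0.0, 1.0, 2.0, 3.0):
    print(sw, selector_lane(sw, 0.5, 0.25, 0.75))
```

The subtraction of the third term is what reproduces the (-1*in2) branch of the original stream code without a separate multiply.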

Re: optimization question - custom selectors

Postby HughBanton » Mon Feb 07, 2022 11:08 am

Oh, nice one. I was attempting something complicated with stream selectors a couple of weeks back, but gave it up because of some weird behaviour. I'll certainly try again now, U-turn the U-turn.

Typo with that "sin2" obviously - had me puzzled for a moment!

And presumably .. ' addps xmm0,xmm2 ' ? Or am I as confused as usual :?

H

Re: optimization question - custom selectors

Postby tester » Mon Feb 07, 2022 11:42 am

Thanks Martin,

This is for switching audio signals with full mono4 usage, so hopping or removing channels isn't really an option.

And what would such asm-optimized code look like for a multiplexer? (unused outs = 0)

Re: optimization question - custom selectors

Postby adamszabo » Mon Feb 07, 2022 12:44 pm

HughBanton wrote: And presumably .. ' addps xmm0,xmm2 ' ? Or am I as confused as usual :?


Normally yes, but Martin made the code behave like the very first example, where the sw==2 branch outputs -in2, so subps gives the same result as that code.

Re: optimization question - custom selectors

Postby martinvicanek » Mon Feb 07, 2022 10:08 pm

A simple multiplexer would go like this:

Code: Select all
// inputs
streamin switch;
streamin in;

// outputs
streamout out0;
streamout out1;
streamout out2;
streamout out3;

// constants
float F0=0;
float F1=1;
float F2=2;
float F3=3;

// code
movaps xmm6,switch;
movaps xmm7,in;

movaps xmm0,F0; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out0,xmm0;
movaps xmm1,F1; cmpps xmm1,xmm6,0; andps xmm1,xmm7; movaps out1,xmm1;
movaps xmm2,F2; cmpps xmm2,xmm6,0; andps xmm2,xmm7; movaps out2,xmm2;
movaps xmm3,F3; cmpps xmm3,xmm6,0; andps xmm3,xmm7; movaps out3,xmm3;
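
As a cross-check of what the multiplexer above is meant to produce, here is a small Python model of the routing. It is only a sketch: it treats each mono4 channel independently and uses made-up function names.

```python
def multiplexer_lane(switch, x):
    """Route x to the output whose index equals switch; the rest get 0.0."""
    return tuple(x if switch == k else 0.0 for k in range(4))


def multiplexer_mono4(switch, x):
    """Return [out0, out1, out2, out3], each holding four channels."""
    per_channel = [multiplexer_lane(s, v) for s, v in zip(switch, x)]
    # Transpose from "per channel" to "per output".
    return [list(out) for out in zip(*per_channel)]


outs = multiplexer_mono4([0, 1, 2, 3], [0.1, 0.2, 0.3, 0.4])
for index, out in enumerate(outs):
    print("out%d" % index, out)
```

Each channel can be routed to a different output at the same time, which is what full mono4 usage requires.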


If the switch input does not change very often you could hop the compares; however, the CPU gain is only marginal:
Code: Select all
// inputs
streamin switch;
streamin in;

// outputs
streamout out0;
streamout out1;
streamout out2;
streamout out3;

// constants
float F0=0;
float F1=1;
float F2=2;
float F3=3;

// masks
int mask0=0;
int mask1=0;
int mask2=0;
int mask3=0;

// code
mov eax,ecx; and eax,63; cmp eax,0; jnz skipCompares;
   movaps xmm0,F0; cmpps xmm0,switch,0; movaps mask0,xmm0;
   movaps xmm1,F1; cmpps xmm1,switch,0; movaps mask1,xmm1;
   movaps xmm2,F2; cmpps xmm2,switch,0; movaps mask2,xmm2;
   movaps xmm3,F3; cmpps xmm3,switch,0; movaps mask3,xmm3;
skipCompares:

movaps xmm7,in;
movaps xmm0,mask0; andps xmm0,xmm7; movaps out0,xmm0;
movaps xmm1,mask1; andps xmm1,xmm7; movaps out1,xmm1;
movaps xmm2,mask2; andps xmm2,xmm7; movaps out2,xmm2;
movaps xmm3,mask3; andps xmm3,xmm7; movaps out3,xmm3;
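
One consequence of hopping is worth spelling out: a switch change only takes effect at the next hop boundary. The Python sketch below models a single channel under the assumption that ecx carries a running sample counter starting at 0, which the "and eax,63" test suggests. The function name and the test signal are invented for illustration.

```python
def hopped_multiplexer(switch_samples, in_samples, hop=64):
    """Single-channel model: masks are refreshed only every `hop` samples."""
    masks = [False] * 4  # mask0..mask3 start cleared, as in the code above
    outputs = []
    for n, (sw, x) in enumerate(zip(switch_samples, in_samples)):
        if n % hop == 0:  # models: and eax,63; cmp eax,0; jnz skipCompares
            masks = [sw == k for k in range(4)]
        outputs.append(tuple(x if m else 0.0 for m in masks))
    return outputs


# The switch moves from 0 to 1 at sample 10, but the masks only follow
# at sample 64, the next multiple of the hop size.
switch = [0] * 10 + [1] * 120
signal = [1.0] * 130
out = hopped_multiplexer(switch, signal)
print(out[9], out[10], out[63], out[64])
```

In this model the routing can lag the switch by up to 63 samples, which is usually acceptable for a control that changes rarely.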

Re: optimization question - custom selectors

Postby tester » Mon Feb 07, 2022 10:21 pm

Thanks again.

I admit, my domain is more wiring green relationships than messing with asm code.

Re: optimization question - custom selectors

Postby HughBanton » Tue Feb 08, 2022 11:48 am

No need to use xmm1, 2 or 3 in the simple multiplexer I think ...

Code: Select all
// 1-pole, 4-way Multiplexer
streamin switch, in;
streamout out0, out1, out2, out3;

float F0=0, F1=1, F2=2, F3=3;

movaps xmm6,switch;
movaps xmm7,in;

movaps xmm0,F0; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out0,xmm0; //(sw==0)
movaps xmm0,F1; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out1,xmm0; //(sw==1)
movaps xmm0,F2; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out2,xmm0; //(sw==2)
movaps xmm0,F3; cmpps xmm0,xmm6,0; andps xmm0,xmm7; movaps out3,xmm0; //(sw==3)


.. may or may not matter in practice, but means you could easily turn this into a super-efficient multipole mpx using the spare xmm's. (Should you ever need such a device!)

Also note that any of these can generally be used in stage0 only, if you only need a one-off note-on lookup of something.

H

Re: optimization question - custom selectors

Postby HughBanton » Tue Feb 08, 2022 1:54 pm

Since I just lerrrv messing with asm code, I just came up with this simplification for the Selector
- OR instead of ADD

Code: Select all
// 4-in, 1-out selector
streamin sw, in0, in1, in2, in3;
streamout out;

float F0=0, F1=1, F2=2, F3=3;

movaps xmm7,sw; //xmm7=switch
movaps xmm0,xmm7; cmpps xmm0,F0,0; andps xmm0,in0; movaps xmm1,xmm0; //(sw==0)
movaps xmm0,xmm7; cmpps xmm0,F1,0; andps xmm0,in1; orps xmm1,xmm0; //(sw==1)
movaps xmm0,xmm7; cmpps xmm0,F2,0; andps xmm0,in2; orps xmm1,xmm0; //(sw==2)
movaps xmm0,xmm7; cmpps xmm0,F3,0; andps xmm0,in3; orps xmm1,xmm0; //(sw==3)
movaps out,xmm1;


... seems to work OK?
H
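
A short Python check of why OR can stand in for ADD here: the four compare constants are distinct, so at most one mask is set per lane, and OR-ing one masked value with zeros gives the same bits as adding it to 0.0. This is a sketch on bit patterns with hypothetical helper names, not FlowStone code.

```python
import struct


def f2b(x):
    """Reinterpret a float as its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def b2f(b):
    """Reinterpret a 32-bit pattern as a float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def select_or(sw, inputs):
    """Combine the masked inputs with bitwise OR (andps + orps)."""
    acc = 0
    for k, x in enumerate(inputs):
        mask = 0xFFFFFFFF if sw == float(k) else 0
        acc |= mask & f2b(x)
    return b2f(acc)


def select_add(sw, inputs):
    """Combine the masked inputs with floating-point addition (andps + addps)."""
    total = 0.0
    for k, x in enumerate(inputs):
        mask = 0xFFFFFFFF if sw == float(k) else 0
        total += b2f(mask & f2b(x))
    return total


inputs = [0.5, -0.25, 0.75, -1.5]
for sw in (0.0, 1.0, 2.0, 3.0, 7.0):
    print(sw, select_or(sw, inputs), select_add(sw, inputs))
```

The equivalence relies on the masks being mutually exclusive; if two masks could be set at once, OR would merge bit patterns rather than sum the values.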

Re: optimization question - custom selectors

Postby martinvicanek » Wed Feb 09, 2022 3:20 am

HughBanton wrote: No need to use xmm1, 2 or 3 in the simple multiplexer I think ...

Correct, you can spare xmm1 etc. for something else if you need to. On the other hand, I like to use 4 lanes if I can afford to. If anything, it might help the processor to do things in parallel. ;)

HughBanton wrote: [...] simplification for the Selector - OR instead of ADD

Yes! For a plain selector OR will be somewhat lighter on CPU than ADD. I used ADD and SUB only to comply with the OP's requirement for sw==2.
