What's faster for repacking a mono4 stream?

Post by Nubeat7 » Mon Nov 24, 2014 12:34 am

Quick asm question again.

To optimize my schematics I do a lot of repacking of mono4 streams. Since I use just 2 channels most of the time (stereo), I often pack 2 stereo signals (from 2 mono4 nodes) into one mono4. Instead of unpacking and packing again, I have normally used this:
Code:
fld in1[0];   fstp out1n2[0];
fld in1[1];   fstp out1n2[1];
fld in2[0];   fstp out1n2[2];
fld in2[1];   fstp out1n2[3];

But I could also use this:
Code:
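movaps xmm0,in1;       // xmm0 = in1[0..3]
movaps xmm1,in2;       // xmm1 = in2[0..3]
shufps xmm0,xmm1,68;   // xmm0 = in1[0], in1[1], in2[0], in2[1]
movaps out1n2,xmm0;    // same output as in the fld/fstp version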

which I think should be faster. Am I right that the shufps version is faster?
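For anyone who wants to check the immediate, here is a minimal Python sketch of how shufps picks its lanes. It assumes the usual SSE semantics: result lanes 0 and 1 come from the destination register, lanes 2 and 3 from the source register, with two selector bits per lane. The helpers `shufps` and `shuffle_imm` are made up for illustration and only model the instruction on plain lists; they are not FlowStone code.

```python
def shufps(dst, src, imm8):
    """Model of `shufps dst, src, imm8` on 4-element lists.

    Result lanes 0-1 are picked from dst, lanes 2-3 from src.
    Each 2-bit field of imm8 is the index of the lane to pick.
    """
    return [
        dst[imm8 & 3],
        dst[(imm8 >> 2) & 3],
        src[(imm8 >> 4) & 3],
        src[(imm8 >> 6) & 3],
    ]


def shuffle_imm(a, b, c, d):
    """Build the immediate from the four lane indices (hypothetical helper)."""
    return a | (b << 2) | (c << 4) | (d << 6)


# Two mono4 inputs that each carry one stereo pair in lanes 0 and 1.
in1 = ["L1", "R1", "unused", "unused"]
in2 = ["L2", "R2", "unused", "unused"]

imm = shuffle_imm(0, 1, 0, 1)   # lanes 0,1 of in1 followed by lanes 0,1 of in2
packed = shufps(in1, in2, imm)
print(imm, packed)
```

The lane indices (0, 1, 0, 1) give the immediate 68 used in the shufps version, and the result holds both stereo pairs in one mono4.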

Re: What's faster for repacking a mono4 stream?

Post by KG_is_back » Mon Nov 24, 2014 12:46 am

shufps takes only one cycle on most CPUs. In the first example you read from memory four times and write to memory four times, while in the second example you read twice and write once, so it's definitely faster, as far as I can tell.

Have a look at the opcode reference I made recently. You can also use the Code Speed tester to inspect the actual CPU load.

Re: What's faster for repacking a mono4 stream?

Post by martinvicanek » Mon Nov 24, 2014 9:57 am

Yes, shufps is much faster. Also avoid the stock Pack and Unpack modules, as they essentially use fld and fstp. The worst example of "Verschlimmbesserung" (sorry about the German term; it means an improvement that makes things worse) is the stock Stereo Clipper, where the overhead of the Pack/Unpack modules far outweighs any potential CPU savings.

Re: What's faster for repacking a mono4 stream?

Post by Nubeat7 » Mon Nov 24, 2014 4:53 pm

Thanks, Martin, for the confirmation :)

But how do I do it the other way around without fld / fstp?

That is, if I have one mono4 input (2 x stereo) and I want to route the two stereo pairs into 2 separate mono4 streams again:

Code:
fld in[0]; fstp out1[0];
fld in[1]; fstp out1[1];
fld in[2]; fstp out2[0];
fld in[3]; fstp out2[1];


I couldn't figure out a way to do it with shufps.

Re: What's faster for repacking a mono4 stream?

Post by martinvicanek » Mon Nov 24, 2014 7:52 pm

Like this?
Code:
streamin pack;
streamout out0;
streamout out1;
int true=-1;   // binary 11111111111111111111111111111111
float mask01=0;   // lanes 0 and 1 are filled with all ones in stage0; lanes 2 and 3 stay zero

stage0;
fld true[0]; fst mask01[0]; fstp mask01[1];

stage2;
movaps xmm0,pack;
movaps xmm1,xmm0;
shufps xmm1,xmm1,78;   // 0123 -> 2301 (23 are first)
andps xmm0,mask01;
movaps out0,xmm0;
andps xmm1,mask01;
movaps out1,xmm1;

Or, depending on what you do with the two outputs further on, you might even drop the masking: ;)
Code:
streamin pack;
streamout out0;
streamout out1;

movaps xmm0,pack;
movaps out0,xmm0;
shufps xmm0,xmm0,78;   // 0123 -> 2301 (23 are first)
movaps out1,xmm0;
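
The difference between the masked and the unmasked version can be sketched with a small Python model, assuming standard shufps semantics. `shufps` and `keep_low_pair` are hypothetical helpers that stand in for the instruction and for `andps` with `mask01`; the sample values are arbitrary.

```python
def shufps(dst, src, imm8):
    """Model of `shufps dst, src, imm8` on 4-element lists.

    Result lanes 0-1 are picked from dst, lanes 2-3 from src.
    """
    return [
        dst[imm8 & 3],
        dst[(imm8 >> 2) & 3],
        src[(imm8 >> 4) & 3],
        src[(imm8 >> 6) & 3],
    ]


def keep_low_pair(v):
    """Stand-in for `andps v, mask01`: keep lanes 0-1, zero lanes 2-3."""
    return [v[0], v[1], 0.0, 0.0]


# Arbitrary sample: stereo pair A in lanes 0-1, stereo pair B in lanes 2-3.
pack = [0.1, 0.2, 0.3, 0.4]

# shufps xmm,xmm,78 swaps the two halves: 0123 -> 2301.
swapped = shufps(pack, pack, 78)

# Masked version: the upper two lanes of each output are cleared.
masked_out0 = keep_low_pair(pack)
masked_out1 = keep_low_pair(swapped)

# Unmasked version: the upper two lanes still carry the other stereo pair.
unmasked_out0 = pack
unmasked_out1 = swapped

print(masked_out0, masked_out1)
print(unmasked_out0, unmasked_out1)
```

In this model both versions agree in lanes 0 and 1 of each output, so dropping the mask is only safe when whatever comes next ignores lanes 2 and 3.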