
do the Shufps

Posted: Thu May 06, 2021 11:23 am
by HughBanton
Hi all,

Can anyone tell me the shufps code to change the sse channel order from 0123 to 3210? i.e. to reverse it - what would be the 'n' in :
shufps xmm0,xmm0,n; ? (Maybe it needs more than one step ..)

I've read here that there was a handy shufps helper on the forum some years back, but I haven't been able to find it. A ref to that would be most useful!

Thanks
H

Re: do the Shufps

Posted: Thu May 06, 2021 6:52 pm
by tulamide
HughBanton wrote:Can anyone tell me the shufps code to change the sse channel order from 0123 to 3210? i.e. to reverse it - what would be the 'n' in :
shufps xmm0,xmm0,n; ? (Maybe it needs more than one step ..)


According to Intel x86 Assembly/SSE, this would be it:
Code:
shufps $0x1b, %xmm0, %xmm0 # reverse order of the 4 floats

The control byte (the first operand in this AT&T syntax, but the last operand in Intel syntax as used by NASM, i.e. shufps xmm0, xmm0, 0x1b) is an 8-bit immediate and tells what goes where.
The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The select operand is an 8-bit immediate: bits 0 and 1 select the value to be moved from the destination operand the low doubleword of the result, bits 2 and 3 select the value to be moved from the destination operand the second doubleword of the result, bits 4 and 5 select the value to be moved from the source operand the third doubleword of the result, and bits 6 and 7 select the value to be moved from the source operand the high doubleword of the result.

$0x1b is hex for decimal 27, binary 00011011. Split into 2-bit fields starting from the most significant bit, that's 00 01 10 11, i.e. 0, 1, 2, 3, so I think it's MSB order.
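To see why 27 does the job, here is a small Python emulation of shufps, written from Intel's description quoted above. It is only a sketch (not real SSE code, and the function name is made up for illustration). Element 0 is the least significant float, and the 2-bit field at bits 2i+1..2i holds the index of the element that ends up in result slot i:

```python
def shufps(dst, src, imm8):
    """Emulate SHUFPS dst, src, imm8 on 4-element lists (index 0 = lowest element).

    Result slots 0 and 1 are taken from dst, slots 2 and 3 from src.
    Bits 2i+1..2i of imm8 hold the index of the element copied into slot i.
    """
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    fields = [(imm8 >> (2 * i)) & 0b11 for i in range(4)]
    return [dst[fields[0]], dst[fields[1]], src[fields[2]], src[fields[3]]]


# shufps xmm0, xmm0, 0x1b: the same register is both dst and src.
channels = [0.0, 1.0, 2.0, 3.0]               # channel 0 in the least significant slot
print(shufps(channels, channels, 0x1B))       # [3.0, 2.0, 1.0, 0.0]
print([(0x1B >> (2 * i)) & 3 for i in range(4)])  # fields, lowest first: [3, 2, 1, 0]
```

Read lowest field first, 0x1B says "slot 0 takes element 3, slot 1 takes element 2, ...", which is exactly the 0123 to 3210 reversal.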

Hope it helps!

Re: do the Shufps

Posted: Thu May 06, 2021 8:15 pm
by HughBanton
Hah - 27 .. that's it! Thanks Tula.

I had searched hi & lo, but couldn't find the logic written down anywhere. I'll make a note of all that.

I've been occasionally looking at Rotary Speaker stuff of late (about time ..?) and realised that swapping the mono-4 channels around like this would instantly simplify the spider's web inside the auto-panner that I've introduced. I'm trying to make the delay reflections move individually in stereo as they 'rotate', which seems to be a crucial Leslie element.

Anyway, more on all this when I eventually get something worth demonstrating.

Thanks again.

H

Re: do the Shufps

Posted: Thu May 06, 2021 9:36 pm
by tulamide
It's the first time I've had to deal with it, which shows that it's actually pretty easy. The select operand has 8 bits, and each 2 bits represent an action to be done on the equivalent element of the register. You just need to learn 4 states:

0 = copy to least significant element
1 = copy to second element
2 = copy to third element
3 = copy to most significant element

above numbers in 2-bit binary: 0 = 00, 1 = 01, 2 = 10, 3 = 11

These are the same for all four 2-bit fields in the IMM8. But, and this is the catch, there's a fixed order when using two registers: the two low fields pick from the destination and the two high fields pick from the source!
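Per Intel's description, that order is fixed by position: the fields in bits 0 to 3 always pick from the destination register, the fields in bits 4 to 7 always from the source. A minimal Python sketch (emulation only, illustrative names) that combines the low halves of two registers with mask 0x44:

```python
def shufps(dst, src, imm8):
    """Emulate SHUFPS on 4-element lists (index 0 = least significant element)."""
    fields = [(imm8 >> (2 * i)) & 0b11 for i in range(4)]
    # Slots 0 and 1 always come from dst, slots 2 and 3 always from src.
    return [dst[fields[0]], dst[fields[1]], src[fields[2]], src[fields[3]]]


a = ["a0", "a1", "a2", "a3"]  # destination register
b = ["b0", "b1", "b2", "b3"]  # source register

# Fields, lowest first: 0, 1 (from a), then 0, 1 (from b).
imm8 = 0 | (1 << 2) | (0 << 4) | (1 << 6)
print(hex(imm8), shufps(a, b, imm8))  # 0x44 ['a0', 'a1', 'b0', 'b1']
```

The same field value (0 or 1) means a different register depending on which half of the immediate it sits in.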

However, if you only work with one register, you can directly translate it:

ABCD to DABC
IMM8 2, 1, 0, 3 = mask 10 01 00 11 = binary 10010011 = decimal 147 = hex 0x93
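The same packing can be done in code. This Python sketch uses hypothetical helper names; `shuffle_mask` takes its arguments highest field first, like the `_MM_SHUFFLE(z, y, x, w)` macro from `<xmmintrin.h>`. It reproduces the 147 / 0x93 mask and applies it, listing elements lowest first:

```python
def shuffle_mask(i3, i2, i1, i0):
    """Pack four 2-bit element indices into an imm8, highest field first."""
    for i in (i3, i2, i1, i0):
        if not 0 <= i <= 3:
            raise ValueError("each index must be 0..3")
    return (i3 << 6) | (i2 << 4) | (i1 << 2) | i0


def shufps(dst, src, imm8):
    """Emulate SHUFPS on 4-element lists (index 0 = least significant element)."""
    fields = [(imm8 >> (2 * i)) & 0b11 for i in range(4)]
    return [dst[fields[0]], dst[fields[1]], src[fields[2]], src[fields[3]]]


imm8 = shuffle_mask(2, 1, 0, 3)
print(imm8, hex(imm8), format(imm8, "08b"))  # 147 0x93 10010011

reg = ["A", "B", "C", "D"]                   # A is element 0 (least significant)
print(shufps(reg, reg, imm8))                # ['D', 'A', 'B', 'C']
```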

The ABCD to DABC example is a rotation. If you are only interested in using shufps on one register, specifically for broadcast, swap and rotate, this page will help you a lot: it doesn't explain much, but it gives ready-to-use code for specific tasks.
http://www.songho.ca/misc/sse/sse.html
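For those one-register cases (broadcast, swap, rotate), here is a sketch of common masks. They are worked out from Intel's field layout rather than copied from the linked page, so double-check them against it; the emulation function and the dictionary labels are illustrative only:

```python
def shufps(dst, src, imm8):
    """Emulate SHUFPS on 4-element lists (index 0 = least significant element)."""
    fields = [(imm8 >> (2 * i)) & 0b11 for i in range(4)]
    return [dst[fields[0]], dst[fields[1]], src[fields[2]], src[fields[3]]]


# Single-register masks (dst and src are the same register).
# Field i (bits 2i+1..2i) = index of the element copied into slot i.
MASKS = {
    "broadcast element 0": 0x00,   # A A A A
    "broadcast element 1": 0x55,   # B B B B
    "broadcast element 2": 0xAA,   # C C C C
    "broadcast element 3": 0xFF,   # D D D D
    "swap adjacent pairs": 0xB1,   # B A D C
    "swap 64-bit halves": 0x4E,    # C D A B
    "rotate down one slot": 0x39,  # B C D A
    "rotate up one slot": 0x93,    # D A B C
    "reverse": 0x1B,               # D C B A
}

reg = ["A", "B", "C", "D"]  # listed lowest element first
for name, imm8 in MASKS.items():
    print(f"{name:22} {imm8:#04x} {' '.join(shufps(reg, reg, imm8))}")
```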

EDIT: I told you it is in MSB order, but my example was in LSB order! Sorry! 0x93 would do ABCD to BCDA !
EDIT2: According to the tool Martin posted, my original explanation is absolutely correct. So please ignore Edit1!

Re: do the Shufps

Posted: Fri May 07, 2021 10:00 am
by martinvicanek
Wonderful tool by STW and infuzion!

Re: do the Shufps

Posted: Fri May 07, 2021 12:01 pm
by tulamide
martinvicanek wrote:Wonderful tool by STW and infuzion!

Interesting. His tool lays out the mask exactly as I did in my example, and 0x93 does a right shift. But Intel explains it exactly the opposite way: according to their documentation, it should do a left shift.

What's going on here? :?:

Re: do the Shufps

Posted: Fri May 14, 2021 8:54 pm
by tulamide
Am I ignored, or does nobody know?

Re: do the Shufps

Posted: Sat May 15, 2021 6:37 am
by Spogg
tulamide wrote:Am I ignored, or does nobody know?


Definitely ignored. :lol:

We need an “I read your post but I know nothing” button!

Re: do the Shufps

Posted: Sat May 15, 2021 10:27 am
by martinvicanek
Sorry, Tula, not ignoring your post, just don't know the answer to your question.
tulamide wrote:
The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The select operand is an 8-bit immediate: bits 0 and 1 select the value to be moved from the destination operand the low doubleword of the result, bits 2 and 3 select the value to be moved from the destination operand the second doubleword of the result, bits 4 and 5 select the value to be moved from the source operand the third doubleword of the result, and bits 6 and 7 select the value to be moved from the source operand the high doubleword of the result.

If this is Intel's explanation then I don't understand it. I have read it several times but even the grammar seems odd to me. All I can say is that the shufps helper tool, which I have been using excessively for years, works flawlessly.

Re: do the Shufps

Posted: Sat May 15, 2021 1:58 pm
by tulamide
martinvicanek wrote:Sorry, Tula, not ignoring your post, just don't know the answer to your question.
If this is intel's explanation then I don't understand it. I have read it several times but even the grammar seems odd to me. All I can say is that the shufps helper tool, which I have been using excessively for years, works flawlessly.

Thanks! Yes, as I said earlier, the tool and my explanation both do the correct thing. That's why I was confused that it's explained in the opposite order.

But nobody ever complained about the description, so I assume its flaw has long been accepted and people are aware of it? Or is it a matter of little vs. big endian, which depends on the CPU? Maybe I was reading the description for big-endian instead of little-endian, as used by Intel CPUs? Well, I think we can leave it at that.
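A possible explanation, offered as a sketch rather than a definitive answer: the imm8 fields are defined by bit position, so memory byte order should not enter into it, but whether a rotation looks like a "left" or "right" shift depends on whether you write the register lowest element first or highest element first (Intel's manuals usually draw the highest element on the left). This Python emulation, with illustrative names, applies 0x93 once and prints the result both ways:

```python
def shufps(dst, src, imm8):
    """Emulate SHUFPS on 4-element lists (index 0 = least significant element)."""
    fields = [(imm8 >> (2 * i)) & 0b11 for i in range(4)]
    return [dst[fields[0]], dst[fields[1]], src[fields[2]], src[fields[3]]]


reg = ["A", "B", "C", "D"]   # A is element 0 (least significant)
out = shufps(reg, reg, 0x93)

# Written lowest element first (memory order): looks like a rotate right.
print("lowest first :", " ".join(reg), "->", " ".join(out))
# lowest first : A B C D -> D A B C

# Written highest element first (register diagram order): looks like a rotate left.
print("highest first:", " ".join(reversed(reg)), "->", " ".join(reversed(out)))
# highest first: D C B A -> C B A D
```

Same bits, same result; only the direction you read it in changes.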