
Fast Stream Array Access


Re: Opcode Wishlist

Postby Exo » Tue Oct 21, 2014 9:05 pm

martinvicanek wrote:
Exo wrote:Maybe topic for another thread?
Yes, please :)


Done :) Assembler Improvements.
Flowstone Guru. Blog and download site for Flowstone.
Best VST Plugins. Initial Audio.
Exo
 
Posts: 426
Joined: Wed Aug 04, 2010 8:58 pm
Location: UK

Re: Fast Stream Array Access

Postby Exo » Tue Nov 18, 2014 11:08 pm

Exo wrote:Hi Martin, do you think it is possible to do this trick with this code?

Code:
polyintin addr;
polyintin max;
streamin index;
streamout out;

int zero = 0;
int temp = 0;
stage2;
mov eax,addr[0];
cmp eax,0;
jz bypass;

  cvtps2dq xmm0,index;
  maxps xmm0,zero;
  minps xmm0,max;
  pslld xmm0,2;
  paddd xmm0,addr;
  movaps temp,xmm0;
 
  //Read
  mov eax,temp[0];
  fld [eax] ; fstp out[0];

  mov eax,temp[1];
  fld [eax] ; fstp out[1];
 
  mov eax,temp[2];
  fld [eax] ; fstp out[2];

  mov eax,temp[3];
  fld [eax] ; fstp out[3];
   
bypass:


This reads directly from the address of a mem, instead of from the mem input or an array. Here eax holds the actual memory address, and [eax] reads the value stored there. I know it can work easily with the mem input because that is copied into a standard code array.
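The address arithmetic in the code above can be sketched for a single SSE lane in Ruby. This is only an illustration of the intended effect (round, clamp, multiply by 4, add the base); the helper name `element_address` and the example base address are made up, and the rounding assumes the default round-to-nearest-even mode of cvtps2dq.

```ruby
# Hypothetical one-lane model of: cvtps2dq / maxps / minps / pslld 2 / paddd.
# Names and the example base address are illustrative, not FlowStone API.
def element_address(base_addr, index, max_index)
  i = index.round(half: :even)     # cvtps2dq: float -> int, ties to even (default mode)
  i = [[i, 0].max, max_index].min  # maxps/minps: clamp to 0..max_index
  base_addr + (i << 2)             # pslld 2 then paddd: index * 4 bytes + base
end

base = 0x1000
puts element_address(base, 2.6, 7).to_s(16)   # index rounds to 3
puts element_address(base, -5.0, 7).to_s(16)  # clamped to element 0
puts element_address(base, 99.0, 7).to_s(16)  # clamped to element 7
```

Each of the four lanes gets its own address this way, which is why the original version reads temp[0] to temp[3] one at a time.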


Seems like we got overexcited about applying the shufps trick here. All we actually needed is this...

Code:
polyintin addr;
polyintin max;
streamin index;
streamout out;

int zero = 0;
int temp = 0;
stage2;
mov eax,addr[0];
cmp eax,0;
jz bypass;

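  // Convert the float index to an integer and clamp it to 0..max
  cvtps2dq xmm0,index;
  maxps xmm0,zero;
  minps xmm0,max;
  // Multiply by 4 (bytes per float) and add the base address
  pslld xmm0,2;
  paddd xmm0,addr;
  movaps temp,xmm0;
 
  // Read 4 consecutive floats (128 bits) starting at the address in temp[0]
  // movaps requires that address to be a multiple of 16
  mov eax,temp[0];
  movaps xmm0,[eax];
  movaps out,xmm0;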
     
bypass:


Thanks to KG for writing the Opcode reference. I was just reading about "movaps xmm0,[eax];" and realized it was perfect for this case, because it reads 128 bits starting from the address in temp[0], essentially reading 4 floats at once :)

Of course it assumes the data is aligned; if you pass a random index it won't work correctly. But for my case of saving phase per voice it is perfect.
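As a rough illustration of what a single 128-bit read does, here is a Ruby sketch that treats a packed string as memory. The helper `movaps_read` and the `MEMORY` buffer are hypothetical stand-ins; the sketch assumes the buffer itself starts on a 16-byte boundary, so the byte offset plays the role of the address.

```ruby
# Simulated memory: 8 floats stored as consecutive little-endian 32-bit singles.
MEMORY = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75].pack('e*')

# Hypothetical stand-in for "movaps xmm0,[eax]", using a byte offset as the address.
def movaps_read(memory, byte_offset)
  # The real instruction faults on a misaligned address; here we raise instead.
  raise ArgumentError, 'address not 16-byte aligned' unless (byte_offset % 16).zero?

  memory.byteslice(byte_offset, 16).unpack('e4') # 16 bytes = 4 floats in one go
end

p movaps_read(MEMORY, 16) # floats 4..7 of the buffer
begin
  movaps_read(MEMORY, 4)  # element 1 is 4-byte aligned only
rescue ArgumentError => e
  puts e.message
end
```

In this model only every fourth element is a valid starting point, which matches the "won't work with a random index" caveat.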

Re: Fast Stream Array Access

Postby KG_is_back » Tue Nov 18, 2014 11:26 pm

Exo wrote:Thanks to KG for writing the Opcode reference, I was just reading about "movaps xmm0,[eax];" and realized that was perfect for this case. Because it just reads in 128bits starting from the address in temp[0] essentially reading in 4 floats at once Of course it assumes data is aligned, if you are passing random index it won't work correctly. But for my case of saving phase per voice it is perfect.


Is it working? I'm not sure if mems are actually aligned to work like that.

EDIT:
Just did a test and it crashes when the mem address is not divisible by 16. Fortunately there is a fix: simply round the address up to the nearest multiple of 16. Also give the mem an extra 16 bytes so you do not run out of space at its end.
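The rounding step can be sketched in Ruby as below. This is one common way to round up to a power-of-two boundary, not necessarily what the attached schematic does; `align_up` and `padded_size` are made-up names, and the example addresses are arbitrary.

```ruby
ALIGNMENT = 16 # bytes required by movaps

# Round addr up to the next multiple of alignment (alignment must be a power of two).
# An address that is already aligned is returned unchanged.
def align_up(addr, alignment = ALIGNMENT)
  (addr + alignment - 1) & ~(alignment - 1)
end

# Size to request for the mem so the aligned region still holds byte_count bytes.
def padded_size(byte_count, alignment = ALIGNMENT)
  byte_count + alignment
end

puts align_up(0x0A3F5C48).to_s(16) # moved up by 8 bytes
puts align_up(0x0A3F5C40).to_s(16) # already aligned, unchanged
puts padded_size(64 * 4)           # 64 floats plus 16 bytes of slack
```

Since rounding up moves the start by at most 15 bytes, the 16 extra bytes are always enough.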
Attachments
mem alighn.fsm
(3.16 KiB) Downloaded 1026 times
Last edited by KG_is_back on Tue Nov 18, 2014 11:39 pm, edited 1 time in total.
KG_is_back
 
Posts: 1196
Joined: Tue Oct 22, 2013 5:43 pm
Location: Slovakia

Re: Fast Stream Array Access

Postby Exo » Tue Nov 18, 2014 11:38 pm

KG_is_back wrote:
Exo wrote:Thanks to KG for writing the Opcode reference, I was just reading about "movaps xmm0,[eax];" and realized that was perfect for this case. Because it just reads in 128bits starting from the address in temp[0] essentially reading in 4 floats at once Of course it assumes data is aligned, if you are passing random index it won't work correctly. But for my case of saving phase per voice it is perfect.


Is it working? I'm not sure if mems are actually aligned to work like that.


Yep try this...
Free-running-poly-osc-V1.3ReadTest.fsm
(424.73 KiB) Downloaded 1095 times


It makes sense really: it is reading in 128 bits starting from the address in temp[0]. Arrays are always aligned, with each element one after the other, so 128 bits on from the start address covers the first 4 floats of the array.

Re: Fast Stream Array Access

Postby KG_is_back » Tue Nov 18, 2014 11:43 pm

Exo wrote:
KG_is_back wrote:
Exo wrote:Thanks to KG for writing the Opcode reference, I was just reading about "movaps xmm0,[eax];" and realized that was perfect for this case. Because it just reads in 128bits starting from the address in temp[0] essentially reading in 4 floats at once Of course it assumes data is aligned, if you are passing random index it won't work correctly. But for my case of saving phase per voice it is perfect.


Is it working? I'm not sure if mems are actually aligned to work like that.


Yep try this...
Free-running-poly-osc-V1.3ReadTest.fsm


It makes sense really, it is reading in 128 bits starting from the address in temp[0]. Arrays are always aligned, each element is one after the other. So 128bits on from the start address is covering the first 4 floats of the array.


It is based on luck. If you retrigger the mem create prim, you occasionally get an array that is not 16-byte aligned. My above-mentioned fix prevents that.

Re: Fast Stream Array Access

Postby Exo » Tue Nov 18, 2014 11:52 pm

Ah ok thanks for that KG :)

I think this could enable some more optimizations in other areas too :)

By the way, I have been researching some more opcodes before I contact Malc. What do you think of SSE3? There seem to be some really nice opcodes for complex math. Math isn't my strongest point, but I think they might enable a nice fast FFT, so I am going to ask for them as well if you think they are worthwhile.

Re: Fast Stream Array Access

Postby KG_is_back » Wed Nov 19, 2014 12:05 am

Exo wrote:By the way I have been researching some more opcodes before I contact Malc, and what do you think to SSE3? There seems to be some really nice opcodes for complex math . Math isn't my strongest point but I think they might enable a nice fast FFT, So I am going to ask for them also if you think they are worthwhile?


They seem really cool, but I'm afraid of compatibility issues. Basically all of them can be implemented by putting shufps or mulps before addps/subps. They sort of save CPU by decreasing the number of instructions needed to do complex math, but they are definitely not high on my list, since they wouldn't add new features.
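To make the "shufps or mulps before addps/subps" point concrete, here is a Ruby model of two SSE3 opcodes (addsubps and haddps) rebuilt from SSE1-style steps. The 4-element arrays stand in for xmm registers and all function names are illustrative; this is a sketch of the lane arithmetic only, not FlowStone code.

```ruby
# Model of shufps: low two lanes are picked from a, high two lanes from b.
def shufps(a, b, imm)
  [a[imm & 3], a[(imm >> 2) & 3], b[(imm >> 4) & 3], b[(imm >> 6) & 3]]
end

def addps(a, b)
  a.zip(b).map { |x, y| x + y }
end

def mulps(a, b)
  a.zip(b).map { |x, y| x * y }
end

SIGN_MASK = [-1.0, 1.0, -1.0, 1.0].freeze

# addsubps: subtract in lanes 0 and 2, add in lanes 1 and 3.
def addsubps_emulated(a, b)
  addps(a, mulps(b, SIGN_MASK))
end

# haddps: [a0+a1, a2+a3, b0+b1, b2+b3], built from two shuffles and one add.
def haddps_emulated(a, b)
  evens = shufps(a, b, 0x88) # [a0, a2, b0, b2]
  odds  = shufps(a, b, 0xDD) # [a1, a3, b1, b3]
  addps(evens, odds)
end

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
p addsubps_emulated(a, b)
p haddps_emulated(a, b)
```

Each emulation costs one or two extra instructions compared with the single SSE3 opcode, which is the CPU saving being discussed.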

Re: Fast Stream Array Access

Postby Exo » Wed Nov 19, 2014 12:10 am

KG_is_back wrote:
Exo wrote:By the way I have been researching some more opcodes before I contact Malc, and what do you think to SSE3? There seems to be some really nice opcodes for complex math . Math isn't my strongest point but I think they might enable a nice fast FFT, So I am going to ask for them also if you think they are worthwhile?


They seem really cool, but I'm afraid of compatibility issues. Basically all of them can be implemented by putting shufps or mulps before addps/subps. They sort of save CPU by decreasing the number of instructions needed to do complex math, but they are definitely not high on my list, since they wouldn't add new features.


Ok thanks, I wasn't sure if there was a real benefit to having them apart from a small possible CPU saving.

Re: Fast Stream Array Access

Postby Tronic » Wed Nov 19, 2014 7:03 am

KG_is_back wrote:Is it working? I'm not sure if mems are actually aligned to work like that.
EDIT:
Just did a test and it crashes when mem address is not divisible by 16. Fortunately there is a Fix - simply round up the pointer of the address to nearest multiple 16. Also make the mem to have +16 bytes to not run out of space at the end of the mem.


I use this in Ruby to convert the MemInt address. I feed the result directly into the stream input and use it as is, with no need to reconvert it in code.
Code:
[(@addr+16)-(@addr%16)].pack('L').unpack('F')[0]
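A small Ruby sketch of what that one-liner does, assuming a 32-bit address: it moves the address to the next 16-byte boundary and then reinterprets the integer's bits as a single-precision float, so the bit pattern survives the trip through a stream input. The function names and the example addresses are made up for illustration.

```ruby
# Same arithmetic as the one-liner above, wrapped in a hypothetical helper.
# Note it always moves forward: an already aligned address advances by 16.
def aligned_address_as_float(addr)
  aligned = (addr + 16) - (addr % 16)
  [aligned].pack('L').unpack('F')[0] # reuse the 32 integer bits as a float
end

# Inverse view: recover the integer bit pattern carried by the float.
def float_bits(value)
  [value].pack('F').unpack('L')[0]
end

f = aligned_address_as_float(0x0A3F5C48)
puts f                     # a tiny float; its numeric value is meaningless
puts float_bits(f).to_s(16) # the aligned address is still in the bits
```

Because an aligned address is also pushed forward by a full 16 bytes, this variant relies on the extra 16 bytes of mem mentioned in the quoted fix.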
Tronic
 
Posts: 539
Joined: Wed Dec 21, 2011 12:59 pm
