spatial & more

by **tester** » Thu Oct 03, 2013 12:56 am

The topic of spatialization beyond headphones keeps coming back to me, and today I found these:

http://interface.cipic.ucdavis.edu/sound/hrtf.html
http://marl.smusic.nyu.edu/projects/HRIRrepository
http://www.kfs.oeaw.ac.at/index.php?opt ... 06&lang=en
http://www.audiogroup.web.fh-koeln.de/
http://en.wikipedia.org/wiki/Head-relat ... r_function

http://asadl.org/jasa/resource/1/jasman ... ypassSSO=1
(free pdf available)

It would be really cool to create an app that can process sounds that way. Does anyone have experience with it?

*

The problem with many binaural approaches is that they don't cover ordinary, real-life situations. For example: if you lie down on a bed, what is behind you and what is below? The context matters - reflections. And how often do you listen to music in a horizontal position? So it would be cool to create a tool that loads these strange HRTF things and combines them with other reverb-like things to make something useful, wouldn't it? Right now the only working solution for that sort of spatialization seems to be the longcat h3d plugin, but they focused on something else ("market first").

by **trogluddite** » Thu Oct 03, 2013 8:55 am

No experience of HRTF specifically, but a brief look at the links suggests that the main problem would be decoding the HRTF data files...

Seems that they are collections of impulse responses - the same kind of data as used by 'convolution' type reverb plugins and speaker/cabinet simulators. So each impulse response is basically wave data, but with lots of them encoded into an array, so that the correct one can be chosen for each combination of angle and distance.

Each individual impulse response would be quite short, I would imagine - the distance around the head is equivalent to only a millisecond or two of delay - much shorter than most "room" reverb responses would be. So, a basic "brute force" convolution engine, similar to the experiments with FIR filters, could provide the DSP needed. Getting the impulse data into the convolution engine is the trickiest bit (passing arrays into code/ASM) - but oddly enough, I discovered something yesterday that might allow for this (more later - I'm still testing for stability).

Ruby can be used to read files in their raw binary representation - so, understanding the exact layout of the file format looks like the biggest hurdle.
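As a rough illustration of the "brute force" convolution idea, here is a minimal Python sketch. It assumes the impulse responses have already been decoded into plain lists of samples, one per ear; the function names and the toy responses are made up for the example and are not taken from any of the linked data sets. At 44.1 kHz a millisecond is about 44 samples, so the inner loop stays short for responses of that size.

```python
def convolve(signal, impulse):
    """Direct-form (brute-force) FIR convolution.

    The output has len(signal) + len(impulse) - 1 samples.
    """
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse):
            out[n + k] += x * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses, invented for illustration (NOT measured HRIRs):
# the right ear receives the sound one sample later and quieter.
hrir_left = [1.0, 0.5]
hrir_right = [0.0, 0.6, 0.3]

left, right = binauralize([1.0, 0.0, -1.0], hrir_left, hrir_right)
```

The two outputs have different lengths here because the two toy responses do; with a real data set both ears would normally use responses of equal length.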

by **tester** » Thu Oct 03, 2013 11:41 am

Some time ago I did simple band measurements (I did not know what else I could measure) on the most impressive examples of this sort of spatialization, which worked (vertically and to some degree horizontally) even in monaural (yep, only 1 channel) mode. That means the asymmetry of the head components is in play, and it can work without binaural delays. I will post these things with my explanations later; maybe someone can help decompose that stuff into basics. Maybe there is a way of doing this without an HRTF model? I suspect it could work in a similar way to that wonderful formant filter (of martin) - just a few correlated filters and reverbs/delays, driven by simple arrays of coeffs.

BTW, still exploring other materials, so here it is:
http://www.bcl.hamilton.ie/~barak/paper ... CA2004.pdf
http://kerfoffle.au.com/no-hrtf/
http://www.bcs.rochester.edu/courses/cr ... ation1.pdf
http://www.umiacs.umd.edu/labs/cvl/pirl ... tches.html
http://en.wikipedia.org/wiki/Sound_localization
http://link.springer.com/article/10.3758%2FBF03212242
http://en.wikipedia.org/wiki/Critical_band
(...but I think less than 24 critical bands might be important in the game)

http://www.cim.mcgill.ca/~clark/nordmod ... ation.html
http://www.cim.mcgill.ca/~clark/nordmod ... k_toc.html
interesting book

by **tester** » Tue Oct 08, 2013 11:49 pm

I'm here and I'd like to get beyond there. In order to get the effect, stereo headphones are required (and stereo settings with no DSP), plus it's better to download the sound, because online streaming has a limited bitrate (and thus sounds metallic). The spatial version was processed, not re-recorded via in-ear binaural mics. I don't know how many people can hear the vertical location changes, i.e. how far below shoulder level, because these sorts of sounds are tricky. In general, close-to-body sounds can be perceived down to, let's say, around hip level. One thing I encountered with classic binaural recording and processing is that people either perceive vertical placement and wonder how others can't, or they perceive the sound moving away behind them (but not down) and can't believe that others hear vertical placement.

From the past. If you remember the famous "matchbox shaking" recording (many people make similar recordings too, but this is the only one that seems to work correctly in confusing cases), I did some "split and hear" experiments on that one. From my notepad: ;-)

I created two files: a sharp band-pass filter (8000-9500Hz only) and a band-reject filter (everything except 8000-9500Hz). It looks like when that small 1.5kHz-wide band is removed, the sound is comparable to typical binaural recordings (vertical movement does not go lower than the neck/shoulder line). The resonance and/or filtering parts in our head responsible for vertical hearing would then be related to this general frequency range. The files combined via multitrack mixing give back the whole effect. I also did some manipulations. For example, the amplitude modification can be -12dB or +12dB, and the verticality was (for me) still there. A delay of less than 3ms did not make much difference in verticality. Some cheap flanger did not make a change. These kinds of experiments are thought-provoking to me.
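The split-and-remix step can be sketched in Python as a complementary FIR pair. This is only an illustration under stated assumptions, not the tool that was actually used: the windowed-sinc design, the 44.1 kHz sample rate, the tap count and all function names are choices made for the example. The band-reject filter is built as a unit impulse minus the band-pass filter, so the two filtered files add back up to the original signal (delayed by `half_len` samples).

```python
import math


def lowpass_taps(cutoff_hz, sample_rate, half_len):
    """Hamming-windowed sinc low-pass FIR with 2 * half_len + 1 taps."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    taps = []
    for n in range(2 * half_len + 1):
        m = n - half_len  # offset from the centre tap
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(math.pi * n / half_len)
        taps.append(ideal * window)
    return taps


def band_split_taps(low_hz, high_hz, sample_rate, half_len=128):
    """Return (band_pass, band_reject) taps that sum to a delayed unit impulse."""
    lo = lowpass_taps(low_hz, sample_rate, half_len)
    hi = lowpass_taps(high_hz, sample_rate, half_len)
    band_pass = [h - l for h, l in zip(hi, lo)]
    band_reject = [-b for b in band_pass]
    band_reject[half_len] += 1.0  # unit impulse minus the band-pass
    return band_pass, band_reject


def fir(signal, taps):
    """Brute-force FIR filtering (full convolution)."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(taps):
            out[n + k] += x * h
    return out


def gain_at(taps, freq_hz, sample_rate):
    """Magnitude response of the taps at a single frequency."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


SAMPLE_RATE = 44100  # assumed; the original files' rate is not stated
HALF_LEN = 128
band_pass, band_reject = band_split_taps(8000, 9500, SAMPLE_RATE, HALF_LEN)

# Arbitrary test signal standing in for the recording.
signal = [math.sin(0.9 * n) + 0.5 * math.sin(0.1 * n) for n in range(64)]
in_band = fir(signal, band_pass)        # "8000-9500Hz only" file
out_of_band = fir(signal, band_reject)  # "everything else" file
remixed = [a + b for a, b in zip(in_band, out_of_band)]
```

With this construction `remixed[HALF_LEN:HALF_LEN + len(signal)]` matches `signal` up to rounding error, which mirrors the observation that mixing the two files restores the whole effect.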

I agree that other measurements may differ from mine. It's about the principle. I was curious whether the required frequency band is "everything above a threshold" or a "specific band". I was also curious whether the perception threshold is sharp or soft (in the "soft" case it would be difficult to find).

I'm not claiming that the information about localization is only in this 1.5kHz band, because it is also a matter of interrelations between the lower and higher bands, as research shows. However, this window points to something else, maybe located inside the head, something that adds its own resonant filtering effect (cochlea), "invisible sound" so to speak.

Puff... Plus - in classic binaural processing there is a problem with front/rear identification within recorded files.

by **tester** » Thu Oct 10, 2013 1:55 am

...which reminds me of a few more thoughts and experiments from the past. When I played with my own binaural in-ear mics and recordings, the whole thing with frequency band removal was... different from the above. The band responsible for the vertical plane was spread across the whole spectrum, not just a 1.5kHz spike. Removing portions of the spectrum just removed portions of the vertical plane, so in order to remove the vertical sensations totally I would have to remove almost the whole sound. Which means that the principle was different.

So now I have a small question. Is it possible to "map" this matchbox recording somehow? I mean: is some sort of source file needed, or is it just a matter of getting filter and reverb responses at different mark points? And if so, how can that be achieved?

by **KG_is_back** » Wed Oct 23, 2013 5:41 pm

First thing that comes to my mind is to record (or download) the HRIR and create some sort of FIR filter plugin, that can switch (and possibly interpolate) between the impulses based on input angles.
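A minimal Python sketch of the "switch and interpolate" part, assuming the impulse responses are already loaded as equal-length lists indexed by a sorted list of measurement angles. The function name and the toy data are hypothetical. It uses a plain linear crossfade between the two nearest responses; when neighbouring responses have noticeably different delays, such a crossfade can smear or comb-filter the result, so treat it as a starting point only.

```python
import bisect


def interpolate_hrir(angles_deg, hrirs, angle_deg):
    """Linearly crossfade between the two responses that bracket angle_deg.

    angles_deg must be sorted ascending, hrirs[i] belongs to angles_deg[i],
    and all responses must have the same length. Angles outside the measured
    range are clamped to the nearest end.
    """
    if angle_deg <= angles_deg[0]:
        return list(hrirs[0])
    if angle_deg >= angles_deg[-1]:
        return list(hrirs[-1])
    hi = bisect.bisect_right(angles_deg, angle_deg)
    lo = hi - 1
    t = (angle_deg - angles_deg[lo]) / (angles_deg[hi] - angles_deg[lo])
    return [(1 - t) * a + t * b for a, b in zip(hrirs[lo], hrirs[hi])]


# Toy data, invented for illustration (NOT measured responses).
angles = [0, 30, 60]
responses = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

halfway = interpolate_hrir(angles, responses, 15)
exact = interpolate_hrir(angles, responses, 30)
```

The interpolated response would then be fed to the FIR filter in place of a single measured one.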

Another way might be easier to implement. Use white noise (or even better sine sweeps):

First record the test signal at eye level, then record a few more, lower and lower and higher and higher, at the same distance and horizontal angle (so only the vertical angle changes). Then compare the spectra and approximate the effect of vertical sound source movement in front of the listener with simple IIR filters. Or you can deconvolve the files to get an impulse response for that given "angle shift".
Then do the same for horizontal movement, and for vertical movement at the side and behind. In the end, create a plugin that calculates from the given angle how much of each effect takes place.
Then a time delay between the ears is calculated from the angle and applied (this is not needed with the IRs, because the delay is imprinted in them).
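For the last step, one common approximation of the delay between the ears is Woodworth's spherical-head formula, ITD = (r / c) * (theta + sin(theta)), with theta the horizontal angle in radians. The Python sketch below assumes that model; the head radius, speed of sound, sign convention (positive angle = source to the right) and function names are assumptions for the example, and the delay is rounded to whole samples for simplicity.

```python
import math


def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Woodworth spherical-head ITD for an azimuth between 0 and 90 degrees."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))


def apply_itd(mono, azimuth_deg, sample_rate):
    """Return (left, right) with the far ear delayed by the rounded ITD.

    Assumed convention: positive azimuth means the source is to the right,
    so the left ear is the far (delayed) one.
    """
    delay = round(itd_seconds(abs(azimuth_deg)) * sample_rate)
    delayed = [0.0] * delay + list(mono)
    direct = list(mono) + [0.0] * delay
    if azimuth_deg > 0:
        return delayed, direct
    return direct, delayed


left, right = apply_itd([1.0], 90, 44100)
```

At 90 degrees this gives roughly 0.66 ms, in line with the "millisecond or two" scale mentioned earlier in the thread.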

by **tester** » Wed Oct 23, 2013 7:55 pm

I wonder if it's possible to map existing audio files in relation to spatial correlates instead of making a physical design. Sure, these points would be perceived subjectively (maybe that's better?), but they would cover a desired range of reference points. I would like to map (split with markers) existing files if possible, because they just sound good. The matchbox in the above examples (the one with the 1.5kHz band I isolated as "vital") is somewhat like noise. Can it be used to "copy" the filter properties? How many angle points (horizontal, vertical) are enough to use them with some sort of cross-over?

by **KG_is_back** » Wed Oct 23, 2013 9:00 pm

If you have the original file that was used as the test tone and the edited file, you can cut short regions from them and deconvolve them to obtain the impulse response at that place. However, the impulse will be very approximate if the sound origin is moving (or the angles are automated, if you're reverse engineering a plugin).
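A minimal sketch of such a deconvolution in Python, using regularized division in the frequency domain. It assumes the dry and processed regions are time-aligned and that the processed region is at least as long as the dry one plus the response. The naive DFT is only practical for short regions, and the names and the `eps` default are choices made for this example. Where the dry signal has little energy, the division is unreliable and `eps` only keeps it from blowing up.

```python
import cmath


def dft(x, n):
    """Naive O(n^2) DFT of x, zero-padded to n samples."""
    x = list(x) + [0.0] * (n - len(x))
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spectrum):
    """Naive inverse DFT, returning only the real part."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def deconvolve(dry, wet, ir_len, eps=1e-9):
    """Estimate the impulse response h such that wet is roughly dry * h."""
    n = len(wet)
    dry_spec = dft(dry, n)
    wet_spec = dft(wet, n)
    # W / D, written as W * conj(D) / (|D|^2 + eps) to avoid dividing by ~0.
    h_spec = [w * d.conjugate() / (abs(d) ** 2 + eps)
              for w, d in zip(wet_spec, dry_spec)]
    return idft_real(h_spec)[:ir_len]


def convolve(signal, impulse):
    """Brute-force convolution, used here only to build the test data."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(impulse):
            out[i + k] += x * h
    return out


# Toy data, invented for illustration.
dry = [1.0, 0.5, -0.25, 0.1]
true_ir = [0.8, -0.3, 0.1]
wet = convolve(dry, true_ir)

estimate = deconvolve(dry, wet, ir_len=3)
```

With real recordings a much larger `eps` (relative to the signal energy) is usually needed, and the estimate only describes the frequencies that the dry signal actually contains.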

by **tester** » Wed Oct 23, 2013 9:19 pm

Well - the problem is (or is not?). Original file was recorded live using some sort of dummy head with unknown interia (that's why it was not necessarily so called "binaural recording"). So there is only that 3D file with different spatial points in time. My question would be - is it possible to use that one, and measure cross-differences between samples marked manually?

by **KG_is_back** » Wed Oct 23, 2013 9:48 pm

If you have a test tone that you know is the same every time you play it back, then the comparison is simple.
With that matchbox recording, the problem is that the shaking is not uniform - you cannot rely on the sound source making exactly the same sound every time. You can also clearly hear that many frequencies are not present in the test signal (the shaking of the matchbox), so information about the low-frequency spectrum can only be assumed.

But again - if a uniform test tone was used, you can simply cross-reference different parts of the file, as I've mentioned.
