Speech analysis and re-synthesis

by **martinvicanek** » Tue Sep 10, 2019 7:23 pm

Here is some fun. In this schematic I use La Voz Cantante as an engine for speech synthesis. A recorded piece of spoken (or sung) text is reproduced with full and independent control over pitch, pitch variation, formant shift and speed/duration. My use of it is mainly to analyze the synthesized material in super slow motion in order to test and improve my own algos, but you can also use it just for fun!

by **wlangfor@uoguelph.ca** » Tue Sep 10, 2019 7:56 pm

Sounds like a pretty genius idea.

I'll try and figure it out.

by **kortezzzz** » Wed Sep 11, 2019 5:38 am

That's really cool, Martin. Great work

Something interesting caught my eye: the timing. That's something I've been looking for. Would it be possible to sync the timing of any given sample to a given BPM (in other words, how can we translate the timing values in your schematic to BPM values)? It would open up many possibilities for producing "time-synced sample" based toys.
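For reference, the arithmetic behind such a sync is simple, whatever the schematic's own timing units turn out to be. The sketch below is only an illustration: it assumes a speed control expressed as a plain factor (1.0 = original speed, 2.0 = twice as fast), which the thread does not confirm, and the function names are made up for this example.

```python
def seconds_per_beat(bpm):
    """Duration of one beat in seconds at the given tempo."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return 60.0 / bpm


def speed_factor_for_bpm(sample_seconds, bpm, beats):
    """Speed factor that makes a sample last exactly `beats` beats.

    Assumption: 1.0 means original speed and larger values play faster.
    The real speed/duration control in the schematic may use another scale.
    """
    if sample_seconds <= 0 or beats <= 0:
        raise ValueError("sample_seconds and beats must be positive")
    target_seconds = beats * seconds_per_beat(bpm)
    return sample_seconds / target_seconds


if __name__ == "__main__":
    # A 3-second phrase squeezed into one 4-beat bar at 120 BPM (2 seconds).
    print(speed_factor_for_bpm(3.0, 120, 4))
```

If the control is a duration multiplier rather than a speed factor, the value needed is simply the reciprocal.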

Thanks!

by **Spogg** » Wed Sep 11, 2019 8:33 am

Unbelievable!

Great fun to play with but the really impressive thing is the quality of the result. It sounds incredibly real to me over a huge range of parameter values.

You never cease to surprise and amaze me Martin. :ugeek:

Cheers

Spogg

by **trogluddite** » Wed Sep 11, 2019 4:49 pm

martinvicanek wrote: My use of it is mainly to analyze the synthesized material in super slow motion in order to test and improve my own algos, but you can also use it just for fun!

After playing around for a while, I wonder whether there might be uses in speech therapy and language learning for a system like this. I noted in particular that the pitch variation control does a very good job of enhancing, removing, or inverting prosodic cues - for example, the phrase in the included sample seems to change from being a statement to being a question when the pitch variation is inverted.
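To make the "enhance, remove, or invert" idea concrete, here is a minimal sketch of one way such a control could act on a pitch contour. It is not taken from Martin's schematic: it assumes the contour is a list of fundamental frequencies in Hz (0 for unvoiced frames) and scales each frame's deviation from the mean pitch on a logarithmic scale. The function name and the `amount` parameter are hypothetical.

```python
import math


def scale_pitch_variation(f0_hz, amount):
    """Scale the deviation of a pitch contour around its mean pitch.

    Works on a log2 (octave) scale, so the mean is the geometric mean.
    amount = 1 leaves the contour unchanged, 0 flattens it to a monotone,
    -1 mirrors it (rises become falls), values above 1 exaggerate it.
    Frames with f0 <= 0 are treated as unvoiced and passed through.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)
    mean_log = sum(math.log2(f) for f in voiced) / len(voiced)
    result = []
    for f in f0_hz:
        if f <= 0:
            result.append(f)  # unvoiced frame, nothing to scale
        else:
            deviation = math.log2(f) - mean_log
            result.append(2.0 ** (mean_log + amount * deviation))
    return result


if __name__ == "__main__":
    rising = [100.0, 0.0, 200.0]  # ends high, like a question
    print(scale_pitch_variation(rising, -1.0))  # mirrored: now ends low
```

With `amount = -1` a contour that ended an octave above where it started now ends an octave below, which matches the statement/question flip described above.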

Such manipulation and/or analysis of prosodic cues, maybe combined with visual feedback, might be a useful tool to supplement sessions with a speech therapist for improving the perception or production of fluent prosody - often found difficult by autistic people, folks with various hearing impairments, aphasias, etc. Likewise, I imagine it could have uses as an aid for learning pronunciation of tonal languages (e.g. Mandarin, Cantonese, Vietnamese or Thai) for learners whose first language is non-tonal.

As a little experiment, I tried it on some recordings of my speech. My prosody is often noted as being very flat by other people (including formally at my Asperger's Syndrome diagnosis), though it doesn't sound that way to me "inside my head" when I'm speaking. Of course, it's hardly a scientific, blinded experiment; but it was interesting to find that exaggerating the pitch variation does indeed seem to make my voice more "typical" of what I hear in other people's voices - yet the quality of the processing is so good that it remains recognisable as my voice rather than a different speaker.

I had a great time using it "just for fun" too, of course. But, as ever, I think you are too modest; tools like these, in the right hands, may have the potential to be much more than just "toys" or DSP coding aids!

by **Spogg** » Wed Sep 11, 2019 5:03 pm

Brilliant idea trog!

I’ve been wondering about using several of these to create harmonic singing multi-tracking stuff from a single voice.
But I haven’t tested that out yet.

I think a lot could be done with this wonderful tool.

Cheers

Spogg

by **martinvicanek** » Wed Sep 11, 2019 6:44 pm

This demo I did uses a single voice (the ugly one that you hear at the beginning) as input.

https://vicanek.de/audioprocessing/imag ... s_demo.mp3

by **BobF** » Wed Sep 11, 2019 8:09 pm

Hi Martin,

Really like this a super lot! Great for speech therapy, special effects, voice-over dubbing (multitracking), and so on.
Does anyone besides me notice some kind of clicking noise even when noise is set to zero, or is it just my computer, DAW, me, etc.?

Later then, BobF.....

by **martinvicanek** » Thu Sep 12, 2019 9:28 pm

Thanks guys. I'll show it to my wife, who is a foreign-language teacher.

by **Spogg** » Fri Sep 13, 2019 8:15 am

martinvicanek wrote: This demo I did uses a single voice (the ugly one that you hear at the beginning) as input.
https://vicanek.de/audioprocessing/imag ... s_demo.mp3

Wonderful!

This is surely a commercial product...

Cheers

Spogg
