Emotional Speech Synthesis (Beta)

This subsection describes the emotional speech synthesis engine.

Be aware that our emotional speech synthesis is currently in Beta.

Why Do I Need It?

It's more comfortable to talk with somebody who shows empathy, and emotional speech synthesis lets your AI express itself that way.

Here are a couple of examples to illustrate why emotional speech synthesis matters. Let's say you come home after a long day and talk to a Dasha-based home assistant you have written yourself:

John Doe: Hey, Dasha. It was a hard day. Would you mind dimming the light?
AI (with a sorrowful emotion): I'm sorry to hear that. I'll dim the light. Do you need anything else?

Or, for example, you want an answering machine that swears at anyone who calls while your status is set to "busy":

Caller: Hi, I'd like you to buy some of our stuff!
AI (with an angry emotion): I'm busy! Call me later!

How Do I Use It?

There are three ways to try our Emotional Speech Synthesis:

  • With the Dasha CLI.
  • With the sayText DSL command.
  • By defining a phrase emotion in phrasemap.json.

To use any of them, you need to have our SDK installed. Read more about the installation process in the Getting Started section.
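
For instance, if you install via npm, the commands typically look like the sketch below. The package names @dasha.ai/sdk and @dasha.ai/cli are assumptions based on our other docs; check the Getting Started section for the authoritative commands for your platform.

# Install the SDK into your project (assumed package name).
npm install @dasha.ai/sdk
# Install the CLI globally so the dasha command is available (assumed package name).
npm install -g @dasha.ai/cli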

The most intriguing part is that we don't limit you to a set of predefined emotions. We extract the emotion from the text you give us, so you are free to find the exact mix of emotions that works best for you!
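
For example, because the emotion is extracted from free-form text, you can describe a blend of feelings in the prompt itself. The wording below is purely illustrative:

// Illustrative only: blend feelings in the emotion prompt.
emotion: "from text: I'm happy for you, but also a little worried."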

Emotional Speech Synthesis With CLI

The simplest way to convert your text to emotional speech is through the Dasha CLI. All you need to do is run:

dasha tts synthesize \
  --provider-name dasha-emotional \
  --speaker "Kate" \
  --emotion "from text: I am so sorry." \
  "I'm sorry to hear that. I'll dim the light. Do you need anything else?" \
  -o sorry.mp3

That's all. You will find the synthesized speech in the sorry.mp3 file. The emotion will be extracted from the text following the from text: prefix.
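
As another example, here is the angry answering-machine line from the beginning of this section, synthesized with the same flags. The emotion prompt is illustrative; any text after from text: works:

# The emotion prompt below is illustrative free text.
dasha tts synthesize \
  --provider-name dasha-emotional \
  --speaker "Kate" \
  --emotion "from text: I am so angry!" \
  "I'm busy! Call me later!" \
  -o angry.mp3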

You can find the parameters' descriptions and their possible values in the Parameters subsection.

Emotional Speech Synthesis With DSL

For the sayText command to work with the emotional TTS engine, you will need to set TtsProviderName to dasha-emotional.

With the sayText command, you can choose the right emotion at the very moment a node executes. The most common use case looks like this:

#sayText("Is it me you looking for?", options: { emotion: "from text: I'm so glad and happy!", speed: 1.5});

Read more about the sayText command.
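
For context, here is a minimal sketch of how that call might sit inside a DashaScript node. The node name and the surrounding structure are illustrative scaffolding, not requirements of the emotional TTS engine:

node dimLight
{
    do
    {
        // Requires TtsProviderName set to dasha-emotional.
        // The emotion is extracted from the text after the "from text:" prefix.
        #sayText("I'm sorry to hear that. I'll dim the light. Do you need anything else?",
            options: { emotion: "from text: I'm so sorry.", speed: 1.0 });
        wait *;
    }
}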

Emotional Speech Synthesis In Phrasemap

As in the previous subsection, to work with the emotional TTS engine, you will need to set TtsProviderName to dasha-emotional.

Then you can simply set the emotion field of voiceInfo for all of your phrases in phrasemap.json.

For example:

{ "default": { "voiceInfo": { "lang": "en-US", "speaker": "Kate", "speed": 1.0, "emotion": "from text: You are welcome!" }, "phrases": { ... } } }

You can also override voiceInfo for a single phrase:

{ "default": { "voiceInfo": { "lang": "en-US", "speaker": "Kate", "speed": 1.0, "emotion": "from text: You are welcome!" }, "phrases": { "dont_understand_forward": [ { "text": "I'm sorry, let me transfer you to another agent. Please stand by!", "voiceInfo": { "speed": 0.9, "emotion": "from text: I'm so sorry." } } ], ... } } }

Parameters

Currently, our emotional TTS engine supports the following parameters:

Name        Description
text        Text to convert to speech. The text's length must not exceed 768 characters.
speaker?    Defines the voice of the synthesized speech. Valid values are Kate and Linda. The default value is Kate.
speed?      Defines the speed of the synthesized speech. Valid values are in the range [0.25, 4.0]. The default value is 1.0.
emotion?    Defines a text for emotion extraction. It must start with the from text: prefix. The default value is the text value of the request.
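
To tie these together, here is a sketch of the earlier CLI call with the optional parameters varied: Linda's voice and slightly slower speech. The --speaker and --emotion flags appear above; the --speed flag is an assumption, mirroring the speed parameter by analogy:

# Assumption: --speed mirrors the speed parameter; verify with the CLI help.
dasha tts synthesize \
  --provider-name dasha-emotional \
  --speaker "Linda" \
  --speed 0.9 \
  --emotion "from text: I'm so sorry." \
  "I'm sorry, let me transfer you to another agent. Please stand by!" \
  -o transfer.mp3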