Natural language generation

Natural language generation (NLG) is an essential part of intelligent dialog systems.

Phrasemap

The phrasemap file is a JSON file with the following structure:

    interface IPhrasemapSection {
      voiceInfo: {
        speaker: string
        lang: string
        emotion?: string
        speed?: number
        variation?: number
      }
      macros: {
        [phraseId: string]: {
          [argName: string]: string
        }
      }
      phrases: {
        [phraseId: string]: PhrasemapEntry
      }
    }

    interface IPhrasemap {
      $version: "2.0"
      default: IPhrasemapSection
      [sectionId: string]: IPhrasemapSection
    }

At the root of the file is a dictionary of phrasemap sections. Each section defines how phrases are pronounced for its speaker and language.
A message to be pronounced is addressed by two identifiers: the phrase identifier phraseId and the section identifier sectionId (if sectionId is not specified, the default section is used).
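
The lookup by phraseId and sectionId can be sketched as follows. This is a minimal illustration, not the actual implementation: the names SectionSketch, SectionsSketch, lookupPhrase, and sampleSections are hypothetical, the section type is reduced to its phrases field, and throwing on an unknown section or phrase is an assumption.

```typescript
// Reduced section shape for this sketch: only the phrases dictionary is kept.
interface SectionSketch {
  phrases: { [phraseId: string]: unknown };
}

// Hypothetical container for the sections found at the root of the file.
type SectionsSketch = { [sectionId: string]: SectionSketch };

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Looks up a phrase; when sectionId is omitted, the "default" section is used.
// Assumption: an unknown section or phrase is reported as an error.
function lookupPhrase(
  sections: SectionsSketch,
  phraseId: string,
  sectionId: string = "default"
): unknown {
  if (!hasOwn(sections, sectionId)) {
    throw new Error(`Unknown section "${sectionId}"`);
  }
  const section = sections[sectionId];
  if (!hasOwn(section.phrases, phraseId)) {
    throw new Error(`Unknown phrase "${phraseId}" in section "${sectionId}"`);
  }
  return section.phrases[phraseId];
}

// Illustrative data only.
const sampleSections: SectionsSketch = {
  default: { phrases: { hello: "Hello! " } },
  formal: { phrases: { hello: "Good afternoon. " } },
};
```

With this data, calling lookupPhrase(sampleSections, "hello") reads from the default section, while passing "formal" as the third argument reads from the formal section.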

voiceInfo

An object describing the default voice parameters for the section. The fields of this object can be overridden in individual phrases of the current section. Two fields are required: speaker (the speaker identifier used in audio recordings) and lang (the language).
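
The override behaviour can be sketched as a shallow merge of the section defaults with the phrase-level fields. This is an illustration under stated assumptions: PhraseVoiceInfo is not defined on this page, so it is assumed here to be a partial version of the section voiceInfo, and effectiveVoiceInfo is a hypothetical helper name.

```typescript
// Section-level voice parameters, as listed in the structure above.
interface VoiceInfo {
  speaker: string;
  lang: string;
  emotion?: string;
  speed?: number;
  variation?: number;
}

// Assumption: a phrase may override any subset of the section fields.
type PhraseVoiceInfo = Partial<VoiceInfo>;

// Fields present in the phrase override win; all others come from the section.
function effectiveVoiceInfo(
  sectionDefaults: VoiceInfo,
  phraseOverride: PhraseVoiceInfo = {}
): VoiceInfo {
  return { ...sectionDefaults, ...phraseOverride };
}

// Illustrative values only.
const sectionVoice: VoiceInfo = { speaker: "default", lang: "en-US", speed: 1 };
const slowed = effectiveVoiceInfo(sectionVoice, { speed: 0.7 });
```

Here slowed keeps the speaker and lang of the section and only changes speed.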

macros

Defines the signature of each macro; the signatures are used for static analysis in the DSL. The top-level key phraseId must match the name of the macro being described, the lower-level key argName must match the name of one of its arguments, and the corresponding value must be the type of that argument.
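
To illustrate what such a signature makes checkable, the sketch below compares the arguments supplied to a macro against its declared signature. It is only a guess at the kind of check static analysis could perform: checkMacroCall and sampleMacros are hypothetical names, and the actual DSL analysis is not described on this page.

```typescript
// Macro signatures: macro name -> argument name -> argument type.
type MacroSignatures = { [phraseId: string]: { [argName: string]: string } };

function hasOwnKey(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Hypothetical checker: returns one message per mismatch between the declared
// signature and the argument types supplied by a call (empty list = no issues).
function checkMacroCall(
  macros: MacroSignatures,
  phraseId: string,
  suppliedArgTypes: { [argName: string]: string }
): string[] {
  if (!hasOwnKey(macros, phraseId)) {
    return [`macro "${phraseId}" is not declared`];
  }
  const signature = macros[phraseId];
  const problems: string[] = [];
  for (const argName of Object.keys(signature)) {
    if (!hasOwnKey(suppliedArgTypes, argName)) {
      problems.push(`missing argument "${argName}"`);
    } else if (suppliedArgTypes[argName] !== signature[argName]) {
      problems.push(
        `argument "${argName}" has type "${suppliedArgTypes[argName]}", expected "${signature[argName]}"`
      );
    }
  }
  for (const argName of Object.keys(suppliedArgTypes)) {
    if (!hasOwnKey(signature, argName)) {
      problems.push(`unexpected argument "${argName}"`);
    }
  }
  return problems;
}

// Illustrative macros section: one macro with a single argument.
const sampleMacros: MacroSignatures = { introduction: { cityName: "dynamic" } };
```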

phrases

A dictionary that maps phrase keys to phrases.

Each phrase can be defined by one of the following options:

  • Phrase text with voice (IPhrase) - contains the phrase text and the voice parameters to send to TTS;

  • Phrase ID (IPhraseId) - contains the ID of a phrase from the phrases section;

  • Substituted argument identifier (IArgumentId) - contains the name of the argument whose value is substituted and the type of this argument.

Phrase variability can be described with two constructions:

  • Repeated and short forms - a construction that specifies the variants of a phrase used on repetition and in shortened form;

  • Random set - a construction containing a set of phrases, any one of which can be sent to TTS.

Data structure:

    interface IPhrase {
      text: string
      voiceInfo?: PhraseVoiceInfo
      ssml?: string
    }

    interface IArgumentId {
      id: string
      type: string
      voiceInfo?: PhraseVoiceInfo
    }

    interface IPhraseId {
      id: string
      args?: {
        [key: string]: IArgumentId | IPhrase | string | null
      }
      voiceInfo?: PhraseVoiceInfo
    }

    type Phrase = IPhraseId | IArgumentId | IPhrase | string
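
One way to tell the members of the Phrase union apart is by the fields they carry. The sketch below is an assumption about how such a check could look, not the documented behaviour: classifyPhrase and the kind labels are hypothetical, and PhraseVoiceInfo is assumed to be a set of optional voice fields because it is not defined on this page.

```typescript
// Assumption: phrase-level voice settings are all optional.
type PhraseVoiceInfo = {
  speaker?: string;
  lang?: string;
  emotion?: string;
  speed?: number;
  variation?: number;
};

interface IPhrase { text: string; voiceInfo?: PhraseVoiceInfo; ssml?: string }
interface IArgumentId { id: string; type: string; voiceInfo?: PhraseVoiceInfo }
interface IPhraseId {
  id: string;
  args?: { [key: string]: IArgumentId | IPhrase | string | null };
  voiceInfo?: PhraseVoiceInfo;
}
type Phrase = IPhraseId | IArgumentId | IPhrase | string;

// Hypothetical labels for the four shapes a Phrase can take.
type PhraseKind = "plainString" | "textWithVoice" | "argument" | "phraseReference";

// Discriminates by shape: a string, an object with "text", an object with
// "type" (argument), or otherwise an object that only references a phrase ID.
function classifyPhrase(phrase: Phrase): PhraseKind {
  if (typeof phrase === "string") return "plainString";
  if ("text" in phrase) return "textWithVoice";
  if ("type" in phrase) return "argument";
  return "phraseReference";
}
```

For example, classifyPhrase({ id: "cityName", type: "dynamic" }) yields "argument", while classifyPhrase({ id: "hello" }) yields "phraseReference".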

Phrase structures:

    interface IRandom<T> {
      random: T[]
    }

    interface IRepeatable<T> {
      first: T
      repeat?: T
      short?: T
    }

  • random - the list of possible phrases;

  • first - the version of the phrase used the first time it is pronounced;

  • repeat - the version of the phrase used when it is repeated;

  • short - the short version of the phrase.
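
Selecting a variant from these structures can be sketched as follows. The page does not say when the short form is chosen or how randomness is sourced, so the preferShort flag, the timesSaid counter, and the injectable rng parameter are assumptions; pickRandom and pickRepeatable are hypothetical names.

```typescript
interface IRandom<T> { random: T[] }
interface IRepeatable<T> { first: T; repeat?: T; short?: T }

// Picks one entry from the list. rng must return a number in [0, 1);
// it is injectable so that the choice can be made deterministic in tests.
function pickRandom<T>(entry: IRandom<T>, rng: () => number = Math.random): T {
  if (entry.random.length === 0) {
    throw new Error("random list is empty");
  }
  const index = Math.min(
    Math.floor(rng() * entry.random.length),
    entry.random.length - 1
  );
  return entry.random[index];
}

// Assumption: "short" is used only on explicit request, "repeat" is used once
// the phrase has already been said, and "first" is the fallback in all cases.
function pickRepeatable<T>(
  entry: IRepeatable<T>,
  timesSaid: number,
  preferShort: boolean = false
): T {
  if (preferShort && entry.short !== undefined) return entry.short;
  if (timesSaid > 0 && entry.repeat !== undefined) return entry.repeat;
  return entry.first;
}
```

Because repeat and short are optional, both branches fall back to first when the requested variant is absent.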

Phrases can be combined into a sequence by placing them in a list.

For now, the argument type can be one of the following:

  • dynamic - a type accepted only by generative parsers; the value is passed as text without changes;

  • phrase - the value of an argument of this type must be a valid phrase key in this section or in imported sections.
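
The difference between the two argument types can be sketched as a validation rule. This is an illustration only: isValidArgumentValue is a hypothetical helper, and knownPhraseKeys is assumed to already contain the keys of the current section together with those of any imported sections.

```typescript
type ArgumentType = "dynamic" | "phrase";

// "dynamic" values are passed through as text, so any string is accepted.
// "phrase" values must name a phrase key that is known to the section.
function isValidArgumentValue(
  type: ArgumentType,
  value: string,
  knownPhraseKeys: string[]
): boolean {
  switch (type) {
    case "dynamic":
      return true;
    case "phrase":
      return knownPhraseKeys.indexOf(value) !== -1;
  }
}

// Illustrative keys only.
const knownKeys = ["hello", "introduction"];
```

For instance, the value "Berlin" is acceptable for a dynamic argument but not for a phrase argument, because "Berlin" is not one of the known phrase keys.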

Example:

    {
      "hello": ["Hello! "],
      "introduction": {
        "first": [
          { "text": "We are from the city" },
          { "id": "cityName", "type": "dynamic" }
        ],
        "repeat": {
          "random": [
            [
              { "type": "phrase", "id": "hello" },
              { "text": "We are", "voiceInfo": { "speed": 0.7 } },
              { "id": "cityName", "type": "dynamic" }
            ],
            [
              { "text": "We are calling from the city" },
              { "id": "cityName", "type": "dynamic" }
            ]
          ]
        }
      }
    }