Natural language generation
Natural language generation (NLG) is an essential part of intelligent dialog systems.
Phrasemap
The phrasemap file is a JSON file with the following structure:
interface IPhrasemapSection {
  voiceInfo: {
    speaker: string
    lang: string
    emotion?: string
    speed?: number
    variation?: number
  }
  macros: {
    [phraseId: string]: {
      [argName: string]: string
    }
  }
  phrases: {
    [phraseId: string]: PhrasemapEntry
  }
}

interface IPhrasemap {
  $version: "2.0"
  default: IPhrasemapSection
  [sectionId: string]: IPhrasemapSection
}
The root of the file is a dictionary of phrasemap sections. Each section defines its own pronunciation for the respective speaker and language.
A message to be pronounced carries two identifiers: the phrase identifier phraseId and the section identifier sectionId (if sectionId is not specified, the default section is used).
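The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: `SectionSketch`, `SectionsSketch`, and `lookupPhrase` are hypothetical names, the `$version` field is left out, phrases are treated as opaque values, and throwing on an unknown section or phrase is a choice made for this sketch, not documented behavior.

```typescript
// Hypothetical, simplified types: only the fields needed for the lookup.
interface SectionSketch {
  phrases: { [phraseId: string]: unknown };
}

type SectionsSketch = {
  default: SectionSketch;
  [sectionId: string]: SectionSketch;
};

// Find a phrase by phraseId, using the "default" section when no sectionId is given.
function lookupPhrase(map: SectionsSketch, phraseId: string, sectionId?: string): unknown {
  const section = sectionId === undefined ? map.default : map[sectionId];
  if (section === undefined) {
    // Assumption: an unknown section is treated as an error in this sketch.
    throw new Error(`Unknown section: ${sectionId}`);
  }
  if (!Object.prototype.hasOwnProperty.call(section.phrases, phraseId)) {
    throw new Error(`Unknown phrase: ${phraseId}`);
  }
  return section.phrases[phraseId];
}
```

Calling `lookupPhrase(map, "hello")` reads from the default section, while `lookupPhrase(map, "hello", "formal")` reads from a section named `formal` (a made-up section name).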
voiceInfo
An object describing the default voice parameters for the section.
The fields of this object can be overridden in the phrases of the current section.
Two fields are required: speaker, the speaker identifier in audio recordings, and lang, the language.
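One way to picture the override rule is a shallow merge in which phrase-level fields win over section-level ones. The sketch below assumes that the phrase-level voice parameters are a partial copy of the section-level object (the exact shape of `PhraseVoiceInfo` is not given here); `VoiceInfoSketch` and `effectiveVoiceInfo` are hypothetical names, and the speaker and language values in the usage note are made up.

```typescript
// Mirrors the voiceInfo fields listed in the phrasemap structure.
interface VoiceInfoSketch {
  speaker: string;
  lang: string;
  emotion?: string;
  speed?: number;
  variation?: number;
}

// Combine section defaults with phrase-level overrides.
// Assumption: overrides contain only fields that are actually set.
function effectiveVoiceInfo(
  sectionVoice: VoiceInfoSketch,
  phraseVoice: Partial<VoiceInfoSketch> = {}
): VoiceInfoSketch {
  // Later spreads win, so phrase-level fields replace section-level ones.
  return { ...sectionVoice, ...phraseVoice };
}
```

For example, merging `{ speaker: "default", lang: "en-US", speed: 1 }` with `{ speed: 0.7 }` keeps the speaker and language and changes only the speed.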
macros
Defines the signature of each macro; the signatures are used for static analysis in the DSL. The top-level key phraseId must match the name of the macro being described, the nested key argName must match the name of one of its arguments, and the corresponding value must be the type of that argument.
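To illustrate how such signatures could support static checks, here is a minimal sketch that compares the argument names of a macro call against the declared signature. It is not the DSL's actual analyzer: `MacroSignatures` and `checkMacroCall` are hypothetical names, and the check covers argument names only, not argument types.

```typescript
// Same shape as the macros field: macro name -> argument name -> argument type.
type MacroSignatures = { [phraseId: string]: { [argName: string]: string } };

// Return a list of problems found in a call; an empty list means the call matches.
function checkMacroCall(macros: MacroSignatures, phraseId: string, argNames: string[]): string[] {
  if (!Object.prototype.hasOwnProperty.call(macros, phraseId)) {
    return [`unknown macro: ${phraseId}`];
  }
  const signature = macros[phraseId];
  const problems: string[] = [];
  // Every declared argument should be supplied.
  for (const expected of Object.keys(signature)) {
    if (argNames.indexOf(expected) === -1) {
      problems.push(`missing argument: ${expected}`);
    }
  }
  // Every supplied argument should be declared.
  for (const given of argNames) {
    if (!Object.prototype.hasOwnProperty.call(signature, given)) {
      problems.push(`unexpected argument: ${given}`);
    }
  }
  return problems;
}
```

With a macros object such as `{ introduction: { cityName: "dynamic" } }`, a call that passes `cityName` yields no problems, while a call with no arguments reports the missing one.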
phrases
A dictionary that maps phrase keys to phrases.
Each phrase can be defined in one of the following ways:
Phrase text with voice, IPhrase - contains the phrase text and voice parameters to send to TTS;
Phrase ID, IPhraseId - contains the ID of a phrase from the phrases section;
Identifier of the substituted argument, IArgumentId - contains the name of the argument used for the substitution and the type of this argument.
Phrase variability can be described with the following constructions:
Abbreviations and repetitions - a construction that specifies the repeated and abbreviated forms of a phrase;
A set with random selection - a construction that contains a set of phrases, any one of which can be sent to TTS.
Data structure:
interface IPhrase {
  text: string
  voiceInfo?: PhraseVoiceInfo
  ssml?: string
}

interface IArgumentId {
  id: string
  type: string
  voiceInfo?: PhraseVoiceInfo
}

interface IPhraseId {
  id: string
  args?: { [key: string]: IArgumentId | IPhrase | string | null }
  voiceInfo?: PhraseVoiceInfo
}

type Phrase = IPhraseId | IArgumentId | IPhrase | string
Phrase structures:
interface IRandom<T> {
  random: T[]
}

interface IRepeatable<T> {
  first: T
  repeat?: T
  short?: T
}
random - the list of possible phrases
first - the version of the phrase used the first time it is pronounced
repeat - the version of the phrase used when it is repeated
short - the short version of the phrase
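A minimal sketch of how a variant might be chosen from these two structures is shown below. The helper names `pickRandom` and `pickRepeatable` are hypothetical, as are the `timesSaid` and `preferShort` parameters; in particular, the condition under which the short form is preferred is an assumption of this sketch and is not specified above.

```typescript
interface IRandom<T> { random: T[] }
interface IRepeatable<T> { first: T; repeat?: T; short?: T }

// Pick one phrase from a random set. The rng parameter exists only to make
// the choice reproducible in tests; by default Math.random is used.
function pickRandom<T>(entry: IRandom<T>, rng: () => number = Math.random): T {
  if (entry.random.length === 0) {
    throw new Error("random set is empty");
  }
  // Clamp the index so an rng that returns exactly 1 cannot run past the end.
  const index = Math.min(Math.floor(rng() * entry.random.length), entry.random.length - 1);
  return entry.random[index];
}

// Pick the first, repeated, or short form, falling back to `first`
// whenever the requested form is not defined.
function pickRepeatable<T>(entry: IRepeatable<T>, timesSaid: number, preferShort: boolean = false): T {
  if (preferShort && entry.short !== undefined) {
    return entry.short; // assumption: the caller decides when the short form applies
  }
  if (timesSaid > 0 && entry.repeat !== undefined) {
    return entry.repeat;
  }
  return entry.first;
}
```

Falling back to `first` keeps the helpers usable for entries that define only the required field.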
Phrases can be combined into a sequence given by a list.
For now, an argument can have one of two types: dynamic, a type accepted only by generative parsers, in which the value is passed as text without changes, or phrase, for which the value must be a valid phrase key in this section or in imported sections.
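The difference between the two argument types can be sketched as follows. This is an illustration only: `resolveArgument` is a hypothetical helper, phrases are simplified to plain strings, and imported sections are not modeled.

```typescript
type ArgumentType = "dynamic" | "phrase";

// Turn an argument value into the text to pronounce.
// Assumption: the phrases dictionary holds plain strings for brevity.
function resolveArgument(
  type: ArgumentType,
  value: string,
  phrases: { [phraseKey: string]: string }
): string {
  if (type === "dynamic") {
    // dynamic: the value is passed through as text, unchanged.
    return value;
  }
  // phrase: the value must be a valid phrase key.
  if (!Object.prototype.hasOwnProperty.call(phrases, value)) {
    throw new Error(`"${value}" is not a valid phrase key`);
  }
  return phrases[value];
}
```

With this sketch, a dynamic argument such as a city name is returned as given, while a phrase argument with the value `hello` is replaced by the text stored under the `hello` key.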
Examples:
{
  "hello": ["Hello! "],
  "introduction": {
    "first": [
      { "text": "We are from the city" },
      { "id": "cityName", "type": "dynamic" }
    ],
    "repeat": {
      "random": [
        [
          { "type": "phrase", "id": "hello" },
          { "text": "We are", "voiceInfo": { "speed": 0.7 } },
          { "id": "cityName", "type": "dynamic" }
        ],
        [
          { "text": "We are calling from the city" },
          { "id": "cityName", "type": "dynamic" }
        ]
      ]
    }
  }
}