Dialog as a Service gRPC API
DLGaaS allows conversational AI applications to interact with Mix dialogs
Dialog as a Service is Nuance's omni-channel conversation engine. The Dialog as a Service API allows conversational AI applications to interact with dialogs created with the Mix.dialog and Mix.nlu web tools.
The gRPC protocol provided by Dialog as a Service allows a client application to interact with a dialog in all the programming languages supported by gRPC.
gRPC is an open source RPC (remote procedure call) framework used to create services. It uses HTTP/2 for transport and protocol buffers to define the structure of messages and service interfaces. Dialog as a Service supports the proto3 version of protocol buffers.
Version: v1
This release supports version v1 of the Dialog as a Service protocol. See gRPC setup to download the proto files and get started.
Dialog essentials
From an end-user's perspective, a dialog-enabled app is one that understands natural language, can respond in kind, and, where appropriate, can extend the conversation by following up the user's turn with appropriate questions and suggestions.
Dialogs are created using Mix.dialog; see Creating Mix.dialog Applications for more information. This document describes how to access a dialog at runtime from a client application using the DLGaaS gRPC API.
This section introduces concepts that you will need to understand to write your client application.
Session
A session represents a conversation between a user and the dialog service. For example, consider the following scenario for a coffee app:
- Service: Hello and welcome to the coffee app! What can I do for you today?
- User: I want a cappuccino.
- Service: OK, in what size would you like that?
- User: Large.
- Service: Perfect, a large cappuccino coming up!
The interactions between the client application and the dialog service for this scenario occur in the same session. A session is identified by a session ID. Each request and response exchanged between the client app and the dialog service for that specific conversation must include that session ID.
For more information on session IDs, see Step 3. Start conversation.
Playing messages and providing user input
The client application is responsible for playing messages to the user (for example, "What can I do for you today?") and for collecting and returning the user input to the dialog service (for example, "I want a cappuccino").
Messages can be provided to the user in the form of:
- Text to be rendered using text-to-speech (TTS); this text can be synthesized directly through the DLGaaS API
- Text to be visually displayed, for example, in a chat
- An audio file to play
The client app can then send the user input to the dialog service in a few ways:
- As audio to be recognized and interpreted by Nuance; see Stream audio to the Dialog service for more information.
- As text to be interpreted by Nuance. In this case, the client application returns the input string to the dialog application.
- As interpretation results. This assumes that interpretation of the user input is performed by an external system. In this case, the client application is responsible for returning the results of the interpretation to the dialog application.
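For illustration, here is how these input forms map onto the Execute request payload, shown as Python dicts that mirror the JSON examples later in this document (the field names are taken from those examples):

# Text to be interpreted by Nuance (NLUaaS)
text_payload = {"user_input": {"user_text": "I want a cappuccino"}}

# Interpretation results produced by an external system (simple format)
external_interpretation_payload = {
    "user_input": {
        "interpretation": {
            "confidence": 1.0,
            "utterance": "I want a cappuccino",
            "data": {"INTENT": "ORDER_COFFEE", "COFFEE_TYPE": "cappuccino"},
        }
    }
}

# Audio is not placed in the Execute payload; it is streamed to the service
# with StreamInput, as described in the next section.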
Stream audio to the Dialog service
You can now use the DLGaaS API to stream audio and perform recognition on a user input. This allows you to interact with the Nuance ASR (Automatic Speech Recognition) service without having to use the ASRaaS API.
When audio is sent, DLGaaS streams it to ASRaaS, which performs recognition. The recognized content is then sent to NLUaaS for interpretation, which is then used by the dialog application.
Nodes and actions
Mix.dialog nodes that trigger a call to the DLGaaS API
You create applications in Mix.dialog using nodes. Each node performs a specific task, such as asking a question, playing a message, or performing recognition. As you add nodes and connect them to one another, the dialog flow takes shape in the form of a graph.
At specific points in the dialog, when the dialog service requires input from the client application, it sends an action to the client app. In the context of DLGaaS, the following Mix.dialog nodes trigger a call to the DLGaaS API and send a corresponding action:
Question and answer
The objective of the question and answer node is to collect user input. It sends a message to the client application and expects user input, which can be audio, a text utterance, or an interpretation. For example, in the coffee app, the dialog may tell the client app to ask the user "What type of coffee would you like today?" and then to return the user's answer.
The message specified in a question and answer node is sent to the client application as a question and answer action. To continue the flow, the client application must then return the user input to the question and answer node.
See Question and answer actions for details.
Data access
The data access node tells the client app that the dialog expects data to continue the flow. It can also be used to exchange information between the client app and the dialog. For example, in a coffee app, the dialog may ask the client application to query the price of the order or to retrieve the name of the user.
Data is sent to the client application in a data access action. To continue the flow, the client application must return the requested data.
See Data access actions for details.
External actions: Transfer and End
There are two types of external actions nodes:
- Transfer: This node triggers an escalation action to be sent to the client application; it can be used, for example, to escalate to an IVR agent. It sends data to the client application. To continue the flow, the client application must return a returnCode, at a minimum. See Transfer actions for details.
- End: This node triggers an end action to indicate the end of the dialog application. It does not expect a response from the client app. See End actions for details.
Message node
The message node plays a message. The message specified in a message node is sent to the client application as a message action.
See Message actions for details.
Session data
In some situations, you may want to send data from the client application to the dialog service to be used during the session. For example, at the beginning of a dialog you might want to send the geographical location of the user, the user name and phone number, and so on.
For more information, see Exchanging session data.
Selectors
Most dialog applications can support multiple channels and languages, so you need to select which channel and language to use for an interaction in your API. This is done through a selector.
A selector is the combination of:
- The channel through which messages are transmitted to users, such as an IVR system, a live chat, a chatbot, and so on. The channels are defined when creating a Mix project.
- The language to use for the interactions.
- The library to use for the interaction. (Advanced customization reserved for future use. Use the default value for now, which is default.)
You do not need to send the selector at each interaction. If the selector is not included, the values of the previous interaction will be used.
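For reference, a selector is passed to the API as a Selector message. A minimal sketch, assuming the Python stubs generated later in this guide (the module path matches the wildcard imports used in the sample app; adjust it if Selector lives in a different generated module):

from nuance.dlg.v1.common.dlg_common_messages_pb2 import Selector

# Channel and language as defined in the Mix project; keep library as "default"
selector = Selector(channel="default", language="en-US", library="default")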
Prerequisites from Mix
Before developing your gRPC application, you need a Mix project that provides a dialog application as well as authorization credentials.
- Create a Mix project:
- Create a Mix.dialog application, as described in Creating Mix.dialog Applications.
- Build your dialog application.
- Set up your application configuration.
- Deploy your application configuration.
- Generate a client ID and secret for your Mix project: see Authorize your client application. Later you will use these credentials to request an access token to run your application.
- Learn the URL to call the Dialog service: see Accessing a runtime service.
- For DLGaaS, this is:
dlg.api.nuance.com:443
gRPC setup
Get proto files
# For DLGaaS
nuance/dlg/v1/dlg_interface.proto
nuance/dlg/v1/dlg_messages.proto
nuance/dlg/v1/common/dlg_common_messages.proto
# For ASRaaS audio streaming
nuance/asr/v1/recognizer.proto
nuance/asr/v1/resource.proto
nuance/asr/v1/result.proto
# For TTSaaS streaming
nuance/tts/v1/nuance_tts_v1.proto
# For NLUaaS interpretations
nuance/nlu/v1/runtime.proto
nuance/nlu/v1/result.proto
nuance/nlu/v1/interpretation-common.proto
nuance/nlu/v1/single-intent-interpretation.proto
nuance/nlu/v1/multi-intent-interpretation.proto
Install gRPC for your programming language, for example, Python:
$ pip install --upgrade pip
$ pip install grpcio
$ pip install grpcio-tools
For Python, use protoc to generate stubs
$ echo "Pulling support files"
$ mkdir -p google/api
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/annotations.proto > google/api/annotations.proto
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/http.proto > google/api/http.proto
$ echo "generate the stubs for support files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/http.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/annotations.proto
$ echo "generate the stubs for the DLGaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_interface.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_messages.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/common/dlg_common_messages.proto
$ echo "generate the stubs for the ASRaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/recognizer.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/resource.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/result.proto
$ echo "generate the stubs for the TTSaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/tts/v1/nuance_tts_v1.proto
$ echo "generate the stubs for the NLUaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/runtime.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/result.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/interpretation-common.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/single-intent-interpretation.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/multi-intent-interpretation.proto
The basic steps in using the Dialog as a Service gRPC protocol are:
1. Download the gRPC .proto files here. These files contain a generic version of the functions or classes that can interact with the dialog service. See Note about packaged proto files below.
2. Install gRPC for the programming language of your choice, including C++, Java, Python, Go, Ruby, C#, Node.js, and others. See gRPC Documentation for a complete list and instructions on using gRPC with each language.
3. Generate client stub files in your programming language from the proto files. Depending on your programming language, the stubs may consist of one file or multiple files per proto file. These stub files contain the methods and fields from the proto files as implemented in your programming language. You will consult the stubs in conjunction with the proto files. See gRPC API.
4. Write your client app, referencing the functions or classes in the client stub files. See Client app development for details and a scenario.
Note about packaged proto files
The DLGaaS API provides features that require that you install the ASR, TTS, and NLU proto files:
- The StreamInput request performs recognition on streamed audio using ASRaaS and requests speech synthesis using TTSaaS.
- The ExecuteRequest allows you to specify interpretation results in the NLUaaS format.
For your convenience, these files are packaged with the DLGaaS proto files available here, and this documentation provides instructions for generating the stub files.
As such, the following files are packaged with this documentation:
- nuance/dlg/v1/dlg_interface.proto
- nuance/dlg/v1/dlg_messages.proto
- nuance/dlg/v1/common/dlg_common_messages.proto
- nuance/asr/v1/recognizer.proto
- nuance/asr/v1/resource.proto
- nuance/asr/v1/result.proto
- nuance/tts/v1/nuance_tts_v1.proto
- nuance/nlu/v1/runtime.proto
- nuance/nlu/v1/result.proto
- nuance/nlu/v1/interpretation-common.proto
- nuance/nlu/v1/single-intent-interpretation.proto
- nuance/nlu/v1/multi-intent-interpretation.proto
Client app development
This section describes the main steps in a typical client application that interacts with a Mix.dialog application. In particular, it provides an overview of the different methods and messages used in a sample order coffee application.
Sample dialog exchange
To illustrate how to use the API, this document uses the following simple dialog exchange between an end user and a dialog application:
- System: Hello! Welcome to the coffee app. What type of coffee would you like?
- User: I want an espresso.
- System: And in what size would you like that?
- User: Double.
- System: Thanks, your order is coming right up!
Overview
The DialogService is the main entry point to the Nuance Dialog service.
A typical workflow for accessing a dialog application at runtime is as follows:
- The client application requests the access token from the Nuance authorization server.
- The client application opens a secure channel using the access token.
- The client application initiates a new conversation using the StartRequest method of the DialogService. The service returns a session ID, which is used at each interaction to keep the same conversation.
- As the user interacts with the dialog, the client application invokes one of the following methods, as often as necessary:
  - The ExecuteRequest method for text input and data exchange. An ExecuteResponse is returned to the client application when a question and answer node, a data access node, or an external actions node is encountered in the dialog flow.
  - The StreamInput method for audio input (ASR) and/or audio output (TTS). A StreamOutput is returned to the client application.
- The client application closes the conversation using the StopRequest method.
This workflow is shown in the following high-level sequence flow:
For a detailed sequence flow diagram, see Detailed sequence flow.
Step 1. Generate token
Get token and run simple Mix client (run-simple-mix-client.sh)
#!/bin/bash
# Remember to change the colon (:) in your CLIENT_ID to code %3A
CLIENT_ID="appID%3ANMDPTRIAL_your_name_company_com_20201102T144327123022%3Ageo%3Aus%3AclientName%3Adefault"
SECRET="5JEAu0YSAjV97oV3BWy2PRofy6V8FGmywiUbc0UfkGE"
export MY_TOKEN="`curl -s -u "$CLIENT_ID:$SECRET" "https://auth.crt.nuance.com/oauth2/token" \
-d 'grant_type=client_credentials' -d 'scope=dlg' \
| python -c 'import sys, json; print(json.load(sys.stdin)["access_token"])'`"
python dlg_client.py --serverUrl "dlg.api.nuance.com:443" --token $MY_TOKEN --modelUrn "$1" --textInput "$2"
Nuance Mix uses the OAuth 2.0 protocol for authorization. To call the Dialog runtime service, your client application must request and then provide an access token. The token expires after a short period of time so must be regenerated frequently.
Your client application uses the client ID and secret from the Mix.dashboard (see Prerequisites from Mix) to generate an access token from the Nuance authorization server, available at the following URL:
https://auth.crt.nuance.com/oauth2/token
The token may be generated in several ways, either as part of the client application or as a script file. This Python example uses a Linux script to generate a token and store it in an environment variable. The token is then passed to the application, where it is used to create a secure connection to the dialog service.
The curl command in these scripts generates a JSON object including the access_token field that contains the token, then uses Python tools to extract the token from the JSON. The resulting environment variable contains only the token.
In this scenario, the colon (:) in the client ID must be changed to the code %3A so curl can parse the value correctly:
appID:NMDPTRIAL_alex_smith_company_com_20190919T190532:geo:qa:clientName:default
-->
appID%3ANMDPTRIAL_alex_smith_company_com_20190919T190532%3Ageo%3Aqa%3AclientName%3Adefault
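If you build the token request in Python rather than a shell script, the standard library can handle this encoding for you. A minimal sketch using urllib.parse.quote (the client ID below is the fictitious example above):

from urllib.parse import quote

client_id = "appID:NMDPTRIAL_alex_smith_company_com_20190919T190532:geo:qa:clientName:default"
encoded_client_id = quote(client_id, safe="")  # each ":" becomes "%3A"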
Step 2. Authorize the service
def create_channel(args):
log.debug("Adding CallCredentials with token %s" % args.token)
call_credentials = grpc.access_token_call_credentials(args.token)
log.debug("Creating secure gRPC channel")
channel_credentials = grpc.ssl_channel_credentials()
channel_credentials = grpc.composite_channel_credentials(channel_credentials, call_credentials)
channel = grpc.secure_channel(args.serverUrl, credentials=channel_credentials)
return channel
You authorize the service by creating a secure gRPC channel, providing:
- The URL of the dialog service
- The access token
Step 3. Start conversation
def start_request(stub, model_ref_dict, session_id, selector_dict={}, timeout=None):
selector = Selector(channel=selector_dict.get('channel'),
library=selector_dict.get('library'),
language=selector_dict.get('language'))
start_payload = StartRequestPayload(model_ref=model_ref_dict)
start_req = StartRequest(session_id=session_id,
selector=selector,
payload=start_payload,
session_timeout_sec=timeout)
log.debug(f'Start Request: {start_req}')
start_response, call = stub.Start.with_call(start_req)
response = MessageToDict(start_response)
log.debug(f'Start Request Response: {response}')
return response, call
To start a new conversation, the client app sends a StartRequest message with the following information:
- An empty session ID, which tells the Dialog service to create a new ID for this conversation.
- The selector, which provides the channel, library, and language used for this conversation. This information was determined by the dialog designer in the Mix.dialog tool.
- The StartRequestPayload, which contains the reference to the model, provided as a ResourceReference. For a Mix application, this is the URN of the application configuration to use for this interaction. The StartRequestPayload can also be used to set session data.
- An optional user_id, which identifies a specific user within the application. See UserID for details.
- An optional client_data, used to inject data in call logs. This data will be added to the call logs but will not be masked.
A new unique session ID is generated and returned as a response; for example:
'payload': {'session_id': 'b8cba63a-f681-11e9-ace9-d481d7843dbd'}
The client app must then use the same session ID in all subsequent requests that apply to this conversation.
Additional notes on session IDs
- The session ID is often used for logging purposes, allowing you to easily locate the logs for a session.
- If the client app specifies a session ID in the StartRequest message, then the same ID is returned in the response.
- If passing in your own session ID in the StartRequest message, follow these guidelines:
  - The session ID should not begin or end with white space or a tab character.
  - The session ID should not begin or end with hyphens.
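If you do pass in your own session ID, a UUID satisfies these guidelines. A minimal sketch, reusing the start_request helper and the stub, model_ref_dict, and selector_dict variables shown elsewhere in this guide:

import uuid

# A UUID string neither begins nor ends with whitespace or hyphens
my_session_id = str(uuid.uuid4())
response, call = start_request(stub,
                               model_ref_dict=model_ref_dict,
                               session_id=my_session_id,
                               selector_dict=selector_dict)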
Step 4a. Interact with the user (text input)
def execute_request(stub, session_id, selector_dict={}, payload_dict={}):
selector = Selector(channel=selector_dict.get('channel'),
library=selector_dict.get('library'),
language=selector_dict.get('language'))
input = UserInput(user_text=payload_dict.get('user_input').get('userText'))
execute_payload = ExecuteRequestPayload(
user_input=input)
execute_request = ExecuteRequest(session_id=session_id,
selector=selector,
payload=execute_payload)
log.debug(f'Execute Request: {execute_payload}')
execute_response, call = stub.Execute.with_call(execute_request)
response = MessageToDict(execute_response)
log.debug(f'Execute Response: {response}')
return response, call
Interactions that use text input and do not require streaming are done through multiple ExecuteRequest calls, providing the following information:
- The session ID returned by the StartRequest.
- The selector, which provides the channel, library, and language used for this conversation. (This is optional; it is required only if the channel, library, or language values have changed since they were last sent.)
- The ExecuteRequestPayload, which can contain the following fields:
- user_input: Provides the input to the Dialog engine. For the initial ExecuteRequest, the payload is empty to get the initial message. For the subsequent requests, the input provided depends on how text interpretation is performed. See Interpreting text user input for more information.
- dialog_event: Can be used to pass in events that will drive the dialog flow. If no event is passed, the operation is assumed to be successful.
- requested_data: Contains data that was previously requested by the Dialog.
- An optional user_id, which identifies a specific user within the application. See UserID for details.
The dialog runtime app returns the Execute response payload when a question and answer node, a data access node, or an external actions node is encountered in the dialog flow. This payload provides the actions to be performed by the client application.
There are many types of actions that can be requested by the dialog application:
- Messages action—Indicates that a message should be played to the user. See Message actions.
- Data access action—Indicates that the dialog needs data to continue the flow. The dialog application can obtain the data it needs in two ways:
  - By implementing this in the dialog application directly. Feature coming soon.
  - By using the data access gRPC API; in this case, the client application is responsible for obtaining the data. See Data access actions.
- Question and answer action—Tells the client app to play a message and to return the user input to the dialog. See Question and answer actions.
- End action—Indicates the end of the dialog. See End actions.
- Escalation action—Provides data that can be used, for example, to escalate to an IVR agent.
- Continue action—Indicates that the client application should perform a continue action. Feature coming soon.
For example, the following question and answer action indicates that the message "Hello! How can I help you today?" must be displayed to the user:
Note: Examples in this section are shown in JSON format for readability. However, in an actual client application, content is sent and received as protobuf objects.
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [],
"visual": [{
"text": "Hello! How can I help you today?"
}
],
"audio": []
}
}
}
A question and answer node expects input from the user to continue the flow. This can be provided as text (either to be interpreted by Nuance or as already interpreted input) in the next ExecuteRequest call. To provide the user input as audio, use the StreamInput request, as described in Step 4b.
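How the client application reacts depends on which action is present in the ExecuteResponse payload. The following dispatch sketch assumes the response has already been converted to a dict with MessageToDict, as in the execute_request helper above, so keys appear in camelCase (handle_execute_response is a hypothetical helper name):

def handle_execute_response(response):
    payload = response.get("payload", {})
    # Messages from message nodes arrive alongside the action
    for message in payload.get("messages", []):
        for visual in message.get("visual", []):
            print(visual.get("text", ""))
    if "qaAction" in payload:
        # Display the question, then collect and return the user input
        for visual in payload["qaAction"].get("message", {}).get("visual", []):
            print(visual.get("text", ""))
    elif "daAction" in payload:
        # The dialog needs data; fetch it and return it in requested_data
        print("Data requested for action:", payload["daAction"].get("id"))
    elif "escalationAction" in payload:
        # Transfer, for example to an IVR agent; return a returnCode when done
        print("Transfer requested:", payload["escalationAction"].get("id"))
    elif "endAction" in payload:
        print("End of dialog")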
Step 4b. Interact with the user (using audio)
def execute_stream_request(args, stub, session_id, selector_dict={}):
execute_responses = stub.ExecuteStream(build_stream_input(args, session_id, selector_dict))
log.debug(f'execute_responses: {execute_responses}')
responses = []
audio = bytearray(b'')
for execute_response in execute_responses:
if execute_response:
response = MessageToDict(execute_response.response)
if response: responses.append(response)
audio += execute_response.audio.audio
return responses, audio
def build_stream_input(args, session_id, selector_dict):
selector = Selector(channel=selector_dict.get('channel'),
library=selector_dict.get('library'),
language=selector_dict.get('language'))
try:
with open(args.audioFile, mode='rb') as file:
audio_buffer = file.read()
# Hard code packet_size_byte for simplicity sake (approximately 100ms of 16KHz mono audio)
packet_size_byte = 3217
audio_size = sys.getsizeof(audio_buffer)
audio_packets = [ audio_buffer[x:x + packet_size_byte] for x in range(0, audio_size, packet_size_byte) ]
# For simplicity sake, let's assume the audio file is PCM 16KHz
user_input = None
asr_control_v1 = {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}
except:
# Text interpretation as normal
asr_control_v1 = None
audio_packets = [b'']
user_input = UserInput(user_text=args.textInput)
# Build execute request object
execute_payload = ExecuteRequestPayload(user_input=user_input)
execute_request = ExecuteRequest(session_id=session_id,
selector=selector,
payload=execute_payload)
# For simplicity sake, let's assume the audio file is PCM 16KHz
tts_control_v1 = {'audio_params': {'audio_format': {'pcm': {'sample_rate_hz': 16000}}}}
first_packet = True
for audio_packet in audio_packets:
if first_packet:
first_packet = False
# Only first packet should include the request header
stream_input = StreamInput(
request=execute_request,
asr_control_v1=asr_control_v1,
tts_control_v1=tts_control_v1,
audio=audio_packet
)
log.debug(f'Stream input initial: {stream_input}')
else:
stream_input = StreamInput(audio=audio_packet)
yield stream_input
Interactions with the user that require audio streaming are done through multiple StreamInput calls.
The StreamInput method can be used to:
- Provide the user input requested by a question and answer action as audio input; in this scenario, audio is streamed to ASRaaS, which performs recognition on the audio. The recognition results are sent to NLUaaS, which provides the interpretation. This is then returned to DLGaaS, which continues the dialog flow.
- Synthesize an output message into audio output using text-to-speech (TTS); in this scenario, if a TTS message has been defined in Mix.dialog for this interaction, TTSaaS synthesizes the message and streams the audio back to the client application in a series of StreamOutput calls.
The StreamInput method has the following fields:
- request, which provides the ExecuteRequest with the session ID, selector, and request payload.
- asr_control_v1, which provides the parameters to be forwarded to the ASR service, such as the audio format, recognition flags, and whether results are returned. Setting asr_control_v1 enables streaming of input audio.
- audio, which is the audio to stream for speech recognition.
- tts_control_v1, which provides the parameters to be forwarded to the TTS service, such as the audio encoding and voice to use for speech synthesis. Setting tts_control_v1 enables streaming of audio output.
This method returns a StreamOutput, which has the following fields:
- response, which provides the ExecuteResponse
- audio, which is the audio returned by the TTS (if TTS was requested)
- asr_result, which contains the transcription result
- asr_status, which indicates the status of the transcription
- asr_start_of_speech, which contains the start-of-speech message
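On the client side, these fields can be consumed as the StreamOutputs arrive. A minimal sketch that extends the execute_stream_request loop shown above (it assumes these are message-typed fields, so HasField applies; consume_stream_outputs is a hypothetical helper name):

from google.protobuf.json_format import MessageToDict

def consume_stream_outputs(stream_outputs):
    """Collect TTS audio and ASR progress from an iterator of StreamOutput messages."""
    responses = []
    audio = bytearray()
    for stream_output in stream_outputs:
        if stream_output.HasField("asr_start_of_speech"):
            print("Start of speech detected")
        if stream_output.HasField("asr_status"):
            print("ASR status:", MessageToDict(stream_output.asr_status))
        if stream_output.HasField("asr_result"):
            print("Transcription:", MessageToDict(stream_output.asr_result))
        if stream_output.HasField("response"):
            responses.append(MessageToDict(stream_output.response))
        audio += stream_output.audio.audio  # TTS audio chunks, if TTS was requested
    return responses, audio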
This can be implemented as follows in your application:
To perform speech recognition on audio input
The workflow to perform speech recognition on audio input is as follows:
- The dialog service sends an ExecuteResponse with a question and answer action, indicating that it requires user input.
- The client application sends a first StreamInput method with the asr_control_v1 and request parameters to DLGaaS; this lets DLGaaS know to expect audio.
- The client application sends additional StreamInputs to stream the audio.
- The client application sends an empty StreamInput to indicate end of audio. The audio is recognized, interpreted, and returned to the dialog application, which continues its flow.
- The dialog service returns the corresponding ExecuteResponse in a single StreamOutput.
This can be seen in the detailed sequence flow. For example, assuming that the user says "I want an espresso", the client application will send a series of StreamInput methods with the following content:
# First StreamInput
{
"request": {
"session_id": "1c2c9822-45d5-460d-8696-d3fa9d8af8c2",
"selector": {
"channel": "default"
"language": "en-US"
"library": "default"
},
"payload": {}
},
"asr_control_v1": {
"audio_format": {
"pcm": {
"sample_rate_hz": 16000
}
}
},
"audio": "RIFF4\373\000\00..."
}
# Additional StreamInputs with audio bytes
{
"audio": "...audio_bytes..."
}
# Final empty StreamInput to indicate end of audio
{
}
Once audio has been recognized, interpreted, and handled by DLGaaS, the following StreamOutput is returned:
# StreamOutput
{
"response": {
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [{
"text": "What size coffee would you like? "
}
],
"visual": [{
"text": "What size coffee would you like?"
}
],
"audio": [] // This is a reference to an audio file.
}
}
}
}
}
To synthesize an output message into audio using TTS
- The client application sends the StreamInput method with the tts_control_v1 and request parameters to DLGaaS. The dialog application continues the dialog according to the ExecuteRequest provided in the request parameter.
- If the corresponding ExecuteResponse includes a TTS message (that is, a message is provided in the nlg field of the message action), this message is synthesized and the audio is streamed back to the application in a series of StreamOutput calls.
For example, assuming that the user typed "I want an espresso", the client application will send a single StreamInput method with the following content:
# StreamInput
{
"request": {
"session_id": "1c2c9822-45d5-460d-8696-d3fa9d8af8c2",
"selector": {
"channel": "default"
"language": "en-US"
"library": "default"
},
"payload": {
"user_input": {
"user_text": "I want an espresso"
}
}
},
"tts_control_v1": {
"audio_params": {
"audio_format": {
"pcm": {
"sample_rate_hz": 16000
}
}
}
}
}
Once the user text has been interpreted and handled by DLGaaS, the following series of StreamOutputs is returned:
Note: The StreamOutput includes the audio field because a TTS message was defined (as shown in the nlg field). If no TTS message had been specified, no audio would be returned.
# First StreamOutput
{
"response": {
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [{
"text": "What size coffee would you like? "
}
],
"visual": [{
"text": "What size coffee would you like?"
}
],
"audio": []
}
}
}
},
"audio": "RIFF4\373\000\00.."
}
# Additional StreamOutputs with audio bytes
{
"audio": "...audio_bytes..."
}
To perform both speech recognition and TTS in a single call
- The client application sends the StreamInput method with the asr_control_v1, tts_control_v1, and request parameters to DLGaaS; this lets DLGaaS know to expect audio.
- The client application streams the audio with the StreamInput method.
The audio is recognized, interpreted, and returned to the dialog application, which continues its flow. If the corresponding ExecuteResponse includes a TTS message, this message is synthesized and the audio is streamed back to the application in a series of StreamOutput calls.
Note about performing speech recognition and TTS in a dialog application
The speech recognition and TTS features provided as part of the DLGaaS API should be used in relation to your Mix.dialog, that is:
- To perform recognition on a spoken user input provided in answer to a question and answer node
- To synthesize a TTS message returned in the nlg field of the message action
To perform speech recognition or TTS outside of a Mix.dialog, please use the following services:
- For speech recognition, see the ASR as a Service gRPC API documentation.
- For TTS, see the TTS as a Service gRPC API documentation.
Step 5. Stop conversation
def stop_request(stub, session_id=None):
stop_req = StopRequest(session_id=session_id)
log.debug(f'Stop Request: {stop_req}')
stop_response, call = stub.Stop.with_call(stop_req)
response = MessageToDict(stop_response)
log.debug(f'Stop Response: {response}')
return response, call
To stop the conversation, the client app sends the StopRequest message; this message has the following fields:
- The session ID returned by the StartRequest.
- An optional user_id, which identifies a specific user within the application. See UserID for details.
The StopRequest message removes the session state, so the session ID for this conversation should not be used in the short term for any new interactions, to prevent any confusion when analyzing logs.
Detailed sequence flow
Sample Python app
dlg_client.py sample app
import argparse
import logging
import uuid
from google.protobuf.json_format import MessageToJson, MessageToDict
from grpc import StatusCode
from nuance.dlg.v1.common.dlg_common_messages_pb2 import *
from nuance.dlg.v1.dlg_messages_pb2 import *
from nuance.dlg.v1.dlg_interface_pb2 import *
from nuance.dlg.v1.dlg_interface_pb2_grpc import *
log = logging.getLogger(__name__)
def parse_args():
parser = argparse.ArgumentParser(
prog="dlg_client.py",
usage="%(prog)s [-options]",
add_help=False,
formatter_class=lambda prog: argparse.HelpFormatter(
prog, max_help_position=45, width=100)
)
options = parser.add_argument_group("options")
options.add_argument("-h", "--help", action="help",
help="Show this help message and exit")
options.add_argument("--token", nargs="?", help=argparse.SUPPRESS)
options.add_argument("-s", "--serverUrl", metavar="url", nargs="?",
help="Dialog server URL, default=localhost:8080", default='localhost:8080')
options.add_argument('--modelUrn', nargs="?",
help="Dialog App URN, e.g. urn:nuance-mix:tag:model/A2_C16/mix.dialog")
options.add_argument("--textInput", metavar="file", nargs="?",
help="Text to preform interpretation on")
return parser.parse_args()
def create_channel(args):
log.debug("Adding CallCredentials with token %s" % args.token)
call_credentials = grpc.access_token_call_credentials(args.token)
log.debug("Creating secure gRPC channel")
channel_credentials = grpc.ssl_channel_credentials()
channel_credentials = grpc.composite_channel_credentials(channel_credentials, call_credentials)
channel = grpc.secure_channel(args.serverUrl, credentials=channel_credentials)
return channel
def read_session_id_from_response(response_obj):
try:
session_id = response_obj.get('payload').get('sessionId', None)
except Exception as e:
raise Exception("Invalid JSON Object or response object")
if session_id:
return session_id
else:
raise Exception("Session ID is not present or some error occurred")
def start_request(stub, model_ref_dict, session_id, selector_dict={}):
selector = Selector(channel=selector_dict.get('channel'),
library=selector_dict.get('library'),
language=selector_dict.get('language'))
start_payload = StartRequestPayload(model_ref=model_ref_dict)
start_req = StartRequest(session_id=session_id,
selector=selector,
payload=start_payload)
log.debug(f'Start Request: {start_req}')
start_response, call = stub.Start.with_call(start_req)
response = MessageToDict(start_response)
log.debug(f'Start Request Response: {response}')
return response, call
def execute_request(stub, session_id, selector_dict={}, payload_dict={}):
selector = Selector(channel=selector_dict.get('channel'),
library=selector_dict.get('library'),
language=selector_dict.get('language'))
input = UserInput(user_text=payload_dict.get('user_input').get('userText'))
execute_payload = ExecuteRequestPayload(
user_input=input)
execute_request = ExecuteRequest(session_id=session_id,
selector=selector,
payload=execute_payload)
log.debug(f'Execute Request: {execute_payload}')
execute_response, call = stub.Execute.with_call(execute_request)
response = MessageToDict(execute_response)
log.debug(f'Execute Response: {response}')
return response, call
def stop_request(stub, session_id=None):
stop_req = StopRequest(session_id=session_id)
log.debug(f'Stop Request: {stop_req}')
stop_response, call = stub.Stop.with_call(stop_req)
response = MessageToDict(stop_response)
log.debug(f'Stop Response: {response}')
return response, call
def main():
args = parse_args()
log_level = logging.DEBUG
logging.basicConfig(
format='%(asctime)s %(levelname)-5s: %(message)s', level=log_level)
with create_channel(args) as channel:
stub = DialogServiceStub(channel)
model_ref_dict = {
"uri": args.modelUrn,
"type": 0
}
selector_dict = {
"channel": "default",
"language": "en-US",
"library": "default"
}
response, call = start_request(stub,
model_ref_dict=model_ref_dict,
session_id=None,
selector_dict=selector_dict
)
session_id = read_session_id_from_response(response)
log.debug(f'Session: {session_id}')
assert call.code() == StatusCode.OK
log.debug(f'Initial request, no input from the user to get initial prompt')
payload_dict = {
"user_input": {
"userText": None
}
}
response, call = execute_request(stub,
session_id=session_id,
selector_dict=selector_dict,
payload_dict=payload_dict
)
assert call.code() == StatusCode.OK
log.debug(f'Second request, passing in user input')
payload_dict = {
"user_input": {
"userText": args.textInput
}
}
response, call = execute_request(stub,
session_id=session_id,
selector_dict=selector_dict,
payload_dict=payload_dict
)
assert call.code() == StatusCode.OK
response, call = stop_request(stub,
session_id=session_id
)
assert call.code() == StatusCode.OK
if __name__ == '__main__':
main()
The sample Python application consists of these files:
- dlg_client.py: The main client application file.
- run-mix-client.sh: A script file that generates the access token and runs the application.
Requirements
To run this sample app, you need:
- Python 3.6 or later. Use python3 --version to check which version you have.
- Credentials from Mix (a client ID and secret) to generate the access token. See Prerequisites from Mix.
Procedure
To run this sample application:
Step 1. Download the sample app here and unzip it in a working directory (for example, /home/userA/dialog-sample-python-app).
Step 2. Download the gRPC .proto files here and unzip the files in the sample app working directory.
Step 3. Navigate to the sample app working directory and install the required dependencies:
$ python3 -m venv env
$ source env/bin/activate
$ pip install --upgrade pip
$ pip install grpcio
$ pip install grpcio-tools
$ pip install uuid
Step 4. Generate the stubs:
$ echo "Pulling support files"
$ mkdir -p google/api
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/annotations.proto > google/api/annotations.proto
$ curl https://raw.githubusercontent.com/googleapis/googleapis/master/google/api/http.proto > google/api/http.proto
$ echo "generate the stubs for support files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/http.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=./ google/api/annotations.proto
$ echo "generate the stubs for the DLGaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_interface.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/dlg_messages.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/dlg/v1/common/dlg_common_messages.proto
$ echo "generate the stubs for the ASRaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/recognizer.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/resource.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/asr/v1/result.proto
$ echo "generate the stubs for the TTSaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/tts/v1/nuance_tts_v1.proto
$ echo "generate the stubs for the NLUaaS gRPC files"
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/runtime.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/result.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/interpretation-common.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/single-intent-interpretation.proto
$ python -m grpc_tools.protoc --proto_path=./ --python_out=. --grpc_python_out=. nuance/nlu/v1/multi-intent-interpretation.proto
Step 5. Edit the run script, run-mix-client.sh, to add your CLIENT_ID and SECRET. These are your Mix credentials as described in Generate token.
CLIENT_ID="appID%3A...ENTER MIX CLIENT_ID..."
SECRET="...ENTER MIX SECRET..."
export MY_TOKEN="`curl -s -u "$CLIENT_ID:$SECRET" \
"https://auth.crt.nuance.com/oauth2/token" \
-d 'grant_type=client_credentials' -d 'scope=dlg' \
| python -c 'import sys, json; print(json.load(sys.stdin)["access_token"])'`"
python dlg_client.py --serverUrl "dlg.api.nuance.com:443" --token $MY_TOKEN --modelUrn "$1" --textInput "$2"
Step 6. Run the application using the script file, passing it the URN and a text to interpret:
./run-mix-client.sh modelUrn textInput
Where:
- modelUrn: The URN of the application configuration for the Coffee App created in the Quick Start.
- textInput: The text to interpret.
For example:
$ ./run-mix-client.sh "urn:nuance-mix:tag:model/TestMixClient/mix.dialog" "I want a double espresso"
An output similar to the following is provided:
2020-12-07 17:04:05,414 DEBUG: Creating secure gRPC channel
2020-12-07 17:04:05,420 DEBUG: Start Request: selector {
channel: "default"
language: "en-US"
library: "default"
}
payload {
model_ref {
uri: "urn:nuance-mix:tag:model/TestMixClient/mix.dialog"
}
}
2020-12-07 17:04:05,945 DEBUG: Start Request Response: {'payload': {'sessionId': '92705444-cd59-4a04-b79c-e67203f04f0d'}}
2020-12-07 17:04:05,948 DEBUG: Session: 92705444-cd59-4a04-b79c-e67203f04f0d
2020-12-07 17:04:05,949 DEBUG: Initial request, no input from the user to get initial prompt
2020-12-07 17:04:05,952 DEBUG: Execute Request: user_input {
}
2020-12-07 17:04:06,193 DEBUG: Execute Response: {'payload': {'messages':
[{'visual': [{'text': 'Hello and welcome to the coffee app.'}], 'view': {}}],
'qaAction': {'message': {'visual': [{'text': 'What can I get you today?'}]},
'data': {}, 'view': {}}}}
2020-12-07 17:04:06,198 DEBUG: Second request, passing in user input
2020-12-07 17:04:06,199 DEBUG: Execute Request: user_input {
user_text: "I want a double espresso"
}
2020-12-07 17:04:06,791 DEBUG: Execute Response: {'payload': {'messages':
[{'visual': [{'text': 'Perfect, a double espresso coming right up!'}], 'view':
{}}], 'endAction': {'data': {}, 'id': 'End dialog'}}}
Reference topics
This section provides more detailed information about objects used in the gRPC API.
Note: Examples in this section are shown in JSON format for readability. However, in an actual client application, content is sent and received as protobuf objects.
Status messages and codes
gRPC error codes
In addition to the standard gRPC error codes, DLGaaS uses the following codes:
| gRPC code | Message | Indicates |
|---|---|---|
| 0 | OK | Normal operation. |
| 5 | NOT_FOUND | The resource specified could not be found; for example, the model, channel, or session specified does not exist. Troubleshooting: Make sure that the resource provided exists or that you have specified it correctly. See URN for details on the URN syntax. |
| 11 | OUT_OF_RANGE | The provided session timeout is not in the expected range. Troubleshooting: Specify a value between 0 and 14400 (default is 900) and try again. |
| 12 | UNIMPLEMENTED | The API version was not found or is not available on the URL specified. For example, a client using DLGaaS v1 is trying to access the dlgaas.beta.nuance.com URL. Troubleshooting: See URLs to runtime services for the supported URLs. |
| 13 | INTERNAL | There was an issue on the server side or interactions between subsystems have failed. Troubleshooting: Contact Nuance. |
| 16 | UNAUTHENTICATED | The credentials specified are incorrect or expired. Troubleshooting: Make sure that you have generated the access token and that you are providing the credentials as described in Authorize your client application. Note that the token needs to be regenerated regularly. See Access token lifetime for details. |
HTTP return codes
In addition to the standard HTTP error codes, DLGaaS uses the following codes:
| HTTP code | Message | Indicates |
|---|---|---|
| 200 | OK | Normal operation. |
| 401 | UNAUTHORIZED | The credentials specified are incorrect or expired. Troubleshooting: Make sure that you have generated the access token and that you are providing the credentials as described in Authorize your client application. Note that the token needs to be regenerated regularly. See Access token lifetime for details. |
| 404 | NOT_FOUND | The resource specified could not be found; for example, the model, channel, or session specified does not exist. Troubleshooting: Make sure that the resource provided exists or that you have specified it correctly. See URN for details on the URN syntax. |
| 500 | INTERNAL_SERVER_ERROR | There was an issue on the server side. Troubleshooting: Contact Nuance. |
Examples
Incorrect URN
"grpc_message":"model [urn:nuance:mix/eng-USA/coffee_app_typo/mix.dialog] could not be found","grpc_status":5
Incorrect channel
"grpc_message":"channel is invalid, supported values are [Omni Channel VA, default] (error code: 5)","grpc_status":5}"
Session not found
"grpc_message":"Could not find session for [12345]","grpc_status":5}"
Incorrect credentials
"{"error":{"code":401,"status":"Unauthorized","reason":"Token is expired","message":"Access credentials are invalid"}\n","grpc_status":16}"
Message actions
Example message action as part of QA Action
{
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [{
"text": "What type of coffee would you like?"
}
],
"visual": [{
"text": "What <b>type</b> of coffee would you like? For the list of options, see the <a href=\"www.myserver.com/menu.html\">menu</a>."
}
],
"audio": [{
"text": "What type of coffee would you like? ",
"uri": "en-US/prompts/default/default/Message_ini_01.wav?version=1.0_1602096507331"
}
]
}
}
}
}
A message action indicates that a message should be played to the user. A message can be provided as:
- Text to be rendered using text-to-speech: The nlg field provides text to synthesize. You can use the StreamInput method to synthesize text returned in the nlg field. See Step 4b. Interact with the user (using audio) for details.
- Text to be visually displayed to the user: The visual field provides text that can be displayed, for example, in a chat or in a web application. This field supports rich text format, so you can include HTML markup, URLs, and so on.
- Audio file to play to the user: The audio field provides a link to a recorded audio file that can be played to the end user. The uri field provides the link to the file, while the text field provides text that can be used as backup TTS if the audio file is missing or cannot be played.
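A minimal client-side rendering sketch for one such message, assuming it has been converted to a dict with MessageToDict; play_audio_file and synthesize are hypothetical helpers supplied by your application:

def render_message(message, play_audio_file=None, synthesize=None):
    # Visual text (may contain HTML markup) for chat or web channels
    for visual in message.get("visual", []):
        print(visual.get("text", ""))
    # Recorded audio, with the text field as backup TTS if playback is not possible
    for audio in message.get("audio", []):
        uri = audio.get("uri")
        if uri and play_audio_file:
            play_audio_file(uri)        # hypothetical helper: fetch and play the file
        elif audio.get("text") and synthesize:
            synthesize(audio["text"])   # hypothetical helper: backup TTS
    # Text intended for TTS only
    if synthesize:
        for nlg in message.get("nlg", []):
            synthesize(nlg.get("text", ""))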
Message actions can be configured in the following Mix.dialog nodes:
- Message node; in this case, they are returned in the messages field of the ExecuteResponsePayload. Messages specified in a message node are returned only when a question and answer, data access, or external actions node occurs in the dialog flow. See Message nodes for details.
- Question and answer node; in this case, they are returned in the message field of the qa_action in the ExecuteResponsePayload.
Message nodes
A message node is used to play or display a message. The message specified in a message node is sent to the client application as a message action. A message node also performs non-recognition actions, such as playing a message, assigning a variable, or defining the next node in the dialog flow.
Messages configured in a message node are cumulative and sent only when a question and answer node, a data access node, or an external actions node occurs in the dialog flow. For example, consider the following dialog flow:
This would be handled as follows:
- The Dialog service sends an ExecuteResponse when encountering the question and answer node, with the following messages:
# First ExecuteResponse
{
  "payload": {
    "messages": [{
        "nlg": [],
        "visual": [{
            "text": "Hey there!"
          }
        ],
        "audio": []
      }, {
        "nlg": [],
        "visual": [{
            "text": "Welcome to the coffee app."
          }
        ],
        "audio": []
      }
    ],
    "qa_action": {
      "message": {
        "nlg": [],
        "visual": [{
            "text": "What can I do for you today?"
          }
        ],
        "audio": []
      }
    }
  }
}
- The client application sends an ExecuteRequest with the user input.
- The Dialog service sends an ExecuteResponse when encountering the end node, with the following message action:
# Second ExecuteResponse
{
  "payload": {
    "messages": [{
        "nlg": [],
        "visual": [{
            "text": "Goodbye."
          }
        ],
        "audio": []
      }
    ],
    "end_action": {}
  }
}
Using variables in messages
Messages can include variables. For example, in a coffee application, you might want to personalize the greeting message:
"Hello Miranda ! What can I do for you today?"
Variables are configured in Mix.dialog. They are resolved by the dialog engine and then returned to the client application. For example:
{
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [],
"visual": [
{
"text": "Hello Miranda ! What can I do for you today?"
}
],
"audio": []
}
}
}
}
Question and answer actions
A question and answer action is returned by a question and answer node. A question and answer node is the basic node type in dialog applications. It first plays a message and then recognizes user input.
The message specified in a question and answer node is sent to the client application as a message action.
The client application must then return the user input to the question and answer node. This can be provided in three ways:
- As audio to be recognized and interpreted by Nuance. This is implemented in the client app through the StreamInput method. See Step 4b. Interact with the user (using audio) for details.
- As text to be interpreted by Nuance. In this case, the client application returns the input string to the dialog application. See Interpreting text user input for details.
- As interpretation results. This assumes that interpretation of the user input is performed by an external system. In this case, the client application is responsible for returning the results of the interpretation to the dialog application. See Interpreting text user input for details.
In a question and answer node, the dialog flow is stopped until the client application has returned the user input.
Sending data
A question and answer node can specify data to send to the client application. This data is configured in Mix.dialog, in the Send Data tab of the question and answer node. For the procedure, see Send data to the client application in the Mix.dialog documentation.
For example, in the coffee application, you might want to send entities that you have collected in a previous node (COFFEE_TYPE and COFFEE_SIZE) as well as data that you have retrieved from an external system (the user's rewards card number):
This data is sent to the client application in the data field of the qa_action; for example:
{
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [],
"visual": [
{
"text": "Your order was processed. Would you like anything else today?"
}
],
"audio": [],
"view": {
"id": "",
"name": ""
}
},
"data": {
"rewardsCard": "5367871902680912",
"COFFEE_TYPE": "espresso",
"COFFEE_SIZE": "lg"
}
}
}
}
Interactive elements
Question and answer actions can include interactive elements to be displayed by the client app, such as clickable buttons or links.
For example, in a web version of the coffee application, you may want to display Yes/No buttons so that users can confirm their selections:
Interactive elements are configured in Mix.dialog in question and answer nodes. For the procedure, see Define interactive elements in the Mix.dialog documentation.
For example, for the Yes/No buttons scenario above, you could configure two elements, one for each button, as follows:
This information is sent to the client app in the selectable field of the qa_action. For example:
{
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [],
"visual": [{
"text": "So you want a double espresso , is that it?"
}
],
"audio": []
},
"selectable": {
"selectable_items": [{
"value": {
"id": "answer",
"value": "yes"
},
"description": "Image of green checkmark",
"display_text": "Yes",
"display_image_uri": "/resources/images/green_checkmark.png"
}, {
"value": {
"id": "answer",
"value": "no"
},
"description": "Image of Red X",
"display_text": "No",
"display_image_uri": "/resources/images/red_x.png"
}
]
}
}
}
}
The application is then responsible for displaying the elements (in this case, the two buttons) and for returning the choice made by the user in the selected_item field of the Execute Request payload. For example:
"payload": {
"user_input": {
"selected_item": {
"id": "answer",
"value": "no"
}
}
}
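A sketch of returning the selected item from a Python client, building the payload with ParseDict so that it mirrors the JSON shape above (the wildcard imports match the sample app at the end of this guide; the field names are taken from the JSON examples, so verify them against your generated stubs):

from google.protobuf.json_format import ParseDict
# Same imports as the sample app at the end of this guide
from nuance.dlg.v1.common.dlg_common_messages_pb2 import *
from nuance.dlg.v1.dlg_messages_pb2 import *

session_id = "b8cba63a-f681-11e9-ace9-d481d7843dbd"  # returned by the StartRequest
payload_dict = {"user_input": {"selected_item": {"id": "answer", "value": "no"}}}
execute_payload = ParseDict(payload_dict, ExecuteRequestPayload())
execute_request = ExecuteRequest(session_id=session_id, payload=execute_payload)
# execute_response, call = stub.Execute.with_call(execute_request)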
Data access actions
A data access action tells the client app that the dialog expects data to continue the flow. For example, consider these use cases:
- In a coffee application, after asking the user for the type and size of coffee to order, the dialog must provide the price of the order before completing the transaction. In this use case, the dialog sends a data access action to the client application, providing the type and size of coffee and requesting the price.
- In a banking application, after having collected all the information necessary to make a payment (that is, the user's account, the payee, and the payment amount), the dialog is ready to complete the payment. In this use case, the dialog sends a data access action to the client application, providing all the transaction details so that the client application can process the payment and provide a return code back to the dialog.
Data access actions are configured in data access nodes. These nodes specify:
- Variables sent by the dialog service to the client application
- Variables sent by the client application to the dialog service
Using the data access API in the client app
Data access information is sent and received as follows:
- The dialog sends data in the da_action field of the ExecuteResponsePayload.
- The client app sends data in the requested_data field of the ExecuteRequestPayload.
For example, in the coffee app use case, if a user says "I want a double espresso," the dialog will send this data access action information to the client application in the ExecuteResponsePayload:
{
"payload": {
"messages": [],
"da_action": {
"id": "get_coffee_price",
"data": {
"COFFEE_TYPE": "espresso",
"COFFEE_SIZE": "lg"
}
}
}
}
Where:
- id uniquely identifies the data access node, so that the client application knows what process is required. For example, when the client app parses the ExecuteResponse and sees a data access action id of get_coffee_price, it can call a function that performs this action.
- data provides the values of the data that were configured in the data access node; in this case, the entities that were collected.
The client application uses that information to perform the action required by the dialog, in this case fetching the price of the coffee based on the user's choice. It then returns the value in the coffee_price variable as part of the ExecuteRequestPayload, as well as a returnCode:
{
"selector": {
"channel": "ivr",
"language": "en-US",
"library": "default"
},
"payload": {
"requested_data": {
"id": "get_coffee_price",
"data": {
"coffee_price": "4.25",
"returnCode": "0"
}
}
}
}
The returnCode is required, otherwise the Execute request will fail. A returnCode of "0" indicates a successful interaction.
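A sketch of how the client application might return this data, again building the payload with ParseDict so it mirrors the JSON above (answer_data_access is a hypothetical helper; the data field is assumed to accept a free-form mapping, as shown in the example):

from google.protobuf.json_format import ParseDict
from nuance.dlg.v1.common.dlg_common_messages_pb2 import *
from nuance.dlg.v1.dlg_messages_pb2 import *

def answer_data_access(stub, session_id, da_action_id, data):
    # The returnCode is mandatory; "0" indicates a successful lookup
    payload_dict = {"requested_data": {"id": da_action_id,
                                       "data": dict(data, returnCode="0")}}
    execute_payload = ParseDict(payload_dict, ExecuteRequestPayload())
    execute_request = ExecuteRequest(session_id=session_id, payload=execute_payload)
    return stub.Execute.with_call(execute_request)

# Example: report the price looked up for the get_coffee_price action
# response, call = answer_data_access(stub, session_id, "get_coffee_price", {"coffee_price": "4.25"})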
Data access action sequence flow
This sequence diagram shows a data access action exchange. For simplicity, only the payload of the requests and responses related to the data access feature are shown.
Transfer actions
An external actions node of type "Transfer" in Mix.dialog sends an Escalation action in the DLGaaS API. This action can be used, for example, to escalate to an IVR agent. Any data set in the Transfer node is sent as part of the Escalation action data field.
To continue the flow, the client application must return data in the requested_data field of the ExecuteRequestPayload. At a minimum, this data must include a returnCode. It can also include data requested by the dialog, if any. The returnCode is required, otherwise the Execute request will fail. A returnCode of "0" indicates a successful interaction.
For example, consider a scenario where the Transfer action is used to escalate to an agent to confirm a customer's data, as shown in the following Mix.dialog node:
This transfer action sends the userName and userID variables to the client application in an escalation_action, as follows:
{
"payload": {
"messages": [],
"escalation_action": {
"data": {
"userName": "Miranda Smith",
"userID": "MIRS82734"
},
"id": "TransferToAgent"
}
}
}
The client application transfers the call and then returns a returnCode to the dialog to provide the status of the transaction. If the transfer was successful, a returnCode of "0" is returned. For example:
{
"selector": {
"channel": "default",
"language": "en-US",
"library": "default"
},
"payload": {
"requested_data": {
"id": "TransferToAgent",
"data": {
"returnCode": "0"
}
}
}
}
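As a rough sketch, and reusing the dict representation of the payload from the earlier example, the client application could handle the escalation as follows. transfer_call is a hypothetical function standing in for your own transfer logic, and the non-zero failure code is an assumption to adapt to how your dialog handles errors.

def handle_escalation_action(payload):
    """Transfer the call, then acknowledge the escalation action with a returnCode."""
    action = payload.get("escalation_action")
    if not action:
        return None
    # Hypothetical client-side transfer using the data sent by the dialog
    success = transfer_call(action["data"].get("userName"), action["data"].get("userID"))
    return {
        "requested_data": {
            "id": action["id"],
            # "0" indicates success; a non-zero code (assumed here) can signal a failed transfer
            "data": {"returnCode": "0" if success else "1"},
        }
    }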
End actions
An external actions node of type "End" returns an End action, which indicates the end of the dialog. It includes the ID that identifies the node in the Mix.dialog application as well as any data that you set for this node. For example:
{
"payload": {
"messages": [{
"nlg": [],
"visual": [{
"text": "Perfect, a double espresso coming right up!"
}
],
"audio": []
}
],
"end_action": {
"data": {
"returnCode": "0"
},
"id": "CoffeeApp End node"
}
}
}
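When the client application sees an end_action, it can close the session with the Stop method. A minimal sketch follows, assuming Python stubs generated from the proto files with the standard grpc_tools naming (the module name dlg_messages_pb2 is an assumption based on the proto file names):

import dlg_messages_pb2  # generated from dlg_messages.proto

def finish_if_ended(stub, session_id, payload):
    """Call Stop once the dialog signals the end of the conversation."""
    if "end_action" in payload:
        # end_action.data may carry a returnCode or other values set in the End node
        stub.Stop(dlg_messages_pb2.StopRequest(session_id=session_id))
        return True
    return False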
Handling unusable ASR audio
DLGaaS handles unusable ASR audio as follows:
- If ASRaaS returns a status code of 204 or 404 (that is, no audio was provided or recognition could not provide a result), the dialog engine treats this as NO_INPUT. For a description of the ASR status codes, please see Status messages and codes in the ASRaaS documentation.
- If audio was provided but was not recognized, ASRaaS sends a status code of 200 (Success) with a rejected hypothesis. This is treated as a NO_MATCH by the NLU and dialog engines.
By default, if ASRaaS does not return a valid hypothesis, the dialog flow is determined by the dialog application, according to the processing defined for the NO_INPUT and NO_MATCH events in Mix.dialog.
In some cases, you may want the client application to handle the dialog flow if a valid hypothesis is not returned. This is done by setting the end_stream_no_valid_hypotheses parameter of the StreamInput asr_control_v1 message to true. When this is enabled, the stream is closed and the last StreamOutput message contains the ASR result in the asr_result field. The client application is then responsible for determining the next step in the dialog flow.
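For example, a client that wants to take over the flow itself could set the flag in the first StreamInput message and then inspect the final StreamOutput. This is a sketch only: the message locations (dlg_messages_pb2.AsrParamsV1, StreamInput) are assumptions based on the proto file names, and execute_request, audio_inputs, stub, and handle_no_valid_hypothesis are placeholders for your own objects and handlers.

# Ask DLGaaS to close the stream when ASRaaS does not return a valid hypothesis
asr_params = dlg_messages_pb2.AsrParamsV1(end_stream_no_valid_hypotheses=True)
first_input = dlg_messages_pb2.StreamInput(request=execute_request, asr_control_v1=asr_params)

last_output = None
for last_output in stub.ExecuteStream(iter([first_input, *audio_inputs])):
    pass

if last_output is not None and last_output.HasField("asr_result"):
    # The stream was closed without a valid hypothesis: the client application
    # now decides the next step in the dialog flow (for example, reprompting locally).
    handle_no_valid_hypothesis(last_output.asr_result)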
Interpreting text user input
Interpretation of user input provided as text can be performed either by the Nuance Mix Platform or by an external system.
Nuance Mix Platform performs interpretation
Example: Interpretation is performed by Nuance
"payload": {
"user_input": {
"user_text": "I want a large coffee"
}
}
When the Nuance Mix Platform is responsible for interpreting user input, the client application sends the text collected from the end user in the user_text
field of the Execute request input message. The user text is sent to NLUaaS, which performs interpretation and returns the results to DLGaaS.
External system performs interpretation
Example: Interpretation is performed by an external system (simple format)
"payload": {
"user_input": {
"interpretation": {
"confidence": 1.0,
"utterance": "I want a large americano",
"data": {
"INTENT": "ORDER_COFFEE",
"COFFEE_SIZE": "LG",
"COFFEE_TYPE": "americano"
},
"slot_literals": {
"COFFEE_SIZE": "large",
"COFFEE_TYPE": "americano"
}
}
}
}
Example: Interpretation is performed by an external system (NLUaaS format)
"payload": {
"user_input": {
"nluaasInterpretation": {
"literal": "i want a double espresso",
"interpretations": [{
"singleIntentInterpretation": {
"intent": "ORDER_COFFEE",
"confidence": 1,
"origin": "GRAMMAR",
"entities": {
"COFFEE_SIZE": {
"entities": [{
"textRange": {
"startIndex": 9,
"endIndex": 15
},
"confidence": 1,
"origin": "GRAMMAR",
"stringValue": "lg"
}
]
},
"COFFEE_TYPE": {
"entities": [{
"textRange": {
"startIndex": 16,
"endIndex": 24
},
"confidence": 1,
"origin": "GRAMMAR",
"stringValue": "espresso"
}
]
}
}
}
}
]
}
}
}
When an external system is responsible for interpreting user input, the client application sends the results of this interpretation in one of the following fields:
- For simple interpretations that include entities with string values only, use the interpretation field of the Execute request user_input message, including the intent and entities to use for this interaction.
- For interpretations that include complex entities, use the nluaas_interpretation field of the Execute request user_input message. This field expects the interpretation in the format used by the NLUaaS engine. See the NLUaaS InterpretResult documentation for details. Note that DLGaaS supports single intent interpretations only.
Exchanging session data
You can use the StartRequest to send data from the client application to the dialog service to be used during the session.
This data can include:
- The userData predefined variable
- Variables defined in Mix.dialog
userData predefined variable
StartRequest payload with session data variable
{
"selector": {
"channel": "default",
"language": "en-US",
"library": "default"
},
"payload": {
"data": {
"userData": {
"timezone": "America/Cancun",
"userGlobalID": "123123123",
"userChannelID": "163.128.3.254",
"userAuxiliaryID": "7319434000843499",
"systemID": "4561 9219 9923",
"location": {
"latitude": "21.161908",
"longitude": "-86.8515279"
},
"preferred_coffee": "espresso",
"user_name": "Miranda",
}
}
}
}
All dialog projects include the userData predefined variable, which can be set in the StartRequest payload to provide end user data such as the user's timezone, location, and so on.
The JSON code shows an example of how to pass userData in the StartRequest payload. This data can then be used in the dialog application.
For a description of the userData variable, see userData schema in the Mix.dialog documentation.
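Because the data field of the StartRequestPayload is a google.protobuf.Struct, a Python client can build it from a plain dict. The following is a minimal sketch; the module name dlg_common_messages_pb2 is an assumption based on the proto file names.

from google.protobuf import json_format, struct_pb2
import dlg_common_messages_pb2  # generated from dlg_common_messages.proto

session_data = {
    "userData": {
        "timezone": "America/Cancun",
        "user_name": "Miranda",
        "preferred_coffee": "espresso",
    }
}
data_struct = struct_pb2.Struct()
json_format.ParseDict(session_data, data_struct)

# model_ref and the other StartRequest fields are set as usual when building the request
start_payload = dlg_common_messages_pb2.StartRequestPayload(data=data_struct)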
Variables defined in Mix.dialog
You can set variables that were defined in Mix.dialog in the StartRequest. For example, let's say that the user name and preferred coffee are stored on the user's phone, and you'd like to use them in your dialog application to customize your messages:
- System: Hey Miranda! What can I do for you today?
- User: I'd like my usual.
- System: Perfect, a double espresso coming up!
To implement this scenario:
- Create variables in Mix.dialog (for example, user_name and preferred_coffee). See Manage variables in the Mix.dialog documentation for details.
- Use the variables in the dialog; for example, the following message node includes the user_name value in the initial prompt:
- Send the values of user_name and preferred_coffee in the StartRequestPayload.
The dialog app can then include the user name in the first prompt:
{
"payload": {
"messages": [],
"qa_action": {
"message": {
"nlg": [],
"visual": [
{
"text": "Hello Miranda ! What can I do for you today?"
}
],
"audio": []
}
}
}
}
Simple variable types
Simple variables created in Mix.dialog are of a specified type. When you send a variable, whether in the StartRequest payload or in a data access action, you must make sure to send the data in the right format so that it can be used by the dialog application.
This table lists the types of simple variables and describes how to send them to the dialog application. The JSON code shows examples of how to pass these types of data in a data access action.
For more information, see Simple variable types in the Mix.dialog documentation.
{
"selector": {
"channel": "default",
"language": "en-US",
"library": "default"
},
"payload": {
"requested_data": {
"id": "DataAccess",
"data": {
"returnCode": "0",
"sampleString": "This is a sample string",
"sampleAlphanumeric": "1-2 This is an alphanumeric string.",
"sampleDigits": "12",
"sampleBoolean": "true",
"sampleInt": 27,
"sampleDecimal": 12.34,
"sampleAmount": {
"unit": "USD",
"number": 10.5
},
"sampleDate": "202001014",
"sampleTime": "1212a",
"sampleDistance": {
"modifier": "LE",
"unit": "km",
"number": 10
},
"sampleTemperature": {
"unit": "C",
"number": 32
}
}
}
}
}
Variable type | Description |
---|---|
String | String of characters |
Alphanumeric | String of alphanumeric characters (a-z, A-Z, 0-9) |
Digits | String of digits (0-9) |
Boolean | Boolean (true, false) |
Integer | Whole number |
Decimal | Decimal-point number |
Amount | Amount, including currency. Specify the amount as an object with a unit element (the currency) and a number element (the value), as in the sampleAmount example. |
Date | Date (YYYYMMDD) |
Time | Time. Specify as a string in the format HHMMx, where x indicates the period (for example, a for a.m., as in the sample value 1212a). |
Distance | Distance, including unit and modifier. Specify the distance as an object with modifier, unit, and number elements, as in the sampleDistance example. See Simple variable types in the Mix.dialog documentation for the unit and modifier values supported. |
Temperature | Temperature, including unit. Specify the temperature as an object with unit and number elements, as in the sampleTemperature example. See Simple variable types in the Mix.dialog documentation for the unit values supported. |
Disabling logging
You can set the suppress_log_user_data field in the StartRequestPayload to true to disable logging for ASR, NLU, TTS, and Dialog (see the example after the list below). This has the following impact:
- For Dialog, it masks the content of any potentially sensitive data in Nuance Application Reporting (NAR) and Nuance Insights for IVR (NII) logs.
- For ASR, it sets the suppress_call_recording RecognitionFlags field to true to disable call logging. See the ASRaaS RecognitionFlags documentation for details.
- For NLU, it sets the interpretation_input_logging_mode InterpretationParameters field to SUPPRESSED so that input is replaced with "value suppressed." See the NLUaaS InterpretationParameters documentation for details.
- For TTS, it sets the suppress_input EventParameters field to true to omit input text and URIs from log events. See the TTSaaS EventParameters documentation for details.
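For example, a client could disable logging when it starts the session. This short sketch reuses the assumed generated module name from the earlier examples; model_ref is a placeholder for your dialog application model reference.

start_payload = dlg_common_messages_pb2.StartRequestPayload(
    model_ref=model_ref,             # your dialog application model reference
    suppress_log_user_data=True,     # disables logging for ASR, NLU, TTS, and Dialog
)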
User ID
You can specify a user ID in the StartRequest, ExecuteRequest, and StopRequest. This user ID is converted into an unreadable format and stored in call logs and user-specific files. It can be used for:
- General Data Protection Regulation (GDPR) compliance: Logs for a specific user can be deleted, if necessary.
- Performance tuning: User-specific voice tuning files and NLU wordsets (such as contact lists) can be saved and used to improve performance.
Note: The user_id value can accept any UTF-8 characters.
gRPC API
Dialog as a Service provides three protocol buffer (.proto) files to define the Dialog service for gRPC. These files contain the building blocks of your dialog applications:
- The dlg_interface.proto file defines the main DialogService interface.
- The dlg_messages.proto file defines the request and response messages used by the DialogService methods.
- The dlg_common_messages.proto file defines the objects used in the methods.
Once you have transformed the proto files into functions and classes in your programming language using gRPC tools, you can call these functions from your client application to start a conversation with a user, collect the user's input, obtain the action to perform, and so on.
See Client app development for a scenario using Python that provides an overview of the different methods and messages used in a sample order coffee application. For other languages, consult the gRPC and Protocol Buffer documentation.
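As a quick orientation, the following Python sketch shows the overall shape of a client once the stubs have been generated. The module names (dlg_interface_pb2_grpc, dlg_messages_pb2, dlg_common_messages_pb2) are assumptions based on the proto file names, the server address and model URN are placeholders to adapt to your environment, and authorization (token-based call credentials) is omitted here.

import grpc
import dlg_interface_pb2_grpc   # generated from dlg_interface.proto
import dlg_messages_pb2         # generated from dlg_messages.proto
import dlg_common_messages_pb2  # generated from dlg_common_messages.proto

channel = grpc.secure_channel("<dlgaas-server>:443", grpc.ssl_channel_credentials())
stub = dlg_interface_pb2_grpc.DialogServiceStub(channel)

# Start a conversation with the dialog model deployed from Mix
start_response = stub.Start(dlg_messages_pb2.StartRequest(
    selector=dlg_common_messages_pb2.Selector(
        channel="default", language="en-US", library="default"),
    payload=dlg_common_messages_pb2.StartRequestPayload(
        model_ref=dlg_common_messages_pb2.ResourceReference(uri="<your model URN>")),
))
session_id = start_response.payload.session_id

# Execute or ExecuteStream calls follow, using session_id, until an end_action is
# returned; Stop is then called to clean up the session.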
Field names in proto and stub files
In this section, the names of the fields are shown as they appear in the proto files. To see how they are generated in your programming language, consult your generated files. For example:
Proto file | Python | Go | Java |
---|---|---|---|
session_id | session_id | SessionId | sessionId or getSessionId |
selector | selector | Selector | selector or setSelector |
For details, see the Protocol Buffers documentation for:
Python: https://developers.google.com/protocol-buffers/docs/reference/python-generated#fields
Go: https://developers.google.com/protocol-buffers/docs/reference/go-generated#fields
Java: https://developers.google.com/protocol-buffers/docs/reference/java-generated#fields
Proto files structure
Structure of DLGaaS proto files
DialogService
Start
StartRequest
StartResponse
Execute
ExecuteRequest
ExecuteResponse
ExecuteStream
StreamInput
StreamOutput
Stop
StopRequest
StopResponse
StartRequest
session_id
selector
channel
language
library
payload
model_ref
uri
type
data
suppress_log_user_data
session_timeout_sec
user_id
client_data
StartResponse
payload
session_id
ExecuteRequest
session_id
selector
channel
language
library
payload
user_input
user_text
interpretation
confidence
input_mode
utterance
data
key
value
slot_literals
key
value
slot_confidences
key
value
alternative_interpretations
selected_item
id
value
nluaas_interpretation
dialog_event
type
message
event_name
requested_data
id
data
user_id
ExecuteResponse
payload
messages
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
qa_action
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
data
view
id
name
selectable
selectable_items
value
id
value
description
display_text
display_image_uri
recognition_settings
dtmf_mappings
collection_settings
speech_settings
dtmf_settings
da_action
id
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
view
id
name
data
escalation_action
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
view
id
name
data
id
end_action
data
id
continue_action
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
view
id
name
data
id
StreamInput
request Standard DLGaaS ExecuteRequest
asr_control_v1
audio_format
pcm | alaw | ulaw | opus | ogg_opus
utterance_detection_mode
SINGLE | MULTIPLE | DISABLED
recognition_flags
auto_punctuate
filter_profanity
mask_load_failures
etc.
result_type
no_input_timeout_ms
recognition_timeout_ms
utterance_end_silence_ms
speech_detection_sensitivity
max_hypotheses
end_stream_no_valid_hypotheses
audio
tts_control_v1
audio_params
audio_format
volume_percentage
speaking_rate_percentage
etc.
voice
name
model
etc.
StreamOutput
response Standard DLGaaS ExecuteResponse
audio
asr_result
asr_status
asr_start_of_speech
StopRequest
session_id
user_id
DialogService
Name | Request Type | Response Type | Description |
---|---|---|---|
Start | StartRequest | StartResponse | Starts a conversation. Returns a StartResponse object. |
Execute | ExecuteRequest | ExecuteResponse | Used to continuously interact with the conversation based on end user input or events. Returns an ExecuteResponse object that will contain data related to the dialog interactions and that can be used by the client to interact with the end user. |
ExecuteStream | StreamInput stream | StreamOutput stream | Performs recognition on streamed audio using ASRaaS and provides speech synthesis using TTSaaS. |
Stop | StopRequest | StopResponse | Ends a conversation and performs cleanup. Returns a StopResponse object. |
This service includes:
DialogService
Start
StartRequest
StartResponse
Execute
ExecuteRequest
ExecuteResponse
ExecuteStream
StreamInput
StreamOutput
Stop
StopRequest
StopResponse
StartRequest
Request object used by the Start method.
Field | Type | Description |
---|---|---|
session_id | string | Optional session ID. If not provided then one will be generated. |
selector | common.Selector | Selector providing the channel and language used for the conversation. |
payload | common.StartRequestPayload | Payload of the Start request. |
session_timeout_sec | uint32 | Session timeout value (in seconds), after which the session is terminated. |
user_id | string | Identifies a specific user within the application. See User ID. |
client_data | map<string,string> | Map of client-supplied key-value pairs to inject into the call log. Optional. Example: "client_data": { "param1": "value1", "param2": "value2" } |
This method includes:
StartRequest
session_id
selector
channel
language
library
payload
model_ref
uri
type
data
suppress_log_user_data
session_timeout_sec
user_id
client_data
StartResponse
Response object used by the Start method.
Field | Type | Description |
---|---|---|
payload | common.StartResponsePayload | Payload of the Start response. |
This method includes:
StartResponse
payload
session_id
ExecuteRequest
Request object used by the Execute method.
Field | Type | Description |
---|---|---|
session_id | string | ID for the session. |
selector | common.Selector | Selector providing the channel and language used for the conversation. |
payload | common.ExecuteRequestPayload | Payload of the Execute request. |
user_id | string | Identifies a specific user within the application. See User ID. |
This method includes:
ExecuteRequest
session_id
selector
channel
language
library
payload
user_input
user_text
interpretation
confidence
input_mode
utterance
data
key
value
slot_literals
key
value
slot_confidences
key
value
alternative_interpretations
selected_item
id
value
nluaas_interpretation
dialog_event
type
message
event_name
requested_data
id
data
user_id
ExecuteResponse
Response object used by the Execute method.
Field | Type | Description |
---|---|---|
payload | common.ExecuteResponsePayload | Payload of the Execute response. |
This method includes:
ExecuteResponse
payload
messages
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
qa_action
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
data
view
id
name
selectable
selectable_items
value
id
value
description
display_text
display_image_uri
recognition_settings
dtmf_mappings
collection_settings
speech_settings
dtmf_settings
da_action
id
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
view
id
name
data
escalation_action
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
view
id
name
data
id
end_action
data
id
continue_action
message
nlg
text
mask
barge_in_disabled
visual
text
mask
barge_in_disabled
audio
text
uri
mask
barge_in_disabled
view
id
name
view
id
name
data
id
StreamInput
Performs recognition on streamed audio using ASRaaS and requests speech synthesis using TTSaaS.
Field | Type | Description |
---|---|---|
request | ExecuteRequest | Standard DLGaaS ExecuteRequest; used to continue the dialog interactions. |
asr_control_v1 | AsrParamsV1 | Parameters to be forwarded to the ASR service. |
audio | bytes | Audio samples in the selected encoding for recognition. |
tts_control_v1 | TtsParamsv1 | Parameters to be forwarded to the TTS service. |
This method includes:
StreamInput
request Standard DLGaaS ExecuteRequest
asr_control_v1
audio_format
pcm | alaw | ulaw | opus | ogg_opus
utterance_detection_mode
SINGLE | MULTIPLE | DISABLED
recognition_flags
auto_punctuate
filter_profanity
mask_load_failures
etc.
result_type
no_input_timeout_ms
recognition_timeout_ms
utterance_end_silence_ms
speech_detection_sensitivity
max_hypotheses
end_stream_no_valid_hypotheses
audio
tts_control_v1
audio_params
audio_format
volume_percentage
speaking_rate_percentage
etc.
voice
name
model
etc.
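A common pattern for ExecuteStream is to send the request and the ASR parameters in the first StreamInput message and the audio to recognize in the following ones. The sketch below assumes the generated module names used in the earlier examples; execute_request, asr_params, audio_chunks (an iterator of raw audio buffers), stub, and handle_tts_audio are placeholders.

def stream_inputs(execute_request, asr_params, audio_chunks):
    """Yield StreamInput messages: the request and settings first, then the audio to recognize."""
    yield dlg_messages_pb2.StreamInput(
        request=execute_request,
        asr_control_v1=asr_params,
    )
    for chunk in audio_chunks:
        yield dlg_messages_pb2.StreamInput(audio=chunk)

for output in stub.ExecuteStream(stream_inputs(execute_request, asr_params, audio_chunks)):
    if output.HasField("response"):
        payload = output.response.payload   # standard DLGaaS ExecuteResponse payload
    if output.HasField("audio"):
        handle_tts_audio(output.audio)      # nuance.tts.v1.SynthesisResponse; hypothetical handler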
StreamOutput
Streams the requested TTS output and returns ASR results.
Field | Type | Description |
---|---|---|
response | ExecuteResponse | Standard DLGaaS ExecuteResponse; used to continue the dialog interactions. |
audio | nuance.tts.v1.SynthesisResponse | TTS output. See the TTSaaS SynthesisResponse documentation for details. |
asr_result | nuance.asr.v1.Result | Output message containing the transcription result, including the result type, the start and end times, metadata about the transcription, and one or more transcription hypotheses. See the ASRaaS Result documentation for details. |
asr_status | nuance.asr.v1.Status | Output message indicating the status of the transcription. See the ASRaaS Status documentation for details. |
asr_start_of_speech | nuance.asr.v1.StartOfSpeech | Output message containing the start-of-speech message. See the ASRaaS StartOfSpeech documentation for details. |
This method includes:
StreamOutput
response Standard DLGaaS ExecuteResponse
audio
asr_result
asr_status
asr_start_of_speech
StopRequest
Request object used by Stop method.
Field | Type | Description |
---|---|---|
session_id | string | ID for the session. |
user_id | string | Identifies a specific user within the application. See User ID. |
This method includes:
StopRequest
session_id
user_id
StopResponse
Response object used by the Stop method. Currently empty; reserved for future use.
This method includes:
StopResponse
Fields reference
AsrParamsV1
Parameters to be forwarded to the ASR service. See Step 4b. Interact with the user (using audio) for details.
Field | Type | Description |
---|---|---|
audio_format | nuance.asr.v1. AudioFormat | Audio codec type and sample rate. See the ASRaaS AudioFormat documentation for details. |
utterance_detection_mode | nuance.asr.v1. EnumUtteranceDetectionMode | How end of utterance is determined. Defaults to SINGLE. See the ASRaaS EnumUtteranceDetectionMode documentation for details. |
recognition_flags | nuance.asr.v1. RecognitionFlags | Flags to fine tune recognition. See the ASRaaS RecognitionFlags documentation for details. |
result_type | nuance.asr.v1.EnumResultType | Whether final, partial, or immutable results are returned. See the ASRaaS EnumResultType documentation for details. |
no_input_timeout_ms | uint32 | Maximum silence, in ms, allowed while waiting for user input after recognition timers are started. Default (0) means server default, usually no timeout. See the ASRaaS Timers documentation for details. |
recognition_timeout_ms | uint32 | Maximum duration, in ms, of recognition turn. Default (0) means server default, usually no timeout. See the ASRaaS Timers documentation for details. |
utterance_end_silence_ms | uint32 | Minimum silence, in ms, that determines the end of an utterance. Default (0) means server default, usually 500ms or half a second. See the ASRaaS Timers documentation for details. |
speech_detection_sensitivity | float | A balance between detecting speech and noise (breathing, etc.), from 0 to 1. 0 means ignore all noise, 1 means interpret all noise as speech. Default is 0.5. See the ASRaaS Timers documentation for details. |
max_hypotheses | uint32 | Maximum number of n-best hypotheses to return. Default (0) means a server default, usually 10 hypotheses. |
end_stream_no_valid_hypotheses | bool | Determines whether the dialog application or the client application handles the dialog flow when ASRaaS does not return a valid hypothesis. When set to false (default), the dialog flow is determined by the Mix.dialog application, according to the processing defined for the NO_INPUT and NO_MATCH events. To configure the streaming request so that the stream is closed if ASRaaS does not return a valid hypothesis, set to true . See Handling unusable ASR audio for details. |
ContinueAction
Continue action to be performed by the client application.
Field | Type | Description |
---|---|---|
message | Message | Message to be played as part of the continue action. |
view | View | View details for this action. |
data | google.protobuf.Struct | Map of data exchanged in this node. |
id | string | ID identifying the Continue Action node in the dialog application. |
DAAction
Data Access action to be performed by the client application.
Field | Type | Description |
---|---|---|
id | string | ID identifying the Data Access node in the dialog application. |
message | Message | Message to be played as part of the Data Access action. |
view | View | View details for this action. |
data | google.protobuf.Struct | Map of data exchanged in this node. |
DialogEvent
Message used to indicate an event that occurred during the dialog interactions.
Field | Type | Description |
---|---|---|
type | DialogEvent.EventType | Type of event being triggered. |
message | string | Optional message providing additional information about the event. |
event_name | string | Name of custom event. Must be set to the name of the custom event defined in Mix.dialog. See Manage events for details. Applies only when DialogEvent.EventType is set to CUSTOM. |
DialogEvent.EventType
The possible event types that can occur on the client side of interactions.
Name | Number | Description |
---|---|---|
SUCCESS | 0 | Everything went as expected. |
ERROR | 1 | An unexpected problem occurred. |
NO_INPUT | 2 | End user has not provided any input. |
NO_MATCH | 3 | End user provided unrecognizable input. |
HANGUP | 4 | End user has hung up. Currently used for IVR interactions. |
CUSTOM | 5 | Custom event. You must set field event_name in DialogEvent to the name of the custom event defined in Mix.dialog. |
EndAction
End node, indicates that the dialog has ended.
Field | Type | Description |
---|---|---|
data | google.protobuf.Struct | Map of data exchanged in this node. |
id | string | ID identifying the End Action node in the dialog application. |
EscalationAction
Escalation action to be performed by the client application.
Field | Type | Description |
---|---|---|
message | Message | Message to be played as part of the escalation action. |
view | View | View details for this action. |
data | google.protobuf.Struct | Map of data exchanged in this node. |
id | string | ID identifying the External Action node in the dialog application. |
ExecuteRequestPayload
Payload sent with the Execute request. If both an event and a user input are provided, the event has precedence. For example, if an error event is provided, the input will be ignored.
Field | Type | Description |
---|---|---|
user_input | UserInput | Input provided to the Dialog engine. |
dialog_event | DialogEvent | Used to pass in events that can drive the flow. Optional; if an event is not passed, the operation is assumed to be successful. |
requested_data | RequestData | Data that was previously requested by engine. |
ExecuteResponsePayload
Payload returned after the Execute method is called. Specifies the action to be performed by the client application.
Field | Type | Description |
---|---|---|
messages | Message | Repeated. Message action to be performed by the client application. |
qa_action | QAAction | Question and answer action to be performed by the client application. |
da_action | DAAction | Data access action to be performed by the client application. |
escalation_action | EscalationAction | Escalation action to be performed by the client application. |
end_action | EndAction | End action to be performed by the client application. |
continue_action | ContinueAction | Continue action to be performed by the client application. Currently not implemented. |
Message
Specifies the message to be played to the user. See Message actions for details.
Field | Type | Description |
---|---|---|
nlg | Message.Nlg | Repeated. Text to be played using Text-to-speech. |
visual | Message.Visual | Repeated. Text to be displayed to the user (for example, in a chat). |
audio | Message.Audio | Repeated. Prompt to be played from an audio file. |
view | View | View details for this message. |
Message.Audio
Field | Type | Description |
---|---|---|
text | string | Text to be used as TTS backup if the audio file cannot be played. |
uri | string | URI to the audio file, in the following format: language/prompts/library/channel/filename?version=version. For example: en-US/prompts/default/Omni_Channel_VA/Message_ini_01.wav?version=1.0_1602096507331 |
mask | bool | When set to true, indicates that the text contains sensitive data that will be masked in logs. |
barge_in_disabled | bool | When set to true, indicates that barge-in is disabled. |
Message.Nlg
Field | Type | Description |
---|---|---|
text | string | Text to be played using Text-to-speech. |
mask | bool | When set to true, indicates that the text contains sensitive data that will be masked in logs. |
barge_in_disabled | bool | When set to true, indicates that barge-in is disabled. |
Message.Visual
Field | Type | Description |
---|---|---|
text | string | Text to be displayed to the user (for example, in a chat). |
mask | bool | When set to true, indicates that the text contains sensitive data that will be masked in logs. |
barge_in_disabled | bool | When set to true, indicates that barge-in is disabled. |
QAAction
Question and answer action to be performed by the client application.
Field | Type | Description |
---|---|---|
message | Message | Message to be played as part of the question and answer action. |
data | google.protobuf.Struct | Map of data exchanged in this node. |
view | View | View details for this action. |
selectable | Selectable | Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details. |
recognition_settings | RecognitionSettings | Configuration information to be used during recognition. |
mask | bool | When set to true, indicates that the Question and Answer node is meant to collect an entity that will hold sensitive data to be masked in logs. |
RecognitionSettings
Configuration information to be used during recognition.
Field | Type | Description |
---|---|---|
dtmf_mappings | DtmfMapping | Array of DTMF mappings configured in Mix.dialog. |
collection_settings | CollectionSettings | Collection settings configured in Mix.dialog. |
speech_settings | SpeechSettings | Speech settings configured in Mix.dialog. |
dtmf_settings | DtmfSettings | DTMF settings configured in Mix.dialog. |
RecognitionSettings.CollectionSettings
Collection settings configured in Mix.dialog.
Field | Type | Description |
---|---|---|
timeout | string | Time, in ms, to wait for speech once a prompt has finished playing before throwing a NO_INPUT event. |
complete_timeout | string | Duration of silence, in ms, to determine the user has finished speaking. The timer starts when the recognizer has a well-formed hypothesis. |
incomplete_timeout | string | Duration of silence, in ms, to determine the user has finished speaking. The timer starts when the user stops speaking. |
max_speech_timeout | string | Maximum duration, in ms, of an utterance collected from the user. |
RecognitionSettings.DtmfMapping
DTMF mappings configured in Mix.dialog. See Set DTMF mappings for details.
Field | Type | Description |
---|---|---|
id | string | ID of the entity to which the DTMF mapping applies. |
value | string | Entity value to map to a DTMF key. |
dtmf_key | string | DTMF key associated with this entity value. Valid values are: 0-9, *, # |
RecognitionSettings.DtmfSettings
DTMF settings configured in Mix.dialog.
Field | Type | Description |
---|---|---|
inter_digit_timeout | string | Maximum time, in ms, allowed between each DTMF character entered by the user. |
term_timeout | string | Maximum time, in ms, to wait for an additional DTMF character before terminating the input. |
term_char | string | Character that terminates a DTMF input. |
RecognitionSettings.SpeechSettings
Speech settings configured in Mix.dialog.
Field | Type | Description |
---|---|---|
sensitivity | string | Level of sensitivity to speech. 1.0 means highly sensitive to quiet input, while 0.0 means least sensitive to noise. |
barge_in_type | string | Barge-in type; possible values: "speech" (interrupt a prompt by using any word) and "hotword" (interrupt a prompt by using a specific hotword). |
speed_vs_accuracy | string | Desired balance between speed and accuracy. 0.0 means fastest recognition, while 1.0 means best accuracy. |
RequestData
Data that was requested by the dialog application.
Field | Type | Description |
---|---|---|
id | string | ID used by the dialog application to identify which node requested the data. |
data | google.protobuf.Struct | Map of keys to json objects of the data requested. |
ResourceReference
Reference object of the resource to use for the request (for example, URN or URL of the model)
Field | Type | Description |
---|---|---|
uri | string | Reference (for example, the URL or URN). |
type | ResourceReference. EnumResourceType | Type of resource. |
ResourceReference.EnumResourceType
Name | Number | Description |
---|---|---|
APPLICATION_MODEL | 0 | Dialog application model. |
Selectable
Interactive elements to be displayed by the client app, such as clickable buttons or links. See Interactive elements for details.
Field | Type | Description |
---|---|---|
selectable_items | Selectable.SelectableItem | Repeated. List of interactive elements. |
Selectable.SelectableItem
Field | Type | Description |
---|---|---|
value | Selectable.SelectableItem. SelectedValue | Key-value pairs of available options for interactive element. |
description | string | Description of the interactive element. |
display_text | string | Text to display for this interactive element. |
display_image_uri | string | URI of image to display for this interactive element. |
Selectable.SelectableItem.SelectedValue
Field | Type | Description |
---|---|---|
id | string | ID of option. |
value | string | Value of option. |
Selector
Provides channel and language used for the conversation. See Selectors for details.
Field | Type | Description |
---|---|---|
channel | string | Optional: Channel that this conversation is going to use (for example, WebVA). Note: Replace any spaces or slashes in the name of the channel profile with the underscore character (_). |
language | string | Optional: Language to use for this conversation. |
library | string | Optional: Library to use for this conversation. Advanced customization reserved for future use. Always use the default value for now, which is default . |
StartRequestPayload
Payload sent with the Start request.
Field | Type | Description |
---|---|---|
model_ref | ResourceReference | Reference object of the resource to use for the request. |
data | google.protobuf.Struct | Map of data sent in the request. |
suppress_log_user_data | bool | Set to true to disable logging for ASR, NLU, TTS, and Dialog. |
StartResponsePayload
Payload returned after the Start method is called. If a session ID is not provided in the request, a new one is generated and should be used for subsequent calls.
Field | Type | Description |
---|---|---|
session_id | string | Returns session ID to use for subsequent calls. |
TtsParamsv1
Parameters to be forwarded to the TTS service. See Step 4b. Interact with the user (using audio) for details.
Field | Type | Description |
---|---|---|
audio_params | nuance.tts.v1.AudioParameters | Output audio parameters, such as encoding and volume. See the TTSaaS AudioParameters documentation for details. |
voice | nuance.tts.v1.Voice | The voice to use for audio synthesis. See the TTSaaS Voice documentation for details. |
UserInput
Provides input to the Dialog engine. The client application sends either the text collected from the user, to be interpreted by Mix, or an interpretation that was performed externally.
Note: Provide only one of the following fields.
Field | Type | Description |
---|---|---|
user_text | string | Text collected from end user. |
interpretation | UserInput.Interpretation | Interpretation that was done externally (for example, Nuance Recognizer for VoiceXML). This can be used for simple interpretations that include entities with string values only. Use nluaas_interpretation for interpretations that include complex entities. |
selected_item | Selectable.SelectableItem.SelectedValue | Value of element selected by end user. |
nluaas_interpretation | nuance.nlu.v1.InterpretResult | Interpretation that was done externally (for example, Nuance Recognizer for VoiceXML), provided in the NLUaaS format. See Interpreting text user input for an example. Note that DLGaaS currently only supports single intent interpretations. |
UserInput.Interpretation
Sends interpretation data.
Field | Type | Description |
---|---|---|
confidence | float | Required: Value from 0..1 that indicates the confidence of the interpretation. |
input_mode | string | Optional: Input mode. Current values are dtmf/voice (but input mode not limited to these). |
utterance | string | Raw collected text. |
data | UserInput.Interpretation.DataEntry | Repeated. Data from the interpretation of intents and entities. For example, INTENT:BILL_PAY or AMOUNT:100. |
slot_literals | UserInput.Interpretation.SlotLiteralsEntry | Repeated. Slot literals from the interpretation of the entities. The slot literal provides the exact words used by the user. For example, AMOUNT: One hundred dollars. |
slot_confidences | UserInput.Interpretation.SlotConfidencesEntry | Repeated. Slot confidences from the interpretation of the entities. |
alternative_interpretations | UserInput.Interpretation | Repeated. Alternative interpretations possible from the interaction, that is, n-best list. |
UserInput.Interpretation.DataEntry
Field | Type | Description |
---|---|---|
key | string | Key of the data. |
value | string | Value of the data. |
UserInput.Interpretation.SlotConfidencesEntry
Field | Type | Description |
---|---|---|
key | string | Name of the entity. |
value | float | Value from 0..1 that indicates the confidence of the interpretation for this entity. |
UserInput.Interpretation.SlotLiteralsEntry
Field | Type | Description |
---|---|---|
key | string | Name of the entity. |
value | string | Literal value of the entity. |
View
Specifies view details for this action.
Field | Type | Description |
---|---|---|
id | string | ID of the view. |
name | string | Name of the view. |
Scalar Value Types
Change log
2021-01-13
- The RecognitionSettings field of the QA action now includes new fields to show the settings configured in Mix.dialog.
- The CLIENT_ID example was updated to show latest Mix syntax.
To use these features:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-12-14
- The userData predefined variable section shows how to send the userData predefined variable to the dialog application in the StartRequest payload.
- The nlg, visual, and audio messages now include two new fields, mask and barge_in_disabled.
- The QA action now includes a new field, mask.
To use these features:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-10-28
- The Simple variable types section describes the new variable types that can be set in Mix.dialog and shows how to send them to the dialog application in a data access node.
- The QA action now includes a new field, RecognitionSettings, that includes DTMF mappings configured in Mix.dialog.
To use this feature:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-10-08
Added more information about URIs for audio files.
2020-09-16
- The obsolete API versions (v1beta1 and v1beta2) were removed from the documentation.
- The UserInput message now includes a new field, nluaas_interpretation, to provide interpretations in the NLUaaS format. See Interpreting text user input for an example. Note that DLGaaS currently only supports single intent interpretations.
To use this feature:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-09-03
- The AsrParamsV1 message now contains the end_stream_no_valid_hypotheses field to close the stream when no valid hypothesis is returned by ASRaaS. See Handling unusable ASR audio for details.
- The StartRequest now includes a new field, client_data, to inject data in call logs.
- The following ASR parameters can now be set in the AsrParamsV1 message: no_input_timeout_ms, recognition_timeout_ms, utterance_end_silence_ms, speech_detection_sensitivity, and max_hypotheses.
To use these features:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-08-26
- Noted in Selectors to replace any spaces or slashes in the name of the channel profile with the underscore character (_).
2020-07-22
- Added more information about Transfer actions.
2020-07-09
- Versions v1beta1 and v1beta2 of the DLGaaS API are now obsolete.
2020-06-24
- Custom events are now supported. The DialogEvent.EventType field supports a new type, CUSTOM, and the custom event name can be set in the event_name field of DialogEvent.
To use this feature:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-05-28
- The StartRequest, ExecuteRequest, and StopRequest now include a new field, user_id, which identifies a specific user. See User ID for details.
- The ASR proto files were renamed from nuance_asr*.proto to recognizer.proto, resource.proto, and result.proto.
To use these features:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-05-14
- Added information about data sent in a question and answer action.
2020-05-13
- The StreamOutput method contains two new fields: asr_status, to provide the status of the transcription, and asr_start_of_speech, to provide the start-of-speech message.
To use these features:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-04-30
- The Interpretation message contains a new field, slot_confidences, to provide the confidence values for entities.
- The escalation action, end action, and continue action now include an ID that identifies the node in the dialog application.
- The TtsParamsv1 message contains a new field, voice, that lets you specify the voice to use for audio synthesis.
To use these features:
- Download the latest version of the proto files.
- Generate the client stubs from the proto files as described in gRPC setup.
2020-04-15
- Added Status messages and codes
- Added an example for using interactive elements
- Provided additional information about nodes and actions
2020-03-31
First release of this new version.