Integrating a Custom LLM with Millis AI Voice Agent
This guide describes how to integrate your own LLM chatbot with a Millis AI voice agent. By connecting your custom LLM, you can power the voice agent with your chatbot’s capabilities and deliver a seamless voice experience driven by your model’s responses.
Create your Voice Agent on the Playground
Set up a WebSocket server on your end.
When an outbound or inbound call is initiated with your voice agent, the Millis AI server will establish a connection to your specified WebSocket URL.
Your endpoint should be capable of both receiving messages from and sending messages to the Millis AI server. Here’s some sample code:
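Below is a minimal sketch of such a server in Python, using the `websockets` library. The `stream_response` fields follow the descriptions later in this guide, but the `type`/`data` envelope and the name of the inbound user-transcript event are assumptions for illustration; adapt them to the actual events your agent receives.

```python
"""Minimal sketch of a custom-LLM WebSocket server (pip install websockets)."""
import asyncio
import json

import websockets


def my_llm_generate(data: dict) -> str:
    # Placeholder: call your own LLM with the conversation transcript here.
    return "This is a placeholder response from your model."


async def send_response(websocket, stream_id, text):
    # Send the whole reply in one chunk and mark the end of the stream.
    # The flush/pause fields described below are omitted in this sketch.
    await websocket.send(json.dumps({
        "type": "stream_response",
        "data": {
            "stream_id": stream_id,
            "content": text,  # field name assumed for illustration
            "end_of_stream": True,
        },
    }))


async def handle_connection(websocket):
    async for raw in websocket:
        msg = json.loads(raw)
        data = msg.get("data", {})

        if msg.get("type") == "start_call":
            # Use the stream_id from the start_call event for the first reply.
            await send_response(websocket, data["stream_id"],
                                "Hi! How can I help you today?")
        elif msg.get("type") == "user_transcript":  # event name assumed
            # Echo the stream_id from the request so Millis AI can match
            # this response to the right turn.
            reply = my_llm_generate(data)
            await send_response(websocket, data["stream_id"], reply)


async def main():
    async with websockets.serve(handle_connection, "0.0.0.0", 8080):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```

Point your agent at `ws://<your-host>:8080` while testing (or a `wss://` URL behind TLS in production).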
Here’s how the interaction flows once the connection is established:
The Millis AI server sends a `start_call` event to tell your server when the conversation starts.
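As a hedged illustration, a `start_call` event might look like the following; the `type`/`data` envelope and values are assumptions, and only the presence of a `stream_id` is documented in this guide:

```json
{
  "type": "start_call",
  "data": {
    "stream_id": "stream-1"
  }
}
```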
Millis AI streams the user’s spoken message, including the full conversation transcript, to your LLM.
Your LLM processes the transcript and streams back its response. Indicate the end of a message stream with `end_of_stream`. Two further fields control how the agent speaks the response:
`flush`: Set this to `true` to instruct the agent to immediately generate audio from the current response. If `false`, the agent buffers the response and generates audio only when it has received a complete sentence.
`pause`: Set this to a number of milliseconds to instruct the agent to pause for that long after saying this response, before saying the next one.
When your LLM generates a response, attach the `stream_id` from the original request so that we can keep track of which response corresponds to which request. For the first message that your server sends after receiving the `start_call` event, use the `stream_id` from the `start_call` event.
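Putting these fields together, a `stream_response` message might look like this sketch; the `type`/`data` envelope and the `content` field name are assumptions, while `stream_id`, `end_of_stream`, `flush`, and `pause` follow the descriptions above:

```json
{
  "type": "stream_response",
  "data": {
    "stream_id": "stream-1",
    "content": "Sure, I can help with that.",
    "end_of_stream": true,
    "flush": true,
    "pause": 500
  }
}
```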
Your custom LLM can send specific messages to control the flow of the call. Instead of sending `stream_response`, you can send the following types:
To terminate the call:
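For example (the `end_call` type name is an assumption consistent with the other sketches in this guide):

```json
{
  "type": "end_call",
  "data": {
    "stream_id": "stream-1"
  }
}
```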
To transfer the call to another destination (e.g., a phone number):
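For example (the `transfer_call` type name and envelope are assumptions; `stream_id` and `destination` are the parameters documented below):

```json
{
  "type": "transfer_call",
  "data": {
    "stream_id": "stream-1",
    "destination": "+15551234567"
  }
}
```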
Parameters:
`stream_id`: The unique identifier for the stream.
`destination`: The phone number or endpoint to transfer the call to.
Millis AI manages the conversation flow, including interruption detection and end-of-turn signals. You will be notified of these events:
`partial_transcript`
Description: Sent to provide a running transcript of the conversation. Each transcript message can be either partial or final.
Message Structure:
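A sketch based on the parameters below (the `type`/`data` envelope and sample values are assumptions):

```json
{
  "type": "partial_transcript",
  "data": {
    "session_id": "session-1",
    "transcript": "I'd like to book an appointment",
    "is_final": false
  }
}
```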
Parameters:
`session_id`: The unique identifier for the session.
`transcript`: The partial or complete transcript text.
`is_final`: Boolean indicating whether the transcript is final.
`playback_finished`
Description: Sent when playback of the agent’s audio stream has finished.
Message Structure:
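A sketch based on the parameters below (the `type`/`data` envelope is an assumption):

```json
{
  "type": "playback_finished",
  "data": {
    "session_id": "session-1",
    "stream_id": "stream-1"
  }
}
```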
Parameters:
`session_id`: The unique identifier for the session.
`stream_id`: The unique identifier for the stream.
`interrupt`
Description: Sent when the user interrupts the agent’s stream.
Message Structure:
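A sketch based on the parameter below (the `type`/`data` envelope is an assumption):

```json
{
  "type": "interrupt",
  "data": {
    "stream_id": "stream-1"
  }
}
```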
Parameters:
`stream_id`: The unique identifier for the stream.
In your voice agent’s configuration on the Millis AI platform, specify your WebSocket endpoint.