Custom LLM
Integrating a Custom LLM with Millis AI Voice Agent
This guide describes how to integrate your own LLM chatbot with a Millis AI voice agent. By connecting your custom LLM, you can power the voice agent with your chatbot’s capabilities, providing a seamless voice interaction experience based on your model’s responses.
Create your Voice Agent on the Millis AI platform.
Set up a WebSocket server on your end.
When an outbound or inbound call is initiated with your voice agent, the Millis AI server will establish a connection to your specified WebSocket URL.
Your endpoint should be capable of both receiving messages from and sending messages to the Millis AI server. A minimal sketch of such a server follows.
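Below is a minimal sketch in Python using the `websockets` package. It follows the events described in this guide (`start_call`, `stream_response`, `end_of_stream`, `flush`); field names not defined in this guide, such as `type` and `content`, are assumptions rather than the authoritative Millis AI schema.

```python
# Minimal sketch of a custom-LLM WebSocket server for Millis AI.
# Assumes the Python "websockets" package; payload shapes beyond the
# fields named in this guide are assumptions.
import asyncio
import json

import websockets


async def handle_connection(websocket):
    async for raw in websocket:
        event = json.loads(raw)

        if event.get("type") == "start_call":
            # The conversation has started: set up per-call state here.
            continue

        # Assumed: respond once a final transcript of the user's turn arrives.
        if event.get("type") == "transcript" and event.get("is_final"):
            # Generate a reply with your own LLM; stream it back in chunks.
            for chunk in ("Hello! ", "How can I help you today?"):
                await websocket.send(json.dumps({
                    "type": "stream_response",
                    "content": chunk,  # assumed field name
                    "end_of_stream": False,
                }))
            # Tell Millis AI the response stream is complete.
            await websocket.send(json.dumps({
                "type": "stream_response",
                "content": "",
                "end_of_stream": True,
                "flush": True,
            }))


async def main():
    async with websockets.serve(handle_connection, "0.0.0.0", 8080):
        await asyncio.Future()  # serve forever


if __name__ == "__main__":
    asyncio.run(main())
```

Here’s how the interaction flows once the connection is established: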
The Millis AI server will send a `start_call` event to tell your server when the conversation starts.
Millis AI streams the user’s spoken message, including the full conversation transcript, to your LLM.
Your LLM processes the transcript and streams back the response. Indicate the end of a message stream with `end_of_stream`. Each response message also supports the following fields (an example payload follows this list):
`flush`: Set this to `true` to instruct the agent to immediately generate audio based on the current response. If `false`, the agent will buffer the response and generate audio only when it receives a complete sentence.
`pause`: Set this to a number of milliseconds to instruct the agent to pause for that long after saying the current response, before saying the next one.
Your custom LLM can send specific messages to control the flow of the call. Instead of sending `stream_response`, you can send the following types:
To terminate the call:
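A sketch of a termination message, assuming an `end_call` type name (this guide does not spell out the exact payload):

```json
{
  "type": "end_call",
  "stream_id": "stream-123"
}
```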
To transfer the call to another destination (e.g., phone number):
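A sketch of a transfer message, assuming a `transfer_call` type name; only `stream_id` and `destination` are documented below:

```json
{
  "type": "transfer_call",
  "stream_id": "stream-123",
  "destination": "+15551234567"
}
```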
Parameters:
`stream_id`: The unique identifier for the stream.
`destination`: The phone number or endpoint to transfer the call to.
Millis AI manages the conversation flow, including interruption detection and end-of-turn signals. You will be notified of these events:
Description: Sent to provide a transcript of the conversation so far. The transcript can be either partial or final.
Message Structure:
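A sketch of this message, assuming a `transcript` type name (the parameters are described below):

```json
{
  "type": "transcript",
  "session_id": "session-123",
  "transcript": "I'd like to book an appointment",
  "is_final": false
}
```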
Parameters:
`session_id`: The unique identifier for the session.
`transcript`: The partial or complete transcript text.
`is_final`: Boolean indicating whether the transcript is final.
Description: Sent when the playback of the agent’s audio stream has finished.
Message Structure:
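A sketch of this message, assuming an `audio_finished` type name:

```json
{
  "type": "audio_finished",
  "session_id": "session-123",
  "stream_id": "stream-456"
}
```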
Parameters:
`session_id`: The unique identifier for the session.
`stream_id`: The unique identifier for the stream.
Description: Sent when the user interrupts the agent’s stream.
Message Structure:
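A sketch of this message, assuming an `interrupt` type name:

```json
{
  "type": "interrupt",
  "stream_id": "stream-456"
}
```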
Parameters:
`stream_id`: The unique identifier for the stream.
In your voice agent’s configuration on the Millis AI platform, specify your WebSocket endpoint.