A breakdown of how our pricing works
At Millis AI, we designed our pricing to be transparent and straightforward. Here's a breakdown of how our pricing works, including the costs associated with using different Large Language Models (LLMs), Text-to-Speech (TTS), and Speech-to-Text (STT) providers.
Our token offers a comprehensive, value-driven approach for users seeking to maximize efficiency and reduce costs. Token holders gain access to exclusive discounts on platform fees, including charges for LLM, Text-to-Speech, and Speech-to-Text services. Beyond covering operational expenses, the token provides added benefits such as priority access to new features, enhanced support services, and the ability to unlock premium functionalities. By holding our token, users not only enjoy preferential rates for seamless integration and optimized interactions but also become part of a forward-thinking ecosystem designed to deliver long-term value and innovation. See the Tokenomics section for more.
We charge a base rate of $0.02 per minute for using the Millis AI platform. This rate is in addition to the fees charged by other providers such as LLM, TTS, and STT services.
The pricing for LLMs is based on the number of tokens processed. Here, we translate token-based pricing into an estimated cost per minute based on typical usage:
GPT-4o:
Input: $5.00 per 1 million tokens
Output: $15.00 per 1 million tokens
Estimated costs per minute: $0.004 per minute
GPT-4 Turbo:
Input: $10.00 per 1 million tokens
Output: $30.00 per 1 million tokens
Estimated costs per minute: $0.008 per minute
GPT-3.5 Turbo:
Input: $0.50 per 1 million tokens
Output: $1.50 per 1 million tokens
Estimated costs per minute: $0.0004 per minute
Meta Llama-3:
Input: $0.90 per 1 million tokens
Output: $0.90 per 1 million tokens
Estimated costs per minute: $0.00018 per minute
Your LLM: No charge for using your own custom LLM
Choose By Millis - Optimize for Best Latency:
Millis automatically selects the best available LLM model from above with the lowest latency for your specific configuration and functions.
This option is perfect for users looking for the most efficient performance without the need to manually switch between models.
TTS pricing is based on the number of characters or words that the agent speaks. This means longer responses from your agent will cost more. Each provider has its own pricing structure:
Eleven Labs: $0.15 / 1,000 characters (~ $0.12/min based on average speech length)
OpenAI: $0.015 / 1,000 characters (~ $0.012/min)
Rime: $0.075 / 1,000 characters (~ $0.06/min)
A typical minute of voice interaction might involve processing around 200 tokens (800 characters).
For converting spoken inputs into text, we charge $0.0043 per minute.
Let’s calculate the total cost for a 10-minute session using GPT-4o and Eleven Labs TTS, with an average speech rate. Assuming each party speaks for approximately 5 minutes during the 10-minute session:
Millis AI base charge: 10 min x $0.02/min = $0.20
GPT-4o LLM charge: 10 min x $0.004/min = $0.04
Eleven Labs TTS charge: Since the agent only speaks for about half of the session (5 minutes), and assuming an average speech rate translates to 4,000 characters spoken by the agent: 4,000 characters x $0.15/1,000 characters = $0.60
STT charge: Since the human also speaks for approximately half of the session (5 minutes): 5 minutes x $0.0043/min = $0.0215
Total cost: $0.20 (base) + $0.04 (LLM) + $0.60 (TTS) + $0.0215 (STT) = $0.8615
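To make the arithmetic above easy to reproduce, here is a small, illustrative estimator in JavaScript. The rates come from the tables on this page; the function name and parameter structure are examples only, not part of the Millis API.

```javascript
// Illustrative cost estimator using the rates listed above.
// Rates: base $0.02/min, GPT-4o ~$0.004/min, Eleven Labs $0.15 per 1,000
// characters, STT $0.0043/min.
function estimateSessionCost({ totalMinutes, agentChars, userSpeakingMinutes, llmRatePerMin }) {
  const base = totalMinutes * 0.02;          // Millis AI platform fee
  const llm = totalMinutes * llmRatePerMin;  // e.g. 0.004 for GPT-4o
  const tts = (agentChars / 1000) * 0.15;    // Eleven Labs TTS
  const stt = userSpeakingMinutes * 0.0043;  // speech-to-text
  return { base, llm, tts, stt, total: base + llm + tts + stt };
}

// Reproduces the worked example: 0.20 + 0.04 + 0.60 + 0.0215 = 0.8615
console.log(estimateSessionCost({
  totalMinutes: 10,
  agentChars: 4000,
  userSpeakingMinutes: 5,
  llmRatePerMin: 0.004,
}));
```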
Millis AI Webhooks Documentation
Configuration Key in Agent Config: extra_prompt_webhook
The Prefetch Data Webhook is called before the conversation begins. It enables real-time customization and event notifications:
Call Notification: Notify your system about an incoming call by receiving a webhook event.
Metadata Override: Dynamically override or add session metadata based on external systems.
Extra Prompt: Provide additional context to the agent's system prompt.
The configuration can be provided in one of the following formats:
String: The URL of the webhook endpoint.
Object:
Method: GET
Query Parameters:
session_id (string): Unique session identifier.
agent_id (string): ID of the agent handling the session.
from (string): Caller's phone number (if applicable).
to (string): Receiver's phone number (if applicable).
Additional session metadata as key-value pairs (if available).
Content-Type: application/json
Response Fields:
metadata (object): Overrides or extends the current session metadata. Existing keys are updated, and new keys are added.
extra_prompt (string): Additional text appended to the agent's system prompt.
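For reference, here is a minimal handler sketch. Express is used purely as an example framework; the query parameters and response fields are the ones documented above, while the route path and example values are placeholders.

```javascript
const express = require('express');
const app = express();

// Prefetch Data Webhook: Millis calls this endpoint with GET before the conversation starts.
app.get('/millis/prefetch', (req, res) => {
  const { session_id, agent_id, from, to } = req.query;

  // Look up caller-specific data in your own systems, then return
  // metadata overrides and/or extra prompt context.
  res.json({
    metadata: { customer_tier: 'gold' },                      // example values only
    extra_prompt: `The caller's number is ${from || 'unknown'}.`,
  });
});

app.listen(3000);
```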
Configuration Key in Agent Config: session_data_webhook
The End-of-Call Webhook is triggered after a session concludes. It provides detailed information about the session for logging, analytics, or post-call processing.
The configuration can be provided in one of the following formats:
String: The URL of the webhook endpoint.
Object:
Method: POST
Payload (JSON):
chat (string): JSON string representing the chat history (user and agent messages).
function_calls (array): List of external functions invoked during the session.
ts (timestamp): Session start time.
duration (integer): Session duration in seconds.
agent_config (object): Serialized agent configuration.
agent_id (string): ID of the agent managing the session.
call_id (string): Unique identifier for the call session.
chars_used (integer): Number of characters processed during the session.
session_id (string): Unique session identifier.
cost_breakdown (array): List of cost components for different services.
voip (object): Telephony details.
recording (object): Call recording details.
metadata (object): Final metadata for the session.
call_status (string): Status of the call. Possible values: user-ended, api-ended, voicemail-message, voicemail-hangup, agent-ended, timeout, error, chat_completion.
error_message (string): Description of any errors encountered during the session.
Request URL:
Payload:
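A sketch of a receiving endpoint that picks out the documented payload fields (Express is again only an example framework; treat the shape of nested objects as implementation-defined):

```javascript
const express = require('express');
const app = express();
app.use(express.json());

// End-of-Call Webhook: Millis POSTs the session summary here.
app.post('/millis/session-ended', (req, res) => {
  const {
    session_id, call_id, agent_id, ts, duration,
    chat, function_calls, chars_used, cost_breakdown,
    voip, recording, metadata, call_status, error_message,
  } = req.body;

  console.log(`Session ${session_id} ended with status ${call_status} after ${duration}s`);
  // Store chat history, costs, recording details, etc. for analytics.

  res.sendStatus(200);
});

app.listen(3000);
```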
Prefetch Webhook:
Use this webhook to notify your system about calls, adjust metadata dynamically, or provide extra prompts to enhance conversational context.
End-of-Call Webhook:
Designed for detailed post-call analytics and tracking.
Ensure secure storage of sensitive information like call recordings and metadata.
Security:
Use HTTPS for all webhooks.
Authenticate requests with headers or API keys as needed.
These webhooks provide a robust way to integrate Millis AI into your existing systems, offering flexibility and advanced customization for voice agent interactions.
Voice Agents are the core components of the Millis AI platform. These agents can be customized to perform a variety of tasks, from answering questions to guiding users through complex processes.
It’s important to configure your voice agent to ensure it operates effectively within your specific context. Here are the main aspects you can customize:
The system prompt is where you can provide specific instructions or information that the agent needs to remember and follow. This sets the initial context for your voice agent, guiding its responses and interactions.
(Optional) If not set, the default Millis AI model is used.
Model: Specifies the GPT model that your agent will operate on. We support OpenAI’s latest model, GPT-4o, as well as open-source models like Meta Llama 3.
Provider: The service that hosts inference for the model.
provider: The service provider for the text-to-speech service. This setting determines the quality of your agent's voice.
voice_id: The specific voice character from the chosen provider's catalog, allowing you to customize how your agent sounds.
Defines the operational language of the agent. If not specified, English is used by default.
(Optional) If you prefer using your own custom LLM, specify a WebSocket URL to enable this connection.
A list of function calls the agent can execute to perform tasks or retrieve information during interactions. This includes API webhooks and other integrations.
You can select the AI model for your voice agent based on your needs:
Default Millis AI Model: Automatically used if no specific LLM model is provided. This model is best optimized for low latency.
Popular Models from Providers: Like OpenAI’s GPT-4o, known for the best language processing capabilities but with a trade-off in latency.
Custom Model via WebSocket: Integrate your uniquely developed or tailored LLM to give your agent specialized abilities. You have full control over the agent’s capabilities.
Functions are additional capabilities that you can integrate into your voice agents to enhance their utility and interaction dynamics.
These allow the agent to perform actions or retrieve information during a conversation by calling external APIs. This is useful for tasks like booking appointments, fetching user-specific data, or updating records in real-time.
Implement a function where the agent can prompt users to fill out a web form during a conversation. This is particularly useful for gathering detailed information or when textual input is more practical than voice. For example: Email, phone number, name, etc.
Webhook functions include the following components:
Function and Parameter Naming: Ensure that the function name and parameter names are formatted as valid identifiers. They should contain no spaces, begin with a letter, and may include underscores or use camelCase, such as "get_email" or "getEmail".
Descriptions: Provide comprehensive details in your function and parameter descriptions to help the agent understand what the function is for and when to use it.
Web form functions allow your voice agent to trigger web forms in the browser during a conversation for data collection or user input. This is particularly useful for gathering detailed information or when textual input is more practical than voice. For example: email, phone number, name, etc.
Once you have defined your function, you can integrate it into your agent’s configuration. Add the function to the tools array in your agent config.
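Here is a hedged sketch of what a webhook function entry in the tools array could look like. The exact field names (for example webhook, method, params, required) are assumptions for illustration, not the authoritative Millis schema; the identifier style and the {FromPhone} variable usage follow the guidance above and the variables documentation.

```javascript
// Illustrative only: field names such as "webhook", "method", "params" and
// "required" are assumptions, not the authoritative Millis schema.
const agentConfig = {
  // ...other agent configuration...
  tools: [
    {
      name: 'get_email',                      // valid identifier, no spaces
      description: 'Look up the email address on file for the current caller.',
      webhook: 'https://example.com/api/get-email',
      method: 'GET',
      params: [
        {
          name: 'phone_number',
          type: 'string',
          description: '{FromPhone}',         // variable filled in by Millis
          required: true,
        },
      ],
    },
  ],
};
```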
Millis AI's token is designed to fuel the platform's growth while offering tangible benefits to users and stakeholders. Below is a detailed breakdown of the tokenomics structure:
Token Utility
Exclusive Discounts: Token holders can access reduced rates for LLM, TTS, and STT services, as well as the base platform fee.
Payment Method: Use tokens to pay for all services on the platform, from integration support to advanced analytics.
Premium Access: Unlock advanced features like priority processing, extended language support, and customizable voice fine-tuning.
Incentives: Reward developers and early adopters for contributions like training new models, identifying bugs, and providing feedback.
Governance: Token holders gain voting rights on platform upgrades, new feature implementation, and future development priorities.
Token Allocation
Total Supply: 1,000,000,000 tokens
Development Team: 10% (100,000,000 tokens)
Reserved for platform maintenance, infrastructure scaling, and ongoing innovation.
Vesting period: 12 months, with linear release over 24 months to ensure alignment with long-term goals.
Token Release Schedule
Community Rewards: Gradual distribution over 5 years to incentivize sustained participation.
Ecosystem Growth Fund: Released based on project milestones and performance metrics.
Liquidity: Locked for the first 6 months, with gradual release thereafter.
Benefits for Holders
Reduced costs for platform services.
Access to premium features and priority support.
Governance participation to shape Millis AI's future.
Long-term value through ecosystem expansion and innovation.
The token is not just a utility but a cornerstone of the Millis AI ecosystem, fostering collaboration, innovation, and growth.
Attach metadata to your call sessions for personalized conversations
Millis AI allows you to attach metadata to your call sessions when using the Web SDK or starting outbound calls. Metadata can include any user-specific data, such as the caller’s name, user ID, or other relevant information. The system provides multiple ways to utilize this metadata during the session.
When using the embeddable call widget, you can add metadata directly through URL parameters:
Any query parameters added to the widget URL will automatically become metadata for that session.
You can add metadata to a call using the Web SDK by providing the metadata as the second parameter in the start method. The third parameter, include_metadata_in_prompt, determines whether the metadata should be included in the agent's system prompt.
The first parameter is the agentId or a temporary agent configuration.
The second parameter is the metadata you want to attach to the session.
The third parameter is include_metadata_in_prompt, which controls whether the metadata is used by the agent during the conversation.
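A minimal sketch of that call, assuming an already-constructed SDK client named msClient (client construction is shown in the Web SDK section); the metadata values are examples only:

```javascript
// Agent ID, then metadata, then include_metadata_in_prompt, as described above.
msClient.start(
  'your-agent-id',
  { userName: 'Jane', userId: 'user_123' },   // example metadata
  true                                         // include_metadata_in_prompt
);
```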
Learn more about our Web SDK here.
You can also attach metadata by initiating a session through a WebSocket connection. Include the metadata in the initiate method payload.
Learn more about building native apps using websocket here.
To add metadata using the outbound API, you need to include it in the request body when calling the start_outbound_call API. You can also specify whether to include the metadata in the agent's prompt.
from_phone: One of your agent's phone numbers.
to_phone: The phone number of the call recipient.
metadata: Optional. Any extra data you want to attach to the session.
include_metadata_in_prompt: Optional. Set to true to include metadata in the agent's prompt. Defaults to false.
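A hedged request sketch using fetch. The body fields are the ones documented above; the host, endpoint path, and authorization header are assumptions for illustration, so check the API reference for the exact values.

```javascript
// Endpoint path and auth header are assumptions for illustration.
await fetch('https://api-west.millis.ai/start_outbound_call', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer <your-millis-api-key>',
  },
  body: JSON.stringify({
    from_phone: '+15555550100',
    to_phone: '+15555550123',
    metadata: { userName: 'Jane', orderId: 'A-1001' },   // example metadata
    include_metadata_in_prompt: true,
  }),
});
```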
Learn more about Outbound Call here.
The metadata will stay associated with that session throughout its lifecycle. This metadata will also be included in the following:
Prefetch Data Webhook: Metadata is forwarded during the session’s prefetch data webhook which you can use to retrieve personalized data.
End of Call Webhook: Metadata is passed along at the conclusion of the call, allowing you to track and identify sessions.
Learn more about the webhooks here.
Any metadata you attach to a call session can be used as dynamic variables throughout your call flow. Variables can be referenced in agent prompts, messages, webhook parameters, and function calls using the {variableName} syntax.
For example, if you add metadata like:
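An illustrative shape (the key names match the references below; the values are examples only):

```javascript
const metadata = {
  userName: 'Jane',
  userType: 'premium',
};
```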
You can reference these values using {userName} or {userType} in various places during the call.
For detailed information about using variables, including syntax and examples, see our Variables documentation.
Allow agent to continue past conversations with users
Millis AI's Session Continuation feature enables agents to leverage previous interaction data, allowing users to continue conversations seamlessly from prior sessions. By passing a session_id, agents can access past context, enhancing engagement and providing a personalized experience for users.
Warning: Session Continuation is only available if Data Opt-Out is disabled. When Data Opt-Out is enabled, Millis does not retain call history, so the agent can't retrieve data from previous sessions. Ensure that Data Opt-Out is disabled if you require session continuation for your users.
Using session_id for Session Continuation
To enable session continuation using the Millis Web SDK, use the msClient.start method:
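A minimal sketch, assuming the session_continuation option described in the Web SDK section is passed as part of the start call; the exact signature and field placement in your SDK version may differ.

```javascript
// Sketch only: how session_continuation is passed depends on your SDK version.
msClient.start({
  agent_id: 'your-agent-id',
  session_continuation: 'previous-session-id',   // the session_id of the earlier call
});
```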
Make sure you upgrade your Web SDK to v1.0.15 to have this option.
To continue a session via WebSocket, include the session_id in the initiate event:
For outbound calls, the session_id is included in the request body when calling the start_outbound_call API.
Example Request
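A hedged example request (endpoint path and auth header are assumptions, as noted in the metadata documentation; session_id is the documented field):

```javascript
await fetch('https://api-west.millis.ai/start_outbound_call', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer <your-millis-api-key>',
  },
  body: JSON.stringify({
    from_phone: '+15555550100',
    to_phone: '+15555550123',
    session_id: 'previous-session-id',   // continue from this prior session
  }),
});
```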
Follow-Up Customer Support Calls: A returning customer can pick up from a previous conversation by including the session_id, reducing the need to re-explain their issue.
Ongoing Campaigns: In multi-stage campaigns, session_id helps track each caller's journey, creating a more cohesive experience.
Consultations: Advisors can use session continuation to reference past discussions, fostering a more personalized relationship.
User Identification: Upcoming features will allow sessions to continue based on user_id, phone, or similar identifiers, adding memory across sessions without requiring a session_id.
Millis AI now supports running voice agents in various regions, including the EU, to help reduce latency for calls in those regions and ensure compliance with EU laws. This documentation will guide you through the process of selecting and configuring regions for your voice agents.
There are two primary ways to set the region for your voice agents in Millis:
Set Region for Your Phone Number: When importing phone numbers from Twilio or Vonage, you can choose the desired region. This setting will ensure that calls are routed through the selected region, optimizing latency and compliance.
Set Region Endpoint When Starting Your Agent from the Web SDK: You can specify the region endpoint when initializing your agent using the Millis Web SDK. Use the following code to set the region:
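A sketch of client initialization with an explicit region endpoint. The factory name and option keys are assumptions for illustration; endPoint is the option referenced in the Web SDK docs, and the endpoint values are listed below.

```javascript
// Constructor name and option keys are illustrative; see the Web SDK reference.
const msClient = Millis.createClient({
  publicKey: '<your-millis-public-key>',
  endPoint: '<region-endpoint>',   // e.g. wss://api-eu-west.millis.ai/millis
});
```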
Replace <your-millis-public-key> with your actual Millis public key and <region-endpoint> with the endpoint of the desired region.
us-west: wss://api-west.millis.ai/millis
eu-west: wss://api-eu-west.millis.ai/millis
More endpoints will be available soon.
Integrate Millis AI’s voice agent capabilities directly into your web applications and browser extensions.
Install the SDK with npm:
Here’s how to quickly set up a voice agent in your web application:
Obtain your public key from your Millis AI Playground
Learn more about which endPoint to use here.
Starting from version 1.0.15, use the following format to initiate a call:
Using a Predefined Agent
First, create a voice agent from the Playground. Then, start a conversation with your agent using the following code:
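A hedged sketch following the three-parameter start signature described in the metadata documentation (agent ID, optional metadata, include_metadata_in_prompt); the exact v1.0.15+ format may differ.

```javascript
// Start a conversation with a predefined agent.
// metadata and include_metadata_in_prompt are optional.
msClient.start('agent-id', { userName: 'Jane' }, true);
```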
Replace agent-id with the ID of your agent obtained from the Playground.
The metadata is optional. You can pass any additional data to the session, which we will forward to your custom LLM and function webhooks. If you provide metadata, you can make it available to the agent by setting include_metadata_in_prompt to true. This will include the metadata in the agent's system prompt, allowing the agent to use the data during the conversation.
Dynamically Creating a Temporary Voice Agent
You can also dynamically create a temporary voice agent with custom configurations using the code below:
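A hedged sketch of starting with a temporary agent configuration. The config keys shown (prompt, llm, voice, language) mirror the settings described in the agent configuration docs, but the exact schema is an assumption; consult the agent config reference for the authoritative keys.

```javascript
// Illustrative config shape only.
msClient.start({
  prompt: 'You are a friendly scheduling assistant for Acme Dental.',
  llm: { model: 'gpt-4o' },
  voice: { provider: 'elevenlabs', voice_id: '<voice-id-from-the-voices-api>' },
  language: 'en',
});
```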
To obtain the voice_id, use this API to acquire the complete list of voices: https://api-west.millis.ai:8080/voices
Overriding Agent configuration
When both agent_id and agent_config are provided, the session will use the configuration associated with agent_id but will override it with any settings provided in agent_config. This option allows for minor modifications to the agent's default configuration on a per-session basis.
Additional Parameters
Metadata
metadata: Optional field to pass any additional information that may personalize the conversation. It can be used by the agent if include_metadata_in_prompt is set to true.
Including Metadata in Prompt
include_metadata_in_prompt: Boolean flag (true or false). If true, the metadata provided will be included in the prompt to give context to the agent.
Session Continuation
session_continuation: Provide the session_id from a previous session to enable continuity in conversation. This allows the agent to reference previous interactions.
onopen
Description: Emitted when the WebSocket connection is successfully opened.
Callback Signature:
onready
Description: Emitted when the client is ready to start processing audio or other tasks.
Callback Signature:
onsessionended
Description: Emitted when a session has ended.
Callback Signature:
onaudio
Description: Emitted when audio data is received.
Callback Signature:
Parameters:
audio - The received audio data in Uint8Array format.
onresponsetext
Description: Emitted when the agent's response text is received.
Callback Signature:
Parameters:
text - The received response text.
payload - An object containing additional information.
is_final (optional) - A boolean indicating if the response text is final.
ontranscript
Description: Emitted when the user's transcript text is received.
Callback Signature:
Parameters:
text - The received transcript text.
payload - An object containing additional information.
is_final (optional) - A boolean indicating if the transcript text is final.
onfunction
Description: Emitted when the agent triggers a function call.
Callback Signature:
Parameters:
text - Empty.
payload - Information about the triggered function.
name - The function name.
params - The params being used in the function call.
analyzer
Description: Emitted with an AnalyserNode for the agent's audio analysis.
Callback Signature:
Parameters:
analyzer - The AnalyserNode used for audio analysis.
useraudioready
Description: Emitted when user audio is ready for processing.
Callback Signature:
Parameters:
data - An object containing audio-related information.
analyser - The AnalyserNode for the user's audio analysis.
stream - The MediaStream containing the user's audio data.
onlatency
Description: Emitted to report latency information for debugging purposes.
Callback Signature:
Parameters:
latency - The measured latency in milliseconds.
onclose
Description: Emitted when the WebSocket connection is closed.
Callback Signature:
Parameters:
event - The CloseEvent containing details about the WebSocket closure.
onerror
Description: Emitted when an error occurs in the WebSocket connection.
Callback Signature:
Parameters:
error
- The Event
containing details about the error.
Here's an example of how to listen to these events in the Client class:
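A sketch of wiring up a few of the events above. The .on(...) registration style is an assumption; your SDK version may instead expose callback properties, but the event names are the ones documented here.

```javascript
// Event names come from the list above; the .on(...) registration style is assumed.
msClient.on('onready', () => console.log('Client ready'));

msClient.on('ontranscript', (text, payload) => {
  if (payload.is_final) console.log('User said:', text);
});

msClient.on('onresponsetext', (text, payload) => {
  console.log('Agent said:', text, payload.is_final ? '(final)' : '(partial)');
});

msClient.on('onsessionended', () => console.log('Session ended'));
msClient.on('onerror', (error) => console.error('WebSocket error:', error));
```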
If you encounter any issues or have questions, please reach out to us directly at thach@millis.ai.
Integrating a Custom LLM with Millis AI Voice Agent
This guide describes how to integrate your own LLM chatbot with a Millis AI voice agent. By connecting your custom LLM, you can power the voice agent with your chatbot’s capabilities, providing a seamless voice interaction experience based on your model’s responses.
Set up a WebSocket server on your end.
When an outbound or inbound call is initiated with your voice agent, the Millis AI server will establish a connection to your specified WebSocket URL.
Here's how the interaction flows after the connection is established:
Millis AI server will send a start_call event to tell your server when the conversation starts.
Millis AI streams the user's spoken message, including the full conversation transcript, to your LLM.
Your LLM processes the transcript and streams back the response. Indicate the end of a message stream with end_of_stream.
flush: Set this to true to instruct the agent to immediately generate audio based on the current response. If false, the agent will buffer the response and generate audio only when it receives a complete sentence.
pause: Set this to a number of milliseconds to instruct the agent to pause for that long after saying the response before saying the next response.
When your LLM generates a response, attach the stream_id from the original request so that we can keep track of which response corresponds to which request.
For the first message that your server sends after receiving the start_call event, use the stream_id from the start_call event.
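A hedged example of one streamed response message. The stream_response type and the stream_id, end_of_stream, flush, and pause fields come from this page; the envelope shape ("method"/"data") and the "content" field name are assumptions for illustration.

```javascript
// Sketch of one response chunk; "content" and the envelope shape are assumptions.
ws.send(JSON.stringify({
  method: 'stream_response',
  data: {
    stream_id: streamId,     // echo the stream_id from the incoming request
    content: 'Sure, I can help with that.',
    end_of_stream: false,    // set true on the final chunk
    flush: true,             // generate audio for this chunk immediately
  },
}));
```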
Your custom LLM can send specific messages to control the flow of the call. Instead of sending stream_response, you can send the following types:
To terminate the call:
To transfer the call to another destination (e.g., phone number):
Parameters:
stream_id: The unique identifier for the stream.
destination: The phone number or endpoint to transfer the call to.
Millis AI manages the conversation flow, including interruption detection and end-of-turn signals. You will be notified of these events:
Description: Sent to provide a partial transcript of the conversation. The transcript can be either final or partial.
Message Structure:
Parameters:
session_id: The unique identifier for the session.
transcript: The partial or complete transcript text.
is_final: Boolean indicating whether the transcript is final.
Description: Sent when the playback of the agent's audio stream has finished.
Message Structure:
Parameters:
session_id: The unique identifier for the session.
stream_id: The unique identifier for the stream.
Description: Sent when the user interrupts the agent's stream.
Message Structure:
Parameters:
stream_id: The unique identifier for the stream.
In your voice agent’s configuration on the Millis AI platform, specify your WebSocket endpoint.
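Putting the pieces together, here is a minimal server sketch using the ws package. The overall flow (receive start_call, then stream responses tagged with stream_id and terminated with end_of_stream) follows this page; the message envelope shapes, the "content" field, and the incoming "message" method name are assumptions, so adapt them to the actual events your integration receives.

```javascript
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
  ws.on('message', async (raw) => {
    const msg = JSON.parse(raw.toString());

    if (msg.method === 'start_call') {
      // Greet the caller using the stream_id from the start_call event.
      sendChunk(ws, msg.data.stream_id, 'Hi! How can I help you today?', true);
    }

    // Hypothetical method name for incoming user transcripts; adapt to the
    // actual event Millis sends in your integration.
    if (msg.method === 'message') {
      const reply = await runYourLlm(msg.data.transcript);   // your own LLM call
      sendChunk(ws, msg.data.stream_id, reply, true);
    }
  });
});

// Helper: send one response chunk tagged with the request's stream_id.
function sendChunk(ws, streamId, content, endOfStream) {
  ws.send(JSON.stringify({
    method: 'stream_response',
    data: { stream_id: streamId, content, end_of_stream: endOfStream, flush: true },
  }));
}

async function runYourLlm(transcript) {
  // Placeholder for your chatbot; return the response text.
  return `You said: ${transcript}`;
}
```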
Millis AI is an advanced voice AI platform that helps builders and developers quickly build low-latency, natural-sounding voice agents at low cost.
Low Latency Interaction: Experience smooth, natural dialogues with groundbreaking 600ms latency, nearly matching the gold standard for conversational response times.
Natural Conversation Flow: Our voice agents are built to handle complex conversational dynamics, including interruptions and human intent recognition, ensuring realistic and fluid interactions.
Easy Integration: Integrate voice agents effortlessly into your projects with minimal coding required. Choose from our proprietary models or connect your custom LLM-based chatbot for rapid deployment.
Scalable Infrastructure: Benefit from the expertise of our DevOps engineers who have scaled systems to support hundreds of millions of video call minutes daily, guaranteeing a robust, enterprise-grade infrastructure.
English, Bulgarian, Catalan, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese.
Trigger an outbound call to a specific number
This API allows you to initiate outbound phone calls from a specified Millis AI voice agent to any given phone number.
Method: POST
URL:
Headers:
Body:
Ensure the phone number includes the full international dialing format (e.g., +15555555555), with no dashes or spaces. (Only US phone numbers are supported.)
Example Request using curl
Let Millis AI handle your inbound calls in Twilio
Learn how to set up your Twilio phone number so that incoming calls are handled by a Millis AI voice agent that you specify.
Create a TwiML Bin with the following configuration. Replace public_key and agent_id with your own.
In your phone number settings, go to the A call comes in section (options: Webhook, TwiML Bin, Function, Studio Flow, Proxy Service).
Select TwiML Bin and choose the name of your TwiML Bin.
In your phone number settings, go to the A call comes in section (options: Webhook, TwiML Bin, Function, Studio Flow, Proxy Service).
Select Webhook and enter your own backend endpoint.
Handle the webhook as you like and return the following as your response.
Connecting Phone Numbers to Millis via SIP
Millis provides seamless integration for connecting your phone system to its AI-powered voice agents using SIP. This guide walks you through the process of rerouting phone calls to Millis via SIP.
To initiate a call, send a POST request to the /register_sip_call API endpoint. Depending on your location, select either the EU-West or US-West region for lower latency:
EU-West: https://api-eu-west.millis.ai/register_sip_call
US-West: https://api-west.millis.ai/register_sip_call
You must include the necessary parameters in your request body, with the option to customize agent behavior and include metadata if needed.
Request Body Example:
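A hedged request sketch against the documented US-West endpoint. The body fields are the ones listed under Field Details below; the authorization header and example metadata values are assumptions.

```javascript
// Auth header is an assumption; the endpoint is the documented US-West URL.
const res = await fetch('https://api-west.millis.ai/register_sip_call', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer <your-millis-private-key>',
  },
  body: JSON.stringify({
    agent_id: '<your-agent-id>',
    metadata: { caller_account: 'ACME-42' },   // example metadata
    include_metadata_in_prompt: true,
  }),
});
const { call_id, sip_uri } = await res.json();
```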
Field Details:
agent_id: (Optional) The ID of the Millis AI agent that will handle the call.
agent_config: (Optional) Configuration options for the agent, allowing you to customize behavior.
If both agent_id and agent_config are provided, the parameters in agent_config will override the original parameters for the agent tied to agent_id.
You can also provide just agent_config for a temporary configuration, which will be used to construct an agent to handle the call.
include_metadata_in_prompt: (Optional) Boolean value indicating if the metadata should be included in the agent's conversational prompt.
After making the POST request, you will receive a response containing a call_id and a sip_uri. This sip_uri is the address you will use to route your phone calls to Millis.
Response Example:
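Roughly, the response carries the two documented fields. Only the field names call_id and sip_uri come from this page; the values below are placeholders.

```javascript
// Placeholder values; only the field names are documented.
const exampleResponse = {
  call_id: '<generated-call-id>',
  sip_uri: 'sip:<generated-call-id>@<millis-sip-host>',   // use this URI to route the call
};
```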
Use the provided sip_uri to reroute the call from your phone system to Millis. Your phone system will forward the call audio to Millis, where the voice agent can interact with the caller.
Connect audio sources directly to Millis agents via WebRTC
Millis AI supports WebRTC integration, allowing users to connect their audio sources directly to Millis agents via WebRTC. This integration is ideal for a variety of applications, including:
Phone systems with WebRTC capabilities
Voice agents in video conferencing platforms (e.g., Zoom, Google Meet) to interact with participants via voice
VoIP systems that leverage WebRTC for real-time communication
Millis AI enables users to build intelligent voice agents that can join these platforms, engage in conversations, and assist participants via voice interactions.
To connect your phone system, video conferencing platform, or VoIP solution to Millis AI via WebRTC, ensure that:
Your system supports WebRTC for audio transmission.
You have a valid agent_id to route calls or streams to the correct Millis agent.
You have a private key for authenticating requests to Millis.
To initiate a WebRTC session, your system sends a WebRTC offer to Millis through the following API endpoint:
This API is used to send the WebRTC offer to Millis, where it will be processed, and a WebRTC answer will be returned to complete the connection.
Authorization: Bearer token containing the private key to authenticate the request.
Content-Type: application/json
The request body contains the following fields:
If the offer is valid, the API will respond with a WebRTC answer that can be used to complete the connection between your system and Millis.
Set the Remote Description: Use the sdp from the answer to set the remote description on your WebRTC client.
Example (JavaScript/WebRTC):
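For instance, assuming pc is your existing RTCPeerConnection and answer is the object returned by the API:

```javascript
// Apply the answer returned by Millis to your peer connection.
await pc.setRemoteDescription({ type: answer.type, sdp: answer.sdp });
```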
Complete ICE Candidate Exchange: Ensure that ICE candidates are exchanged between your client and Millis to establish the media path.
Start Media Transmission: After completing SDP and ICE negotiations, audio will start flowing between your system and the Millis agent.
Route incoming calls from DID numbers through Millis AI for real-time voice interaction, using WebRTC for media transmission.
Connect voice agents to video conferencing platforms such as Zoom, Google Meet, and others. The voice agent can join calls and engage in real-time audio conversations with participants, providing support, answering questions, or automating workflows.
Millis agents can be connected to virtual communication rooms via WebRTC, interacting with users in the room to provide assistance, answer questions, or drive conversations through audio.
Embedding a Voice Agent Call Widget into Your Web Application
Millis AI offers a simple and effective way to integrate voice interaction into your web applications through our embeddable call widget. This widget allows users to interact with your voice agent directly from your website, providing a seamless user experience.
The widget includes a button to start and stop interactions and features an animation of an audiogram to visually represent the audio interaction similar to our demo page.
Navigate to the voice agent you want to embed.
Click on the “Actions” button on the top right and select ‘Embed to public site’.
Copy the provided HTML code.
With the HTML code, you can place it anywhere in your web app to embed the widget. Here’s how to do it on Webflow:
Navigate to the designated area and add a “Code Embed” component.
Paste the HTML code provided above, then click ‘Save’.
You can customize each widget session by adding URL parameters that will be passed as metadata. This allows you to provide context-specific information to your voice agent.
You can add metadata parameters like this:
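For example, appending query parameters to the widget URL from your embed code (the base URL below is a placeholder; use the one from your embed snippet):

```
https://<your-widget-url-from-the-embed-code>?userName=Jane&userType=premium
```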
Adding user identification information
Passing context about the page or section where the widget is embedded
Providing custom configuration parameters for the conversation
Remember that any metadata added via URL parameters will be visible in the URL. Don’t include sensitive information this way.
Create your Voice Agent on the Playground.
Your endpoint should be capable of both receiving messages from and sending messages to the Millis AI server.
If you need help or have any questions, please reach out to us at thach@millis.ai.
Prerequisites
Obtain your API key from your Playground.
API Details
Retrieve your voice agent ID from the Agent Details page on the Playground.
Prerequisites
Obtain your public_key from your Playground.
Set up a voice agent and obtain the agent_id from your Playground.
Method 1: TwiML Bin
Method 2: Use your own endpoint
metadata: (Optional) Any additional information to attach to the session, giving the agent context to enhance its response. To learn more about how metadata works, see the metadata documentation.
offer Object:
Example Request:
Example Response:
Step 1: Create your voice agent
Step 2: Obtain the widget embeddable code
Step 3: Embed the widget into your web application
These parameters will be automatically converted into metadata for the session, allowing your agent to access this information during the conversation. You can reference this metadata using the {variableName} syntax in your agent's prompts or function calls.
Learn more about how metadata works in Millis AI in the metadata documentation.
agent_id (String): The ID of the Millis agent to handle the call or stream.
offer (Object): The WebRTC offer details, containing sdp and type.
sdp (String): The WebRTC offer's session description protocol (SDP).
type (String): Type of the WebRTC request (typically "offer").
Dynamic Variables in Call Sessions
Variables are pieces of data that can be dynamically inserted during call sessions. They can now be embedded within different parts of a call flow, including prompts, agent messages, webhook parameters, and function calls. This flexibility allows developers to build richer and more responsive interactions for their agents.
In every call, there are predefined variables like FromPhone and ToPhone (representing the originating and receiving phone numbers for phone calls). Additionally, you can utilize any key-value pairs from the metadata that you provide for the call.
For example, if you include custom information like customerID, appointmentTime, or any other data that you may want to reference throughout the call, these can now be easily used.
Millis supports a simple syntax for using variables. Just wrap the variable name in curly braces like {<variable>}, and our system will automatically replace it with the corresponding value during the call. Here are a few examples:
To fetch the caller info by number: Use {FromPhone} as the description of the function's param.
To greet the caller by name via metadata: Set the agent's greeting line to "Hi {userName}! How can I assist you today?"
Webhook parameters: You can add variables as parameters in a webhook, such as phone_number={FromPhone}.
Millis offers both default variables and dynamic variables for customizing call sessions:
Default Variables (Special Variables): These are system-provided variables such as FromPhone and ToPhone, which represent the caller's phone number and the receiving phone number, respectively, when the call is made via a phone network.
Dynamic Variables: These are custom key-value pairs that you provide as metadata when starting a call. You can use any metadata as a variable within prompts, messages, function calls, or webhook parameters.
For Function Calls: Put {<variable>} in the description of the parameter. For instance, if the parameter name is phone_number, you can set the parameter description to {FromPhone} to automatically include the caller's phone number.
For Agent Messages: Add {userName: <real user name>} to metadata, then set the agent's greeting line to "Hi {userName}! How can I help you?" This can also be applied to the agent's prompt.
Using Millis Platform via WebSocket to build voice agents on desktop and mobile
This tutorial guides you through the process of integrating the Millis AI platform directly via WebSocket to build voice agents for desktop or mobile apps. Users can capture audio natively and send it to Millis via WebSocket, receiving voice responses in real-time.
Create your Voice Agent on the Playground
Use native APIs on desktop or mobile to capture and play back audio.
Establish a WebSocket connection to the Millis server.
WebSocket Endpoint: wss://api-west.millis.ai:8080/millis
Sample Rate: 16000 Hz
Encoding: PCM
Channels: 1
Chunk Size: Any
Begin by establishing a connection with the Millis AI WebSocket endpoint. Here's example code in JavaScript.
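A minimal connection sketch using the documented endpoint. How credentials are supplied (in the URL or in the initiate message) is an assumption; see the initiate step below.

```javascript
// Open a connection to the documented endpoint. Binary frames carry audio,
// so request ArrayBuffer data.
const ws = new WebSocket('wss://api-west.millis.ai:8080/millis');
ws.binaryType = 'arraybuffer';

ws.onopen = () => {
  console.log('Connected to Millis');
};
```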
Once connected, send an initiate message to start the interaction.
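A hedged sketch of the initiate message. The field names inside the payload (public key, agent_id, metadata, include_metadata_in_prompt) are assumptions based on the options described elsewhere in these docs; check the API reference for the exact shape.

```javascript
// Field names inside "data" are assumptions; see the API reference for the exact shape.
ws.send(JSON.stringify({
  method: 'initiate',
  data: {
    public_key: '<your-millis-public-key>',
    agent_id: '<your-agent-id>',
    metadata: { userName: 'Jane' },
    include_metadata_in_prompt: true,
  },
}));
```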
Millis will respond with the message {"method": "onready"}
indicating readiness.
Capture audio on your device and send it as an ArrayBuffer to Millis. Make sure it's a Uint8Array.
Note: Audio packets should be in PCM format, 16000 Hz sample rate, and mono (1 channel).
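A sketch of the sending side: forward each captured PCM chunk as binary data, and send the keep-alive ping described below every 1,000 packets. The helper name is illustrative.

```javascript
let packetCount = 0;

// pcmChunk is a Uint8Array of 16 kHz, mono PCM captured natively.
function sendAudioChunk(pcmChunk) {
  ws.send(pcmChunk);
  packetCount += 1;
  if (packetCount % 1000 === 0) {
    ws.send(JSON.stringify({ method: 'ping' }));   // keep the connection alive
  }
}
```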
Millis will send audio responses as ArrayBuffers with the same format and sample rate. You need to buffer and play these on your side.
ArrayBuffer data will be the audio packets, while string data indicates normal events that you need to process accordingly.
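A receiving sketch that distinguishes binary audio from JSON events; enqueueForPlayback and handleEvent stand in for your own playback and event-handling functions.

```javascript
ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary frame: agent audio (PCM, 16 kHz, mono). Buffer and play it.
    enqueueForPlayback(new Uint8Array(event.data));
  } else {
    // Text frame: a JSON event such as onready, pause, unpause, clear,
    // ontranscript, onresponsetext, onsessionended, start_answering, ai_action.
    const msg = JSON.parse(event.data);
    handleEvent(msg);
  }
};
```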
Send a {"method": "ping"}
message every 1000 packets to keep the connection alive.
Millis may send various events to manage the session and interaction. Here is the logic behind each message:
pause: Millis detected some voice activity from the client. The agent decides to temporarily pause talking and observe the next voice activity. In this case, you should still keep and buffer incoming audio packets but not play them.
unpause: If Millis detects that it’s not the human trying to talk over or interrupt, the agent will continue talking. In this case, you should continue playing audio packets in the buffer.
clear: Millis detected the human's voice, indicating human interruption intent. The agent will reset and stay silent to let the human continue talking. In this case, clear all audio buffers and stop playback.
ontranscript: Real-time transcript of the client’s audio.
onresponsetext: Real-time transcript of the agent’s response.
onsessionended: If Millis decides to end the session for any reason, you will receive this event.
start_answering: The agent decides to start answering the human’s query.
ai_action: For debugging purposes. During the conversation, Millis AI intelligently decides to take some action. Listen to this event to understand what the agent is trying to do.
Example:
Simply close the WebSocket connection to stop the conversation.