A breakdown of how our pricing works
At Millis AI, we designed our pricing to be transparent and straightforward. Here's a breakdown of how our pricing works, including the costs associated with using different Large Language Models (LLMs), Text-to-Speech (TTS), and Speech-to-Text (STT) providers.
Our token offers a comprehensive, value-driven approach for users seeking to maximize efficiency and reduce costs. Token holders gain access to exclusive discounts on platform fees, including charges for LLM, Text-to-Speech, and Speech-to-Text services. Beyond covering operational expenses, the token provides added benefits such as priority access to new features, enhanced support services, and the ability to unlock premium functionalities. By holding our token, users not only enjoy preferential rates for seamless integration and optimized interactions but also become part of a forward-thinking ecosystem designed to deliver long-term value and innovation. See the Tokenomics section for more.
We charge a base rate of $0.02 per minute for using the Millis AI platform. This rate is in addition to the fees charged by other providers such as LLM, TTS, and STT services.
The pricing for LLMs is based on the number of tokens processed. Here, we translate token-based pricing into an estimated cost per minute based on typical usage:
GPT-4o:
Input: $5.00 per 1 million tokens
Output: $15.00 per 1 million tokens
Estimated costs per minute: $0.004 per minute
GPT-4 Turbo:
Input: $10.00 per 1 million tokens
Output: $30.00 per 1 million tokens
Estimated costs per minute: $0.008 per minute
GPT-3.5 Turbo:
Input: $0.50 per 1 million tokens
Output: $1.50 per 1 million tokens
Estimated costs per minute: $0.0004 per minute
Meta Llama-3:
Input: $0.90 per 1 million tokens
Output: $0.90 per 1 million tokens
Estimated costs per minute: $0.00018 per minute
Your LLM: No charge for using your own custom LLM
Choose By Millis - Optimize for Best Latency:
Millis automatically selects the best available LLM model from above with the lowest latency for your specific configuration and functions.
This option is perfect for users looking for the most efficient performance without the need to manually switch between models.
TTS pricing is based on the number of characters or words that the agent speaks. This means longer responses from your agent will cost more. Each provider has its own pricing structure:
Eleven Labs: $0.15 / 1,000 characters (~ $0.12/min based on average speech length)
OpenAI: $0.015 / 1,000 characters (~ $0.012/min)
Rime: $0.075 / 1,000 characters (~ $0.06/min)
A typical minute of voice interaction might involve processing around 200 tokens (800 characters).
For converting spoken inputs into text, we charge $0.0043 per minute.
Let’s calculate the total cost for a 10-minute session using GPT-4o and Eleven Labs TTS, with an average speech rate. Assuming each party speaks for approximately 5 minutes during the 10-minute session:
Millis AI base charge: 10 min x $0.02/min = $0.20
GPT-4o LLM charge: 10 min x $0.004/min = $0.04
Eleven Labs TTS charge: Since the agent only speaks for about half of the session (5 minutes), and assuming an average speech rate translates to 4,000 characters spoken by the agent: 4,000 characters x $0.15/1,000 characters = $0.60
STT charge: Since the human also speaks for approximately half of the session (5 minutes): 5 minutes x $0.0043/min = $0.0215
Total cost: $0.20 (base) + $0.04 (LLM) + $0.60 (TTS) + $0.0215 (STT) = $0.8615
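To make the arithmetic above easy to reproduce, here is a small, illustrative estimator in JavaScript. The rates come from the tables on this page; the function name and parameter structure are examples only, not part of the Millis API.

```javascript
// Illustrative cost estimator using the rates listed above.
// Rates: base $0.02/min, GPT-4o ~$0.004/min, Eleven Labs $0.15 per 1,000
// characters, STT $0.0043/min.
function estimateSessionCost({ totalMinutes, agentChars, userSpeakingMinutes, llmRatePerMin }) {
  const base = totalMinutes * 0.02;          // Millis AI platform fee
  const llm = totalMinutes * llmRatePerMin;  // e.g. 0.004 for GPT-4o
  const tts = (agentChars / 1000) * 0.15;    // Eleven Labs TTS
  const stt = userSpeakingMinutes * 0.0043;  // speech-to-text
  return { base, llm, tts, stt, total: base + llm + tts + stt };
}

// Reproduces the worked example: 0.20 + 0.04 + 0.60 + 0.0215 = 0.8615
console.log(estimateSessionCost({
  totalMinutes: 10,
  agentChars: 4000,
  userSpeakingMinutes: 5,
  llmRatePerMin: 0.004,
}));
```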
Millis AI Webhooks Documentation
Configuration Key in Agent Config: extra_prompt_webhook
The Prefetch Data Webhook is called before the conversation begins. It enables real-time customization and event notifications:
Call Notification: Notify your system about an incoming call by receiving a webhook event.
Metadata Override: Dynamically override or add session metadata based on external systems.
Extra Prompt: Provide additional context to the agent's system prompt.
The configuration can be provided in one of the following formats:
String: The URL of the webhook endpoint.
Object:
Method: GET
Query Parameters:
session_id (string): Unique session identifier.
agent_id (string): ID of the agent handling the session.
from (string): Caller's phone number (if applicable).
to (string): Receiver's phone number (if applicable).
Additional session metadata as key-value pairs (if available).
Content-Type: application/json
Response Fields:
metadata (object): Overrides or extends the current session metadata. Existing keys are updated, and new keys are added.
extra_prompt (string): Additional text appended to the agent's system prompt.
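For reference, here is a minimal handler sketch. Express is used purely as an example framework; the query parameters and response fields are the ones documented above, while the route path and example values are placeholders.

```javascript
const express = require('express');
const app = express();

// Prefetch Data Webhook: Millis calls this endpoint with GET before the conversation starts.
app.get('/millis/prefetch', (req, res) => {
  const { session_id, agent_id, from, to } = req.query;

  // Look up caller-specific data in your own systems, then return
  // metadata overrides and/or extra prompt context.
  res.json({
    metadata: { customer_tier: 'gold' },                      // example values only
    extra_prompt: `The caller's number is ${from || 'unknown'}.`,
  });
});

app.listen(3000);
```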
Configuration Key in Agent Config: session_data_webhook
The End-of-Call Webhook is triggered after a session concludes. It provides detailed information about the session for logging, analytics, or post-call processing.
The configuration can be provided in one of the following formats:
String: The URL of the webhook endpoint.
Object:
Method: POST
Payload (JSON):
chat (string): JSON string representing the chat history (user and agent messages).
function_calls (array): List of external functions invoked during the session.
ts (timestamp): Session start time.
duration (integer): Session duration in seconds.
agent_config (object): Serialized agent configuration.
agent_id (string): ID of the agent managing the session.
call_id (string): Unique identifier for the call session.
chars_used (integer): Number of characters processed during the session.
session_id (string): Unique session identifier.
cost_breakdown (array): List of cost components for different services.
voip (object): Telephony details.
recording (object): Call recording details.
metadata (object): Final metadata for the session.
call_status (string): Status of the call. Possible values: user-ended, api-ended, voicemail-message, voicemail-hangup, agent-ended, timeout, error, chat_completion.
error_message (string): Description of any errors encountered during the session.
Request URL:
Payload:
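A sketch of a receiving endpoint that picks out the documented payload fields (Express is again only an example framework; treat the shape of nested objects as implementation-defined):

```javascript
const express = require('express');
const app = express();
app.use(express.json());

// End-of-Call Webhook: Millis POSTs the session summary here.
app.post('/millis/session-ended', (req, res) => {
  const {
    session_id, call_id, agent_id, ts, duration,
    chat, function_calls, chars_used, cost_breakdown,
    voip, recording, metadata, call_status, error_message,
  } = req.body;

  console.log(`Session ${session_id} ended with status ${call_status} after ${duration}s`);
  // Store chat history, costs, recording details, etc. for analytics.

  res.sendStatus(200);
});

app.listen(3000);
```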
Prefetch Webhook:
Use this webhook to notify your system about calls, adjust metadata dynamically, or provide extra prompts to enhance conversational context.
End-of-Call Webhook:
Designed for detailed post-call analytics and tracking.
Ensure secure storage of sensitive information like call recordings and metadata.
Security:
Use HTTPS for all webhooks.
Authenticate requests with headers or API keys as needed.
These webhooks provide a robust way to integrate Millis AI into your existing systems, offering flexibility and advanced customization for voice agent interactions.
Voice Agents are the core components of the Millis AI platform. These agents can be customized to perform a variety of tasks, from answering questions to guiding users through complex processes.
It’s important to configure your voice agent to ensure it operates effectively within your specific context. Here are the main aspects you can customize:
The system prompt is where you can provide specific instructions or information that the agent needs to remember and follow. This sets the initial context for your voice agent, guiding its responses and interactions.
(Optional) If not set, the default Millis AI model is used.
Model: Specifies the GPT model that your agent will operate on. We support OpenAI’s latest model, GPT-4o, as well as open-source models like Meta Llama 3.
Provider: The service that hosts inference for the model.
provider: The service provider for the text-to-speech service. This setting determines the quality of your agent's voice.
voice_id: The specific voice character from the chosen provider's catalog, allowing you to customize how your agent sounds.
Defines the operational language of the agent. If not specified, English is used by default.
(Optional) If you prefer using your own custom LLM, specify a WebSocket URL to enable this connection.
A list of function calls the agent can execute to perform tasks or retrieve information during interactions. This includes API webhooks and other integrations.
You can select the AI model for your voice agent based on your needs:
Default Millis AI Model: Automatically used if no specific LLM model is provided. This model is best optimized for low latency.
Popular Models from Providers: Like OpenAI’s GPT-4o, known for the best language processing capabilities but with a trade-off in latency.
Custom Model via WebSocket: Integrate your uniquely developed or tailored LLM to give your agent specialized abilities. You have full control over the agent’s capabilities.
Functions are additional capabilities that you can integrate into your voice agents to enhance their utility and interaction dynamics.
These allow the agent to perform actions or retrieve information during a conversation by calling external APIs. This is useful for tasks like booking appointments, fetching user-specific data, or updating records in real-time.
Implement a function where the agent can prompt users to fill out a web form during a conversation. This is particularly useful for gathering detailed information or when textual input is more practical than voice. For example: Email, phone number, name, etc.
Webhook functions include the following components:
Function and Parameter Naming: Ensure that the function name and parameter names are formatted as valid identifiers. They should contain no spaces, begin with a letter, and may include underscores or use camelCase, such as "get_email" or "getEmail".
Descriptions: Provide comprehensive details in your function and parameter descriptions to help the agent understand what the function is for and when to use it.
Web form functions allow your voice agent to trigger web forms in the browser during a conversation for data collection or user input. This is particularly useful for gathering detailed information or when textual input is more practical than voice. For example: email, phone number, name, etc.
Once you have defined your function, you can integrate it into your agent’s configuration. Add the function to the tools array in your agent config.
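Here is a hedged sketch of what a webhook function entry in the tools array could look like. The exact field names (for example webhook, method, params, required) are assumptions for illustration, not the authoritative Millis schema; the identifier style and the {FromPhone} variable usage follow the guidance above and the variables documentation.

```javascript
// Illustrative only: field names such as "webhook", "method", "params" and
// "required" are assumptions, not the authoritative Millis schema.
const agentConfig = {
  // ...other agent configuration...
  tools: [
    {
      name: 'get_email',                      // valid identifier, no spaces
      description: 'Look up the email address on file for the current caller.',
      webhook: 'https://example.com/api/get-email',
      method: 'GET',
      params: [
        {
          name: 'phone_number',
          type: 'string',
          description: '{FromPhone}',         // variable filled in by Millis
          required: true,
        },
      ],
    },
  ],
};
```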
Millis AI's token is designed to fuel the platform's growth while offering tangible benefits to users and stakeholders. Below is a detailed breakdown of the tokenomics structure:
Token Utility
Exclusive Discounts: Token holders can access reduced rates for LLM, TTS, and STT services, as well as the base platform fee.
Payment Method: Use tokens to pay for all services on the platform, from integration support to advanced analytics.
Premium Access: Unlock advanced features like priority processing, extended language support, and customizable voice fine-tuning.
Incentives: Reward developers and early adopters for contributions like training new models, identifying bugs, and providing feedback.
Governance: Token holders gain voting rights on platform upgrades, new feature implementation, and future development priorities.
Token Allocation
Total Supply: 1,000,000,000 tokens
Development Team: 10% (100,000,000 tokens)
Reserved for platform maintenance, infrastructure scaling, and ongoing innovation.
Vesting period: 12 months, with linear release over 24 months to ensure alignment with long-term goals.
Token Release Schedule
Community Rewards: Gradual distribution over 5 years to incentivize sustained participation.
Ecosystem Growth Fund: Released based on project milestones and performance metrics.
Liquidity: Locked for the first 6 months, with gradual release thereafter.
Benefits for Holders
Reduced costs for platform services.
Access to premium features and priority support.
Governance participation to shape Millis AI's future.
Long-term value through ecosystem expansion and innovation.
The token is not just a utility but a cornerstone of the Millis AI ecosystem, fostering collaboration, innovation, and growth.
Attach metadata to your call sessions for personalized conversations
Millis AI allows you to attach metadata to your call sessions when using the Web SDK or starting outbound calls. Metadata can include any user-specific data, such as the caller’s name, user ID, or other relevant information. The system provides multiple ways to utilize this metadata during the session.
When using the embeddable call widget, you can add metadata directly through URL parameters:
Any query parameters added to the widget URL will automatically become metadata for that session.
You can add metadata to a call using the Web SDK by providing the metadata as the second parameter in the start method. The third parameter, include_metadata_in_prompt, determines whether the metadata should be included in the agent's system prompt.
The first parameter is the agentId or a temporary agent configuration.
The second parameter is the metadata you want to attach to the session.
The third parameter is include_metadata_in_prompt, which controls whether the metadata is used by the agent during the conversation.
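A minimal sketch of that call, assuming an already-constructed SDK client named msClient (client construction is shown in the Web SDK section); the metadata values are examples only:

```javascript
// Agent ID, then metadata, then include_metadata_in_prompt, as described above.
msClient.start(
  'your-agent-id',
  { userName: 'Jane', userId: 'user_123' },   // example metadata
  true                                         // include_metadata_in_prompt
);
```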
Learn more about our Web SDK here.
You can also attach metadata by initiating a session through a WebSocket connection. Include the metadata in the initiate method payload.
Learn more about building native apps using websocket here.
To add metadata using the outbound API, you need to include it in the request body when calling the start_outbound_call API. You can also specify whether to include the metadata in the agent's prompt.
from_phone: One of your agent's phone numbers.
to_phone: The phone number of the call recipient.
metadata: Optional. Any extra data you want to attach to the session.
include_metadata_in_prompt: Optional. Set to true to include metadata in the agent's prompt. Defaults to false.
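A hedged request sketch using fetch. The body fields are the ones documented above; the host, endpoint path, and authorization header are assumptions for illustration, so check the API reference for the exact values.

```javascript
// Endpoint path and auth header are assumptions for illustration.
await fetch('https://api-west.millis.ai/start_outbound_call', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer <your-millis-api-key>',
  },
  body: JSON.stringify({
    from_phone: '+15555550100',
    to_phone: '+15555550123',
    metadata: { userName: 'Jane', orderId: 'A-1001' },   // example metadata
    include_metadata_in_prompt: true,
  }),
});
```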
Learn more about Outbound Call here.
The metadata will stay associated with that session throughout its lifecycle. This metadata will also be included in the following:
Prefetch Data Webhook: Metadata is forwarded during the session’s prefetch data webhook which you can use to retrieve personalized data.
End of Call Webhook: Metadata is passed along at the conclusion of the call, allowing you to track and identify sessions.
Learn more about the webhooks here.
Any metadata you attach to a call session can be used as dynamic variables throughout your call flow. Variables can be referenced in agent prompts, messages, webhook parameters, and function calls using the {variableName} syntax.
For example, if you add metadata like:
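An illustrative shape (the key names match the references below; the values are examples only):

```javascript
const metadata = {
  userName: 'Jane',
  userType: 'premium',
};
```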
You can reference these values using {userName} or {userType} in various places during the call.
For detailed information about using variables, including syntax and examples, see our Variables documentation.
Allow agent to continue past conversations with users
Millis AI's Session Continuation feature enables agents to leverage previous interaction data, allowing users to continue conversations seamlessly from prior sessions. By passing a session_id, agents can access past context, enhancing engagement and providing a personalized experience for users.
Warning: Session Continuation is only available if Data Opt-Out is disabled. When Data Opt-Out is enabled, Millis does not retain call history, so the agent can't retrieve data from previous sessions. Ensure that Data Opt-Out is disabled if you require session continuation for your users.
Using session_id for Session Continuation
To enable session continuation using the Millis Web SDK, use the msClient.start method:
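A minimal sketch, assuming the session_continuation option described in the Web SDK section is passed as part of the start call; the exact signature and field placement in your SDK version may differ.

```javascript
// Sketch only: how session_continuation is passed depends on your SDK version.
msClient.start({
  agent_id: 'your-agent-id',
  session_continuation: 'previous-session-id',   // the session_id of the earlier call
});
```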
Make sure you upgrade your Web SDK to v1.0.15 to have this option.
To continue a session via WebSocket, include the session_id in the initiate event:
For outbound calls, the session_id is included in the request body when calling the start_outbound_call API.
Example Request
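A hedged example request (endpoint path and auth header are assumptions, as noted in the metadata documentation; session_id is the documented field):

```javascript
await fetch('https://api-west.millis.ai/start_outbound_call', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer <your-millis-api-key>',
  },
  body: JSON.stringify({
    from_phone: '+15555550100',
    to_phone: '+15555550123',
    session_id: 'previous-session-id',   // continue from this prior session
  }),
});
```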
Follow-Up Customer Support Calls: A returning customer can pick up from a previous conversation by including the session_id, reducing the need to re-explain their issue.
Ongoing Campaigns: In multi-stage campaigns, session_id helps track each caller's journey, creating a more cohesive experience.
Consultations: Advisors can use session continuation to reference past discussions, fostering a more personalized relationship.
User Identification: Upcoming features will allow sessions to continue based on user_id, phone, or similar identifiers, adding memory across sessions without requiring a session_id.
Millis AI now supports running voice agents in various regions, including the EU, to help reduce latency for calls in those regions and ensure compliance with EU laws. This documentation will guide you through the process of selecting and configuring regions for your voice agents.
There are two primary ways to set the region for your voice agents in Millis:
Set Region for Your Phone Number: When importing phone numbers from Twilio or Vonage, you can choose the desired region. This setting will ensure that calls are routed through the selected region, optimizing latency and compliance.
Set Region Endpoint When Starting Your Agent from the Web SDK: You can specify the region endpoint when initializing your agent using the Millis Web SDK. Use the following code to set the region:
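A sketch of client initialization with an explicit region endpoint. The factory name and option keys are assumptions for illustration; endPoint is the option referenced in the Web SDK docs, and the endpoint values are listed below.

```javascript
// Constructor name and option keys are illustrative; see the Web SDK reference.
const msClient = Millis.createClient({
  publicKey: '<your-millis-public-key>',
  endPoint: '<region-endpoint>',   // e.g. wss://api-eu-west.millis.ai/millis
});
```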
Replace <your-millis-public-key> with your actual Millis public key and <region-endpoint> with the endpoint of the desired region.
us-west: wss://api-west.millis.ai/millis
eu-west: wss://api-eu-west.millis.ai/millis
More endpoints will be available soon.
Integrate Millis AI’s voice agent capabilities directly into your web applications and browser extensions.
Install the SDK with npm:
Here’s how to quickly set up a voice agent in your web application:
Obtain your public key from your Millis AI Playground
Learn more about which endPoint to use here.
Starting from version 1.0.15, use the following format to initiate a call:
Using a Predefined Agent
First, create a voice agent from the Playground. Then, start a conversation with your agent using the following code:
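A hedged sketch following the three-parameter start signature described in the metadata documentation (agent ID, optional metadata, include_metadata_in_prompt); the exact v1.0.15+ format may differ.

```javascript
// Start a conversation with a predefined agent.
// metadata and include_metadata_in_prompt are optional.
msClient.start('agent-id', { userName: 'Jane' }, true);
```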
Replace agent-id with the ID of your agent obtained from the Playground.
The metadata is optional. You can pass any additional data to the session, which we will forward to your custom LLM and function webhooks. If you provide metadata, you can make it available to the agent by setting include_metadata_in_prompt to true. This will include the metadata in the agent's system prompt, allowing the agent to use the data during the conversation.
Dynamically Creating a Temporary Voice Agent
You can also dynamically create a temporary voice agent with custom configurations using the code below:
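A hedged sketch of starting with a temporary agent configuration. The config keys shown (prompt, llm, voice, language) mirror the settings described in the agent configuration docs, but the exact schema is an assumption; consult the agent config reference for the authoritative keys.

```javascript
// Illustrative config shape only.
msClient.start({
  prompt: 'You are a friendly scheduling assistant for Acme Dental.',
  llm: { model: 'gpt-4o' },
  voice: { provider: 'elevenlabs', voice_id: '<voice-id-from-the-voices-api>' },
  language: 'en',
});
```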
To obtain the voice_id, use this API to acquire the complete list of voices: https://api-west.millis.ai:8080/voices
Overriding Agent configuration
When both agent_id and agent_config are provided, the session will use the configuration associated with agent_id but will override it with any settings provided in agent_config. This option allows for minor modifications to the agent's default configuration on a per-session basis.
Additional Parameters
Metadata
metadata: Optional field to pass any additional information that may personalize the conversation. It can be used by the agent if include_metadata_in_prompt is set to true.
Including Metadata in Prompt
include_metadata_in_prompt: Boolean flag (true or false). If true, the metadata provided will be included in the prompt to give context to the agent.
Session Continuation
session_continuation: Provide the session_id from a previous session to enable continuity in conversation. This allows the agent to reference previous interactions.
onopen
Description: Emitted when the WebSocket connection is successfully opened.
Callback Signature:
onready
Description: Emitted when the client is ready to start processing audio or other tasks.
Callback Signature:
onsessionended
Description: Emitted when a session has ended.
Callback Signature:
onaudio
Description: Emitted when audio data is received.
Callback Signature:
Parameters:
audio - The received audio data in Uint8Array format.
onresponsetext
Description: Emitted when the agent's response text is received.
Callback Signature:
Parameters:
text - The received response text.
payload - An object containing additional information.
is_final (optional) - A boolean indicating if the response text is final.
ontranscript
Description: Emitted when the user's transcript text is received.
Callback Signature:
Parameters:
text - The received transcript text.
payload - An object containing additional information.
is_final (optional) - A boolean indicating if the transcript text is final.
onfunction
Description: Emitted when the agent triggers a function call.
Callback Signature:
Parameters:
text - Empty.
payload - Information about the triggered function.
name - The function name.
params - The params being used in the function call.
analyzer
Description: Emitted with an AnalyserNode for the agent's audio analysis.
Callback Signature:
Parameters:
analyzer - The AnalyserNode used for audio analysis.
useraudioready
Description: Emitted when user audio is ready for processing.
Callback Signature:
Parameters:
data - An object containing audio-related information.
analyser - The AnalyserNode for the user's audio analysis.
stream - The MediaStream containing the user's audio data.
onlatency
Description: Emitted to report latency information for debugging purposes.
Callback Signature:
Parameters:
latency - The measured latency in milliseconds.
onclose
Description: Emitted when the WebSocket connection is closed.
Callback Signature:
Parameters:
event - The CloseEvent containing details about the WebSocket closure.
onerror
Description: Emitted when an error occurs in the WebSocket connection.
Callback Signature:
Parameters:
error
- The Event
containing details about the error.
Here's an example of how to listen to these events in the Client class:
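A sketch of wiring up a few of the events above. The .on(...) registration style is an assumption; your SDK version may instead expose callback properties, but the event names are the ones documented here.

```javascript
// Event names come from the list above; the .on(...) registration style is assumed.
msClient.on('onready', () => console.log('Client ready'));

msClient.on('ontranscript', (text, payload) => {
  if (payload.is_final) console.log('User said:', text);
});

msClient.on('onresponsetext', (text, payload) => {
  console.log('Agent said:', text, payload.is_final ? '(final)' : '(partial)');
});

msClient.on('onsessionended', () => console.log('Session ended'));
msClient.on('onerror', (error) => console.error('WebSocket error:', error));
```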
If you encounter any issues or have questions, please reach out to us directly at thach@millis.ai.
Integrating a Custom LLM with Millis AI Voice Agent
This guide describes how to integrate your own LLM chatbot with a Millis AI voice agent. By connecting your custom LLM, you can power the voice agent with your chatbot’s capabilities, providing a seamless voice interaction experience based on your model’s responses.
Set up a WebSocket server on your end.
When an outbound or inbound call is initiated with your voice agent, the Millis AI server will establish a connection to your specified WebSocket URL.
Here's how the interaction flows after the connection is established:
Millis AI server will send a start_call event to tell your server when the conversation starts.
Millis AI streams the user's spoken message, including the full conversation transcript, to your LLM.
Your LLM processes the transcript and streams back the response. Indicate the end of a message stream with end_of_stream.
flush: Set this to true to instruct the agent to immediately generate audio based on the current response. If false, the agent will buffer the response and generate audio only when it receives a complete sentence.
pause: Set this to a number of milliseconds to instruct the agent to pause for that long after saying the response before saying the next response.
When your LLM generates a response, attach the stream_id from the original request so that we can keep track of which response corresponds to which request.
For the first message that your server sends after receiving the start_call event, use the stream_id from the start_call event.
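A hedged example of one streamed response message. The stream_response type and the stream_id, end_of_stream, flush, and pause fields come from this page; the envelope shape ("method"/"data") and the "content" field name are assumptions for illustration.

```javascript
// Sketch of one response chunk; "content" and the envelope shape are assumptions.
ws.send(JSON.stringify({
  method: 'stream_response',
  data: {
    stream_id: streamId,     // echo the stream_id from the incoming request
    content: 'Sure, I can help with that.',
    end_of_stream: false,    // set true on the final chunk
    flush: true,             // generate audio for this chunk immediately
  },
}));
```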
Your custom LLM can send specific messages to control the flow of the call. Instead of sending stream_response, you can send the following types:
To terminate the call:
To transfer the call to another destination (e.g., phone number):
Parameters:
stream_id: The unique identifier for the stream.
destination: The phone number or endpoint to transfer the call to.
Millis AI manages the conversation flow, including interruption detection and end-of-turn signals. You will be notified of these events:
Description: Sent to provide a partial transcript of the conversation. The transcript can be either final or partial.
Message Structure:
Parameters:
session_id: The unique identifier for the session.
transcript: The partial or complete transcript text.
is_final: Boolean indicating whether the transcript is final.
Description: Sent when the playback of the agent's audio stream has finished.
Message Structure:
Parameters:
session_id: The unique identifier for the session.
stream_id: The unique identifier for the stream.
Description: Sent when the user interrupts the agent's stream.
Message Structure:
Parameters:
stream_id: The unique identifier for the stream.
In your voice agent’s configuration on the Millis AI platform, specify your WebSocket endpoint.
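Putting the pieces together, here is a minimal server sketch using the ws package. The overall flow (receive start_call, then stream responses tagged with stream_id and terminated with end_of_stream) follows this page; the message envelope shapes, the "content" field, and the incoming "message" method name are assumptions, so adapt them to the actual events your integration receives.

```javascript
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', (ws) => {
  ws.on('message', async (raw) => {
    const msg = JSON.parse(raw.toString());

    if (msg.method === 'start_call') {
      // Greet the caller using the stream_id from the start_call event.
      sendChunk(ws, msg.data.stream_id, 'Hi! How can I help you today?', true);
    }

    // Hypothetical method name for incoming user transcripts; adapt to the
    // actual event Millis sends in your integration.
    if (msg.method === 'message') {
      const reply = await runYourLlm(msg.data.transcript);   // your own LLM call
      sendChunk(ws, msg.data.stream_id, reply, true);
    }
  });
});

// Helper: send one response chunk tagged with the request's stream_id.
function sendChunk(ws, streamId, content, endOfStream) {
  ws.send(JSON.stringify({
    method: 'stream_response',
    data: { stream_id: streamId, content, end_of_stream: endOfStream, flush: true },
  }));
}

async function runYourLlm(transcript) {
  // Placeholder for your chatbot; return the response text.
  return `You said: ${transcript}`;
}
```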
Millis AI is an advanced voice AI platform that helps builders and developers quickly build low-latency, natural-sounding voice agents at low cost.
Low Latency Interaction: Experience smooth, natural dialogues with groundbreaking 600ms latency, nearly matching the gold standard for conversational response times.
Natural Conversation Flow: Our voice agents are built to handle complex conversational dynamics, including interruptions and human intent recognition, ensuring realistic and fluid interactions.
Easy Integration: Integrate voice agents effortlessly into your projects with minimal coding required. Choose from our proprietary models or connect your custom LLM-based chatbot for rapid deployment.
Scalable Infrastructure: Benefit from the expertise of our DevOps engineers who have scaled systems to support hundreds of millions of video call minutes daily, guaranteeing a robust, enterprise-grade infrastructure.
English, Bulgarian, Catalan, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese.
Trigger an outbound call to a specific number
This API allows you to initiate outbound phone calls from a specified Millis AI voice agent to any given phone number.
Method: POST
URL:
Headers:
Body:
Ensure the phone number includes the full international dialing format (e.g., +15555555555), with no dashes or spaces. (Only US phone numbers are supported.)
Example Request using curl
Let Millis AI handle your inbound calls in Twilio
Learn how to set up your Twilio phone number so that incoming calls are handled by a Millis AI voice agent that you specify.
Create a TwiML Bin with the following configuration. Replace public_key and agent_id with your own.
In your phone number settings, go to the A call comes in section (options: Webhook, TwiML Bin, Function, Studio Flow, Proxy Service).
Select TwiML Bin and choose the name of your TwiML Bin.
In your phone number settings, go to the A call comes in section (options: Webhook, TwiML Bin, Function, Studio Flow, Proxy Service).
Select Webhook and enter your own backend endpoint.
Handle the webhook as you like and return the following as your response.
Connecting Phone Numbers to Millis via SIP
Millis provides seamless integration for connecting your phone system to its AI-powered voice agents using SIP. This guide walks you through the process of rerouting phone calls to Millis via SIP.
To initiate a call, send a POST request to the /register_sip_call API endpoint. Depending on your location, select either the EU-West or US-West region for lower latency:
EU-West: https://api-eu-west.millis.ai/register_sip_call
US-West: https://api-west.millis.ai/register_sip_call
You must include the necessary parameters in your request body, with the option to customize agent behavior and include metadata if needed.
Request Body Example:
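A hedged request sketch against the documented US-West endpoint. The body fields are the ones listed under Field Details below; the authorization header and example metadata values are assumptions.

```javascript
// Auth header is an assumption; the endpoint is the documented US-West URL.
const res = await fetch('https://api-west.millis.ai/register_sip_call', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: 'Bearer <your-millis-private-key>',
  },
  body: JSON.stringify({
    agent_id: '<your-agent-id>',
    metadata: { caller_account: 'ACME-42' },   // example metadata
    include_metadata_in_prompt: true,
  }),
});
const { call_id, sip_uri } = await res.json();
```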
Field Details:
agent_id: (Optional) The ID of the Millis AI agent that will handle the call.
agent_config: (Optional) Configuration options for the agent, allowing you to customize behavior.
If both agent_id and agent_config are provided, the parameters in agent_config will override the original parameters for the agent tied to agent_id.
You can also provide just agent_config for a temporary configuration, which will be used to construct an agent to handle the call.
include_metadata_in_prompt: (Optional) Boolean value indicating if the metadata should be included in the agent's conversational prompt.
After making the POST request, you will receive a response containing a call_id and a sip_uri. This sip_uri is the address you will use to route your phone calls to Millis.
Response Example:
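Roughly, the response carries the two documented fields. Only the field names call_id and sip_uri come from this page; the values below are placeholders.

```javascript
// Placeholder values; only the field names are documented.
const exampleResponse = {
  call_id: '<generated-call-id>',
  sip_uri: 'sip:<generated-call-id>@<millis-sip-host>',   // use this URI to route the call
};
```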
Use the provided sip_uri to reroute the call from your phone system to Millis. Your phone system will forward the call audio to Millis, where the voice agent can interact with the caller.
Connect audio sources directly to Millis agents via WebRTC
Millis AI supports WebRTC integration, allowing users to connect their audio sources directly to Millis agents via WebRTC. This integration is ideal for a variety of applications, including:
Phone systems with WebRTC capabilities
Voice agents in video conferencing platforms (e.g., Zoom, Google Meet) to interact with participants via voice
VoIP systems that leverage WebRTC for real-time communication
Millis AI enables users to build intelligent voice agents that can join these platforms, engage in conversations, and assist participants via voice interactions.
To connect your phone system, video conferencing platform, or VoIP solution to Millis AI via WebRTC, ensure that:
Your system supports WebRTC for audio transmission.
You have a valid agent_id to route calls or streams to the correct Millis agent.
You have a private key for authenticating requests to Millis.
To initiate a WebRTC session, your system sends a WebRTC offer to Millis through the following API endpoint:
This API is used to send the WebRTC offer to Millis, where it will be processed, and a WebRTC answer will be returned to complete the connection.
Authorization: Bearer token containing the private key to authenticate the request.
Content-Type: application/json
The request body contains the following fields:
If the offer is valid, the API will respond with a WebRTC answer that can be used to complete the connection between your system and Millis.
Set the Remote Description: Use the sdp from the answer to set the remote description on your WebRTC client.
Example (JavaScript/WebRTC):
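For instance, assuming pc is your existing RTCPeerConnection and answer is the object returned by the API:

```javascript
// Apply the answer returned by Millis to your peer connection.
await pc.setRemoteDescription({ type: answer.type, sdp: answer.sdp });
```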
Complete ICE Candidate Exchange: Ensure that ICE candidates are exchanged between your client and Millis to establish the media path.
Start Media Transmission: After completing SDP and ICE negotiations, audio will start flowing between your system and the Millis agent.
Route incoming calls from DID numbers through Millis AI for real-time voice interaction, using WebRTC for media transmission.
Connect voice agents to video conferencing platforms such as Zoom, Google Meet, and others. The voice agent can join calls and engage in real-time audio conversations with participants, providing support, answering questions, or automating workflows.
Millis agents can be connected to virtual communication rooms via WebRTC, interacting with users in the room to provide assistance, answer questions, or drive conversations through audio.
Embedding a Voice Agent Call Widget into Your Web Application
Millis AI offers a simple and effective way to integrate voice interaction into your web applications through our embeddable call widget. This widget allows users to interact with your voice agent directly from your website, providing a seamless user experience.
The widget includes a button to start and stop interactions and features an animation of an audiogram to visually represent the audio interaction similar to our demo page.
Navigate to the voice agent you want to embed.
Click on the “Actions” button on the top right and select ‘Embed to public site’.
Copy the provided HTML code.
With the HTML code, you can place it anywhere in your web app to embed the widget. Here’s how to do it on Webflow:
Navigate to the designated area and add a “Code Embed” component.
Paste the HTML code provided above, then click ‘Save’.
You can customize each widget session by adding URL parameters that will be passed as metadata. This allows you to provide context-specific information to your voice agent.
You can add metadata parameters like this:
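For example, appending query parameters to the widget URL from your embed code (the base URL below is a placeholder; use the one from your embed snippet):

```
https://<your-widget-url-from-the-embed-code>?userName=Jane&userType=premium
```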
Adding user identification information
Passing context about the page or section where the widget is embedded
Providing custom configuration parameters for the conversation
Remember that any metadata added via URL parameters will be visible in the URL. Don’t include sensitive information this way.
Create your Voice Agent on the Playground.
Your endpoint should be capable of both receiving messages from and sending messages to the Millis AI server.
If you need help or have any questions, please reach out to us at thach@millis.ai.
Prerequisites
Obtain your API key from your Playground.
API Details
Retrieve your voice agent ID from the Agent Details page on the Playground.
Prerequisites
Obtain your public_key from your Playground.
Set up a voice agent and obtain the agent_id from your Playground.
Method 1: TwiML Bin
Method 2: Use your own endpoint
metadata: (Optional) Any additional information to attach to the session, giving the agent context to enhance its response. To learn more about how metadata works, see the metadata documentation.
offer Object:
Example Request:
Example Response:
Step 1: Create your voice agent
Step 2: Obtain the widget embeddable code
Step 3: Embed the widget into your web application
These parameters will be automatically converted into metadata for the session, allowing your agent to access this information during the conversation. You can reference this metadata using the {variableName} syntax in your agent's prompts or function calls.
Learn more about how metadata works in Millis AI in the metadata documentation.
agent_id (String): The ID of the Millis agent to handle the call or stream.
offer (Object): The WebRTC offer details, containing sdp and type.
sdp (String): The WebRTC offer's session description protocol (SDP).
type (String): Type of the WebRTC request (typically "offer").
Dynamic Variables in Call Sessions
Variables are pieces of data that can be dynamically inserted during call sessions. They can now be embedded within different parts of a call flow, including prompts, agent messages, webhook parameters, and function calls. This flexibility allows developers to build richer and more responsive interactions for their agents.
In every call, there are predefined variables like FromPhone and ToPhone (representing the originating and receiving phone numbers for phone calls). Additionally, you can utilize any key-value pairs from the metadata that you provide for the call.
For example, if you include custom information like customerID, appointmentTime, or any other data that you may want to reference throughout the call, these can now be easily used.
Millis supports a simple syntax for using variables. Just wrap the variable name in curly braces like {<variable>}, and our system will automatically replace it with the corresponding value during the call. Here are a few examples:
To fetch the caller info by number: Use {FromPhone} as the description of the function's param.
To greet the caller by name via metadata: Set the agent's greeting line to "Hi {userName}! How can I assist you today?"
Webhook parameters: You can add variables as parameters in a webhook, such as phone_number={FromPhone}.
Millis offers both default variables and dynamic variables for customizing call sessions:
Default Variables (Special Variables): These are system-provided variables such as FromPhone and ToPhone, which represent the caller's phone number and the receiving phone number, respectively, when the call is made via a phone network.
Dynamic Variables: These are custom key-value pairs that you provide as metadata when starting a call. You can use any metadata as a variable within prompts, messages, function calls, or webhook parameters.
For Function Calls: Put {<variable>} in the description of the parameter. For instance, if the parameter name is phone_number, you can set the parameter description to {FromPhone} to automatically include the caller's phone number.
For Agent Messages: Add {userName: <real user name>} to metadata, then set the agent's greeting line to "Hi {userName}! How can I help you?" This can also be applied to the agent's prompt.
Using Millis Platform via WebSocket to build voice agents on desktop and mobile
This tutorial guides you through the process of integrating the Millis AI platform directly via WebSocket to build voice agents for desktop or mobile apps. Users can capture audio natively and send it to Millis via WebSocket, receiving voice responses in real-time.
Create your Voice Agent on the Playground
Use native APIs on desktop or mobile to capture and play back audio.
Establish a WebSocket connection to the Millis server.
WebSocket Endpoint: wss://api-west.millis.ai:8080/millis
Sample Rate: 16000 Hz
Encoding: PCM
Channels: 1
Chunk Size: Any
Begin by establishing a connection with the Millis AI WebSocket endpoint. Here's example code in JavaScript.
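A minimal connection sketch using the documented endpoint. How credentials are supplied (in the URL or in the initiate message) is an assumption; see the initiate step below.

```javascript
// Open a connection to the documented endpoint. Binary frames carry audio,
// so request ArrayBuffer data.
const ws = new WebSocket('wss://api-west.millis.ai:8080/millis');
ws.binaryType = 'arraybuffer';

ws.onopen = () => {
  console.log('Connected to Millis');
};
```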
Once connected, send an initiate message to start the interaction.
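A hedged sketch of the initiate message. The field names inside the payload (public key, agent_id, metadata, include_metadata_in_prompt) are assumptions based on the options described elsewhere in these docs; check the API reference for the exact shape.

```javascript
// Field names inside "data" are assumptions; see the API reference for the exact shape.
ws.send(JSON.stringify({
  method: 'initiate',
  data: {
    public_key: '<your-millis-public-key>',
    agent_id: '<your-agent-id>',
    metadata: { userName: 'Jane' },
    include_metadata_in_prompt: true,
  },
}));
```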
Millis will respond with the message {"method": "onready"}
indicating readiness.
Capture audio on your device and send it as an ArrayBuffer to Millis. Make sure it's a Uint8Array.
Note: Audio packets should be in PCM format, 16000 Hz sample rate, and mono (1 channel).
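A sketch of the sending side: forward each captured PCM chunk as binary data, and send the keep-alive ping described below every 1,000 packets. The helper name is illustrative.

```javascript
let packetCount = 0;

// pcmChunk is a Uint8Array of 16 kHz, mono PCM captured natively.
function sendAudioChunk(pcmChunk) {
  ws.send(pcmChunk);
  packetCount += 1;
  if (packetCount % 1000 === 0) {
    ws.send(JSON.stringify({ method: 'ping' }));   // keep the connection alive
  }
}
```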
Millis will send audio responses as ArrayBuffers with the same format and sample rate. You need to buffer and play these on your side.
ArrayBuffer data will be the audio packets, while string data indicates normal events that you need to process accordingly.
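A receiving sketch that distinguishes binary audio from JSON events; enqueueForPlayback and handleEvent stand in for your own playback and event-handling functions.

```javascript
ws.onmessage = (event) => {
  if (event.data instanceof ArrayBuffer) {
    // Binary frame: agent audio (PCM, 16 kHz, mono). Buffer and play it.
    enqueueForPlayback(new Uint8Array(event.data));
  } else {
    // Text frame: a JSON event such as onready, pause, unpause, clear,
    // ontranscript, onresponsetext, onsessionended, start_answering, ai_action.
    const msg = JSON.parse(event.data);
    handleEvent(msg);
  }
};
```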
Send a {"method": "ping"}
message every 1000 packets to keep the connection alive.
Millis may send various events to manage the session and interaction. Here is the logic behind each message:
pause: Millis detected some voice activity from the client. The agent decides to temporarily pause talking and observe the next voice activity. In this case, you should still keep and buffer incoming audio packets but not play them.
unpause: If Millis detects that it’s not the human trying to talk over or interrupt, the agent will continue talking. In this case, you should continue playing audio packets in the buffer.
clear: Millis detected the human's voice, indicating human interruption intent. The agent will reset and stay silent to let the human continue talking. In this case, clear all audio buffers and stop playback.
ontranscript: Real-time transcript of the client’s audio.
onresponsetext: Real-time transcript of the agent’s response.
onsessionended: If Millis decides to end the session for any reason, you will receive this event.
start_answering: The agent decides to start answering the human’s query.
ai_action: For debugging purposes. During the conversation, Millis AI intelligently decides to take some action. Listen to this event to understand what the agent is trying to do.
Example:
Simply close the WebSocket connection to stop the conversation.