We’re excited to announce the release of Realtime API support in the OpenAI library for JavaScript (v4.81.0), enabling developers to send and receive messages instantly from Azure OpenAI models. In this blog post, we explore how to configure, connect, and utilize this new capability to create highly interactive and responsive applications.
Why Realtime API support matters
Realtime APIs allow you to receive immediate responses from Azure OpenAI models, making them especially valuable for applications where quick feedback is essential. Whether you’re building a speech-to-speech experience, a streaming data processor, or a live monitoring tool, this feature empowers you to deliver an engaging user experience with minimal delay.
Get started
JavaScript runs in numerous environments, including Node.js and browsers, each with its own requirements. To cater to these environments, the JavaScript library provides two clients for Realtime connections:
- OpenAIRealtimeWebSocket: Uses the native WebSocket web API, commonly supported in browsers and other environments adhering to web standards.
- OpenAIRealtimeWS: Uses the ws library, well-suited for Node.js and similar server-side JavaScript environments.
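If your code runs in more than one environment, you can decide which client to use at runtime. The following is a minimal sketch under that assumption; the `pickRealtimeClientName` helper is hypothetical, not part of the library.

```javascript
// Hypothetical helper: decide which Realtime client class to use
// based on the current JavaScript runtime.
function pickRealtimeClientName() {
  const isNode =
    typeof process !== 'undefined' && !!process.versions?.node;
  // In Node.js, prefer OpenAIRealtimeWS (backed by the `ws` package);
  // in browsers and other web-standard runtimes, use the native
  // WebSocket-based OpenAIRealtimeWebSocket.
  return isNode ? 'OpenAIRealtimeWS' : 'OpenAIRealtimeWebSocket';
}

console.log(pickRealtimeClientName()); // 'OpenAIRealtimeWS' under Node.js
```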
Before you begin, make sure you have:
- Node.js installed (if you plan to work in a Node.js runtime)
- An Azure subscription with access to the Azure OpenAI service
Installation
Use the following command to install the required packages:
npm install openai @azure/identity dotenv
Set up the environment
Create a .env file in the root of your project and add your Azure secrets:
AZURE_OPENAI_ENDPOINT="<The endpoint of the Azure OpenAI resource>"
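Since a missing setting usually surfaces later as an opaque connection error, it can help to fail fast at startup. Here is a small hedged sketch; the `requireEnv` helper is hypothetical, not part of any of the installed packages.

```javascript
// Hypothetical helper: fail fast with a clear message when a
// required environment variable is missing.
function requireEnv(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example: read the Azure OpenAI endpoint loaded from .env.
// const endpoint = requireEnv('AZURE_OPENAI_ENDPOINT');
```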
Code sample
This section provides a step-by-step walkthrough of how to use the Realtime API in the JavaScript library. We break it down so you can easily replicate it in your own environment.
Import modules
Begin by importing the relevant modules:
import { OpenAIRealtimeWS } from 'openai/beta/realtime/ws';
import { AzureOpenAI } from 'openai';
import { DefaultAzureCredential, getBearerTokenProvider } from '@azure/identity';
import 'dotenv/config';
Configure credentials
You need proper credentials to authenticate with the Azure OpenAI service. We use DefaultAzureCredential, which streamlines the process by automatically selecting the appropriate credential type based on your environment:
const cred = new DefaultAzureCredential();
const scope = 'https://cognitiveservices.azure.com/.default';
const azureADTokenProvider = getBearerTokenProvider(cred, scope);
Create the client
Next, initialize the Azure OpenAI client with your desired deployment name and API version:
const deploymentName = 'gpt-4o-realtime-preview-1001';
const client = new AzureOpenAI({
azureADTokenProvider,
apiVersion: '2024-10-01-preview',
deployment: deploymentName,
});
Establish the WebSocket connection
Use the client to create a WebSocket connection. In a browser environment, you would typically use OpenAIRealtimeWebSocket.azure(). For a Node.js environment with the ws library, you can use OpenAIRealtimeWS.azure(). Here’s the Node.js example:
const rt = await OpenAIRealtimeWS.azure(client);
Handle events
Event handlers allow you to orchestrate how your application responds to various stages of the real-time interaction life cycle, including connection establishment, message exchange, and error handling. A detailed explanation of how to implement and manage these events follows next.
1. Listen for the open event
When the WebSocket connection is successfully established, the open event is triggered. At this point, you can begin sending messages and commands to the Azure OpenAI model. In this example, we update the session parameters and initiate a text conversation with the model.
rt.socket.on('open', () => {
console.log('Connection opened!');
rt.send({
type: 'session.update',
session: {
modalities: ['text'],
model: 'gpt-4o-realtime-preview',
},
});
rt.send({
type: 'conversation.item.create',
item: {
type: 'message',
role: 'user',
content: [{ type: 'input_text', text: 'Say a couple paragraphs!' }],
},
});
// Signal that we're ready to receive a response from the model
rt.send({ type: 'response.create' });
});
In this snippet:
- session.update informs the service about any configuration changes (for example, chosen model and input modalities).
- conversation.item.create sends a user prompt to the model.
- response.create indicates that you want the model to begin generating a response immediately.
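Because a user turn always pairs a conversation.item.create event with a response.create event, you may find it convenient to build the payload in one place. This is a sketch only; the `userTextItem` helper is hypothetical, but the object it returns matches the event shape shown above.

```javascript
// Hypothetical helper: build the payload for a single user text turn,
// matching the conversation.item.create event shape used above.
function userTextItem(text) {
  return {
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [{ type: 'input_text', text }],
    },
  };
}

// Sending a turn then becomes:
// rt.send(userTextItem('Say a couple paragraphs!'));
// rt.send({ type: 'response.create' });
```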
2. Subscribe to session and response events
After initializing the session and sending conversation items, you’ll want to capture the model’s responses. The JavaScript library provides event listeners for these activities:
rt.on('session.created', (event) => {
console.log('session created!', event.session);
console.log();
});
rt.on('response.text.delta', (event) => process.stdout.write(event.delta));
rt.on('response.text.done', () => console.log());
rt.on('response.done', () => rt.close());
rt.socket.on('close', () => console.log('\nConnection closed!'));
- session.created indicates that the session is successfully set up on the server.
- response.text.delta streams partial text output as it is generated, allowing you to handle or display responses in real time.
- response.text.done fires when the text generation for that particular response completes.
- response.done signals that the entire response cycle is finished. Here, we close the WebSocket connection as a simple example, though you may choose to keep it open for further interactions.
- close is an event on the underlying WebSocket (rt.socket.on('close')), telling you that the connection was deliberately terminated or unexpectedly closed.
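If you need the full response text rather than a live stream, you can accumulate the deltas as they arrive. Below is a minimal sketch; the `TextAccumulator` class is hypothetical, shown only to illustrate how the delta events compose.

```javascript
// Hypothetical accumulator: collect response.text.delta fragments
// into the complete response text.
class TextAccumulator {
  constructor() {
    this.parts = [];
  }
  addDelta(delta) {
    this.parts.push(delta);
  }
  get text() {
    return this.parts.join('');
  }
}

// Wiring it to the event listeners above:
// const acc = new TextAccumulator();
// rt.on('response.text.delta', (event) => acc.addDelta(event.delta));
// rt.on('response.text.done', () => console.log(acc.text));
```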
3. Handle errors
In any network or service interaction, errors may occur. Ensuring that your application logs and handles these errors is crucial for stability and a smooth user experience:
rt.on('error', (err) => {
// Log the error or handle it based on your application needs
console.error('An error occurred:', err);
});
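For transient network failures, you may also want to reconnect rather than give up. A common pattern is exponential backoff with a cap; the `backoffDelayMs` helper below is a hypothetical sketch of that calculation, not a library API.

```javascript
// Hypothetical helper: exponential backoff with a cap, for deciding
// how long to wait before reconnecting after an error.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30_000) {
  // attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, ... capped at maxMs.
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

console.log(backoffDelayMs(0)); // 500
console.log(backoffDelayMs(3)); // 4000
```

You would schedule the reconnect inside the error or close handler, resetting the attempt counter once a connection succeeds.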
Conclusion
The introduction of Realtime API support in the OpenAI library for JavaScript provides developers with a powerful new way to create interactive, low-latency applications. With these capabilities, you can deliver enriched user experiences—be it live chatbots, streaming analytics, or real-time data processing tools. We hope this detailed guide helps you get started with building and experimenting in your own environment.
Stay tuned for future updates and enhancements to the library, and feel free to share your innovative uses of the Realtime API in the comments!
Next steps
To further expand your Realtime integration with Azure OpenAI, explore the following resources for more guidance and practical examples:
- Samples: Get hands-on experience by reviewing sample projects in the official OpenAI Node.js repository: https://github.com/openai/openai-node/tree/master/examples/azure/realtime
- Use with audio: Learn how to incorporate Realtime audio capabilities with Azure OpenAI through streaming input and output audio data: https://learn.microsoft.com/azure/ai-services/openai/how-to/realtime-audio
- API reference: Consult the official documentation for detailed information about all available endpoints, parameters, and data structures: https://learn.microsoft.com/azure/ai-services/openai/realtime-audio-reference