Build Real-Time Voice Agents with LiveKit & IPFLY – Power Voice AI with Global Web Data

Voice agents are transforming customer support, travel, finance, and retail—delivering hands-free, conversational experiences that feel human. But to be useful, voice agents need real-time, relevant web data (e.g., flight statuses, stock prices, product availability) to avoid outdated or generic responses. LiveKit provides the infrastructure for low-latency voice communication, while IPFLY’s premium proxy solutions (90M+ global IPs across 190+ countries, static/dynamic residential, and data center proxies) solve the critical bottleneck: unrestricted access to live web data.

This guide walks you through building a real-time voice agent with LiveKit (for voice streaming), OpenAI Whisper (for speech-to-text), OpenAI TTS (for text-to-speech), and IPFLY (for web data retrieval). You’ll learn how to integrate IPFLY to scrape live data, bypass geo-restrictions, and ensure your voice agent delivers accurate, context-rich responses, whether users ask about local weather, global stock trends, or product availability.

Introduction to Voice Agents, LiveKit & IPFLY’s Role

Voice agents (or voice bots) use natural language processing (NLP) and voice recognition to interact with users via speech. Unlike chatbots, they operate in real time—requiring instant access to live data to answer questions like:

“What’s the current price of Bitcoin?”

“Is my flight to Paris delayed?”

“Do you have the new wireless headphones in stock?”

LiveKit is the backbone of real-time voice agents: it provides scalable, low-latency WebRTC-based voice streaming, room management, and audio processing—critical for smooth, lag-free conversations. But LiveKit alone can’t access web data—this is where IPFLY comes in.

IPFLY’s proxy infrastructure is designed for real-time voice AI needs:

Dynamic Residential Proxies: Rotate per request to mimic real users, avoiding blocks on data-rich sites (e.g., airline portals, e-commerce platforms).

Static Residential Proxies: Deliver consistent access to trusted sources (e.g., financial APIs, government weather sites).

Data Center Proxies: Offer high-speed, low-latency data retrieval for time-sensitive queries (e.g., stock prices, sports scores).

190+ country coverage: Unlock region-specific data (e.g., local store hours, regional flight info) for global voice agents.

99.9% uptime: Ensure your voice agent never fails to retrieve live data during critical conversations.

Together, LiveKit and IPFLY create a powerful stack: LiveKit handles the voice interaction, while IPFLY fuels it with the real-time web data that makes responses useful.

What Are Voice Agents & Why LiveKit + IPFLY Is a Game-Changer

What Are Voice Agents?

Voice agents are AI-powered tools that process spoken language, retrieve relevant data, and respond with natural speech. They’re used across industries:

Customer Support: Answer FAQs, track orders, or troubleshoot issues without human agents.

Travel: Check flight statuses, book hotels, or find local attractions.

Finance: Provide real-time stock prices, account balances, or investment insights.

Retail: Check product availability, compare prices, or process voice orders.

The key differentiator for great voice agents is real-time data access. A voice agent that can’t pull live flight info or current stock prices feels outdated—users will abandon it for human support.

Why LiveKit?

LiveKit is an open-source, scalable WebRTC framework built for real-time audio/video applications. It’s ideal for voice agents because:

Low Latency: Ensures voice streams are processed in milliseconds—critical for natural conversations.

Scalability: Supports thousands of concurrent voice sessions (perfect for enterprise deployments).

Flexibility: Integrates with NLP tools (Whisper, ChatGPT), TTS engines, and custom data sources.

Reliability: Built for production with built-in fault tolerance and global edge nodes.

Why IPFLY?

LiveKit enables voice streaming, but voice agents need live web data to function. IPFLY solves the biggest data access challenges:

Geo-Restrictions: Access region-specific data (e.g., UK train schedules, Japanese retail prices) with IPFLY’s 190+ country IP pool.

Anti-Scraping Blocks: Dynamic residential proxies mimic real users to avoid detection on sites like Amazon, Delta, or CoinGecko.

Speed: Data center proxies deliver low-latency data retrieval for time-sensitive queries (e.g., sports scores, crypto prices).

Consistency: 99.9% uptime and multi-layer IP filtering ensure your agent never fails to retrieve data mid-conversation.

Without IPFLY, your voice agent would be limited to static data—rendering it useless for real-time use cases.

Prerequisites

Before building your voice agent, ensure you have:

Python 3.10+ (for backend logic).

A LiveKit account (free tier available).

LiveKit SDKs: the livekit-api Python package (for room management and tokens), the livekit Python package (for real-time audio), and the livekit-client JavaScript SDK (for the browser frontend).

An OpenAI API key (for the Whisper API and TTS).

An IPFLY account (with API key, proxy endpoint, and access to dynamic residential proxies).

Basic familiarity with WebRTC, Python, and REST APIs.

Install required dependencies:

pip install livekit livekit-api openai requests python-dotenv beautifulsoup4 fastapi uvicorn

Step-by-Step Guide: Build a Real-Time Voice Agent with LiveKit & IPFLY

We’ll build a travel-focused voice agent that:

1. Streams voice via LiveKit (user speaks a query like “Is my Delta flight DL123 delayed?”).

2. Converts voice to text using OpenAI Whisper.

3. Uses IPFLY proxies to scrape live flight data from Delta’s website.

4. Generates a natural response with OpenAI TTS.

5. Streams the response back to the user via LiveKit.

Step 1: Configure LiveKit Project

1. Log into your LiveKit account and create a new project (e.g., “TravelVoiceAgent”).

2. Navigate to “Project Settings” → “API Keys” and generate a Server API Key and Secret. Store these securely; they’ll authenticate your backend with LiveKit.

3. Note your LiveKit server URL (e.g., wss://project-xyz.livekit.cloud).

Step 2: Set Up IPFLY Proxies for Live Data Retrieval

IPFLY will power the agent’s flight data scraping. Here’s how to configure it:

1. Log into your IPFLY account and retrieve:

Proxy endpoint (e.g., http://[USERNAME]:[PASSWORD]@proxy.ipfly.com:8080).
API key (for proxy management).

2. For travel data (e.g., Delta flight status), use dynamic residential proxies, which mimic real users to avoid blocks on airline websites.
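Proxy usernames and passwords often contain characters such as "@" or ":" that would break the endpoint URL. The following is a minimal sketch, assuming the http://user:pass@host:port format shown above; build_proxy_url and build_proxies are hypothetical helper names, and the host and port are just the example values from this guide.

```python
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return an HTTP proxy URL with percent-encoded credentials."""
    # safe="" also encodes "/", ":" and "@", which would otherwise split the URL
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


def build_proxies(proxy_url: str) -> dict:
    """Build the mapping that requests accepts as its proxies argument."""
    return {"http": proxy_url, "https": proxy_url}


# Example values only; substitute the credentials from your IPFLY account
proxy_url = build_proxy_url("user", "p@ss:word", "proxy.ipfly.com", 8080)
proxies = build_proxies(proxy_url)
print(proxy_url)
```

The encoded URL can be stored as IPFLY_PROXY_ENDPOINT, so the rest of the guide works unchanged.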

Create a .env file to store credentials securely:

LIVEKIT_API_KEY="<YOUR_LIVEKIT_API_KEY>"
LIVEKIT_API_SECRET="<YOUR_LIVEKIT_API_SECRET>"
LIVEKIT_SERVER_URL="wss://<YOUR_LIVEKIT_PROJECT>.livekit.cloud"
OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
IPFLY_PROXY_ENDPOINT="http://[USERNAME]:[PASSWORD]@proxy.ipfly.com:8080"
IPFLY_API_KEY="<YOUR_IPFLY_API_KEY>"
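A missing or still-templated value in .env tends to surface later as a confusing authentication or proxy error. As a hedged sketch, a small startup check can catch this early; missing_settings and REQUIRED_SETTINGS are hypothetical names, and the placeholder markers it looks for are the ones used in the template above.

```python
import os

# Settings the backend in this guide reads at startup
REQUIRED_SETTINGS = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_SERVER_URL",
    "OPENAI_API_KEY",
    "IPFLY_PROXY_ENDPOINT",
)


def missing_settings(env=None) -> list:
    """Return the names of required settings that are unset or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_SETTINGS:
        value = env.get(name, "").strip()
        # Treat template values like "<YOUR_...>" or "[USERNAME]" as missing
        if not value or "<YOUR_" in value or "[USERNAME]" in value:
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_settings()
    if problems:
        raise SystemExit(f"Missing settings in .env: {', '.join(problems)}")
```

Call missing_settings() right after load_dotenv() so the agent refuses to start with incomplete credentials.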

Step 3: Build the Backend (LiveKit + Whisper + IPFLY)

Create a voice_agent_backend.py file to handle voice streaming, voice-to-text, data retrieval, and text-to-speech.

Step 3.1: Initialize Dependencies & Load Environment Variables

import asyncio
import os

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from livekit import api, rtc  # provided by the livekit-api and livekit packages
from openai import OpenAI

# Load environment variables
load_dotenv()

# Initialize clients
# (The LiveKit API client opens an HTTP session, so create it inside async code
# rather than at import time.)
openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
IPFLY_PROXY = {
    "http": os.getenv("IPFLY_PROXY_ENDPOINT"),
    "https": os.getenv("IPFLY_PROXY_ENDPOINT"),
}

Step 3.2: Create IPFLY Data Retrieval Tool (Flight Status Scraper)

Add a function to scrape live flight status using IPFLY proxies:

def get_flight_status(airline: str, flight_number: str) -> str:
    """Scrape live flight status using IPFLY proxies (Delta example)."""
    # Delta flight status URL (customize for other airlines)
    url = f"https://www.delta.com/en-us/flights/status?flightNumber={flight_number}&date=today"

    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"}

    try:
        # Send request with IPFLY proxy to avoid blocks
        response = requests.get(
            url,
            proxies=IPFLY_PROXY,
            headers=headers,
            timeout=15,  # Low timeout for real-time voice agent
        )
        response.raise_for_status()

        # Parse flight status (customize selector for target airline)
        soup = BeautifulSoup(response.text, "html.parser")
        status_element = soup.find("div", class_="flight-status-value")
        departure_element = soup.find("div", class_="departure-time")
        arrival_element = soup.find("div", class_="arrival-time")
        departure_time = departure_element.get_text(strip=True) if departure_element else "N/A"
        arrival_time = arrival_element.get_text(strip=True) if arrival_element else "N/A"

        if status_element:
            status = status_element.get_text(strip=True)
            return f"Flight {airline} {flight_number} status: {status}. Departure: {departure_time}. Arrival: {arrival_time}."
        else:
            return f"Could not retrieve status for flight {airline} {flight_number}."
    except Exception as e:
        return f"Error fetching flight status: {str(e)}"

Step 3.3: Add Voice-to-Text (Whisper) & Text-to-Speech (TTS)

Add functions to convert user speech to text and agent responses to speech:

def voice_to_text(audio_data: bytes) -> str:
    """Convert audio from LiveKit to text using OpenAI Whisper."""
    # audio_data must be a complete WAV file (header included);
    # raw PCM frames need to be wrapped in a WAV container first
    with open("temp_audio.wav", "wb") as f:
        f.write(audio_data)

    try:
        with open("temp_audio.wav", "rb") as audio_file:
            response = openai_client.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file,
                language="en",
            )
    finally:
        os.remove("temp_audio.wav")
    return response.text


def text_to_speech(text: str) -> bytes:
    """Convert agent response text to speech using OpenAI TTS."""
    response = openai_client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
        response_format="pcm",  # raw 24 kHz, 16-bit, mono PCM for streaming over LiveKit
    )
    return response.content

Step 3.4: Define LiveKit Room Logic

Add logic to handle LiveKit rooms, audio streaming, and agent responses:

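# Uses the livekit (rtc) and livekit-api (api) Python packages;
# check the call signatures against the versions you have installed.
import asyncio
import io
import wave

from livekit import api, rtc

ROOM_NAME = "travel-voice-agent-room"
TTS_SAMPLE_RATE = 24000  # OpenAI TTS "pcm" output: 24 kHz, 16-bit, mono
CHUNK_SECONDS = 3  # Process after ~3 seconds of audio (adjust for longer/shorter queries)


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int) -> bytes:
    """Wrap raw 16-bit PCM in a WAV container so Whisper can read it."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


async def publish_audio(source: rtc.AudioSource, pcm: bytes):
    """Publish agent's audio response to the user."""
    # Expects raw 16-bit mono PCM at TTS_SAMPLE_RATE (request response_format="pcm"
    # from OpenAI TTS); push it to the agent's audio track in 10 ms frames
    samples_per_frame = TTS_SAMPLE_RATE // 100
    frame_bytes = samples_per_frame * 2
    for start in range(0, len(pcm), frame_bytes):
        chunk = pcm[start:start + frame_bytes].ljust(frame_bytes, b"\x00")
        frame = rtc.AudioFrame(chunk, TTS_SAMPLE_RATE, 1, samples_per_frame)
        await source.capture_frame(frame)


async def process_audio(source: rtc.AudioSource, wav_audio: bytes):
    """Convert audio to text, retrieve data, generate response."""
    try:
        # Step 1: Voice to text (blocking calls run in worker threads)
        user_query = await asyncio.to_thread(voice_to_text, wav_audio)
        print(f"User query: {user_query}")
        if not user_query.strip():
            return  # Nothing was said in this chunk

        # Step 2: Extract intent (simplified NLP for travel agent)
        query = user_query.lower()
        if "flight" in query and ("status" in query or "delayed" in query):
            # Extract flight details (simplified; use an NLP library for production)
            airline = "Delta"  # Customize with NLP extraction (e.g., "United flight 456" → airline=United)
            flight_number = user_query.split()[-1].strip("?.!,")  # Assume last word is flight number

            # Step 3: Retrieve live flight status with IPFLY
            agent_response = await asyncio.to_thread(get_flight_status, airline, flight_number)
        else:
            agent_response = "I can help with flight status queries. Please ask: 'What's the status of Delta flight 123?'"

        # Step 4: Text to speech
        audio_response = await asyncio.to_thread(text_to_speech, agent_response)

        # Step 5: Stream response back to user via LiveKit
        await publish_audio(source, audio_response)
    except Exception as e:
        print(f"Error processing audio: {str(e)}")
        error_response = "Sorry, I couldn't process your request. Please try again."
        audio_error = await asyncio.to_thread(text_to_speech, error_response)
        await publish_audio(source, audio_error)


async def handle_audio_track(track, participant, source: rtc.AudioSource):
    """Process audio track from user."""
    print(f"User {participant.identity} published audio track.")

    # Collect audio data (stream in chunks for real-time processing)
    stream = rtc.AudioStream(track)
    pcm = bytearray()
    async for event in stream:
        frame = event.frame
        pcm.extend(frame.data.tobytes())
        bytes_per_second = frame.sample_rate * frame.num_channels * 2
        if len(pcm) >= CHUNK_SECONDS * bytes_per_second:
            wav_audio = pcm_to_wav(bytes(pcm), frame.sample_rate, frame.num_channels)
            pcm.clear()
            await process_audio(source, wav_audio)


async def start_voice_agent():
    """Create a LiveKit room, join it as the agent, and start handling events."""
    url = os.getenv("LIVEKIT_SERVER_URL")
    api_key = os.getenv("LIVEKIT_API_KEY")
    api_secret = os.getenv("LIVEKIT_API_SECRET")

    # Create the room through the server API (HTTP endpoint of the same host)
    lkapi = api.LiveKitAPI(url.replace("wss://", "https://", 1), api_key, api_secret)
    await lkapi.room.create_room(api.CreateRoomRequest(name=ROOM_NAME))
    await lkapi.aclose()

    # Join the room as the agent
    token = (
        api.AccessToken(api_key, api_secret)
        .with_identity("travel-agent")
        .with_grants(api.VideoGrants(room_join=True, room=ROOM_NAME))
        .to_jwt()
    )
    room = rtc.Room()
    source = rtc.AudioSource(TTS_SAMPLE_RATE, 1)
    tasks = set()  # Keep references so running tasks are not garbage-collected

    @room.on("track_subscribed")
    def on_track_subscribed(track, publication, participant):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            task = asyncio.create_task(handle_audio_track(track, participant, source))
            tasks.add(task)
            task.add_done_callback(tasks.discard)

    await room.connect(url, token)
    print(f"Joined room {room.name}. Waiting for users...")

    # Publish one audio track that carries all agent responses
    agent_track = rtc.LocalAudioTrack.create_audio_track("agent-response", source)
    await room.local_participant.publish_track(agent_track, rtc.TrackPublishOptions())

    # Keep the agent connected
    while True:
        await asyncio.sleep(1)


if __name__ == "__main__":
    asyncio.run(start_voice_agent())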

Step 4: Build the Frontend (LiveKit Client)

Create a simple HTML/JavaScript frontend (index.html) to let users join the voice room and speak to the agent:

<!DOCTYPE html>
<html>
<head>
    <title>Travel Voice Agent</title>
    <script src="https://cdn.jsdelivr.net/npm/livekit-client@2.0.0/dist/livekit-client.umd.min.js"></script>
    <style>
        body { font-family: Arial; max-width: 800px; margin: 2rem auto; padding: 0 1rem; }
        button { padding: 1rem 2rem; font-size: 1.1rem; cursor: pointer; background: #007bff; color: white; border: none; border-radius: 8px; }
        button:disabled { background: #6c757d; cursor: not-allowed; }
        #status { margin: 1rem 0; padding: 1rem; border-radius: 8px; background: #f8f9fa; }
    </style>
</head>
<body>
    <h1>Travel Voice Agent</h1>
    <p>Ask about flight status (e.g., "What's the status of Delta flight 123?")</p>
    <button id="joinBtn">Join Voice Room</button>
    <button id="leaveBtn" disabled>Leave Room</button>
    <div id="status">Status: Disconnected</div>
    <script>
        const LIVEKIT_SERVER_URL = "wss://<YOUR_LIVEKIT_PROJECT>.livekit.cloud";
        const ROOM_NAME = "travel-voice-agent-room";
        const USER_IDENTITY = `user-${Math.random().toString(36).slice(2, 11)}`;
        let room;

        // Generate LiveKit token
        async function getToken() {
            // In production, generate token on backend (never expose API secret client-side).
            // The relative URL assumes this page is served from the same origin as the
            // token server; otherwise use the full URL and enable CORS on the server.
            const response = await fetch("/generate-token", { method: "POST" });
            return response.json(); // The endpoint returns the JWT as a JSON string
        }

        // Join room
        document.getElementById("joinBtn").addEventListener("click", async () => {
            const token = await getToken();
            // The UMD build exposes the SDK as the global LivekitClient
            room = new LivekitClient.Room({ audioCaptureDefaults: { echoCancellation: true } });

            // Update status
            document.getElementById("status").textContent = "Joining room...";
            document.getElementById("joinBtn").disabled = true;

            // Listen for agent's audio response (registered before connecting so no track is missed)
            room.on(LivekitClient.RoomEvent.TrackSubscribed, (track) => {
                if (track.kind === "audio") {
                    // attach() returns an audio element already wired to the track
                    const audioElement = track.attach();
                    document.body.appendChild(audioElement);
                }
            });

            // Join room
            await room.connect(LIVEKIT_SERVER_URL, token);
            document.getElementById("status").textContent = "Connected! Speak to the agent.";
            document.getElementById("leaveBtn").disabled = false;

            // Publish microphone audio
            const audioTrack = await LivekitClient.createLocalAudioTrack();
            await room.localParticipant.publishTrack(audioTrack);
        });

        // Leave room
        document.getElementById("leaveBtn").addEventListener("click", async () => {
            await room.disconnect();
            document.getElementById("status").textContent = "Disconnected";
            document.getElementById("joinBtn").disabled = false;
            document.getElementById("leaveBtn").disabled = true;
        });
    </script>
</body>
</html>

Step 5: Add Token Generation (Backend)

To secure your LiveKit room, add a token generation endpoint (use FastAPI or Flask for production). Here’s a simple FastAPI example (token_server.py):

import os

from dotenv import load_dotenv
from fastapi import FastAPI
from livekit import api  # provided by the livekit-api package

load_dotenv()
app = FastAPI()


@app.post("/generate-token")
def generate_token():
    token = (
        api.AccessToken(os.getenv("LIVEKIT_API_KEY"), os.getenv("LIVEKIT_API_SECRET"))
        .with_identity("user-123")  # Replace with dynamic user identity
        .with_grants(api.VideoGrants(room_join=True, room="travel-voice-agent-room"))
    )
    return token.to_jwt()  # FastAPI serializes this as a JSON string

# Run with: uvicorn token_server:app --reload

Step 6: Test the Voice Agent

1. Start the token server:

uvicorn token_server:app --reload

2. Start the voice agent backend:

python voice_agent_backend.py

3. Open index.html in a browser, click “Join Voice Room,” and ask: “What’s the status of Delta flight 123?”

The agent will:

Convert your speech to text with Whisper.

Use IPFLY proxies to scrape live flight status from Delta’s website.

Generate a natural audio response with TTS.

Stream the response back to you via LiveKit.

Key IPFLY Benefits for Voice Agents

IPFLY’s proxies are critical to the voice agent’s success—here’s how they enhance performance:

1. Real-Time Data Retrieval: Data center proxies deliver low-latency responses (critical for voice conversations, where delays >1s feel unnatural).

2. Anti-Block Bypass: Dynamic residential proxies mimic real users to avoid blocks on airline, e-commerce, or financial sites.

3. Global Coverage: Access region-specific data (e.g., European flight statuses, Asian stock prices) with IPFLY’s 190+ country IP pool.

4. Consistency: 99.9% uptime ensures your agent never fails to retrieve data mid-conversation.

5. Protocol Support: Works with HTTP/HTTPS/SOCKS5 for seamless integration with LiveKit and scraping tools.
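Because delays above roughly one second feel unnatural in a conversation, one option is to put a deadline on the data lookup and speak a short holding phrase when it is missed. This is a minimal sketch under that assumption; lookup_with_deadline and slow_lookup are hypothetical names, and the stand-in lookup only sleeps instead of scraping.

```python
import asyncio
import time


async def lookup_with_deadline(lookup, *args, deadline_s: float = 1.0,
                               fallback: str = "One moment, I'm still checking that."):
    """Run a blocking lookup in a worker thread and give up waiting after deadline_s."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(lookup, *args), timeout=deadline_s)
    except asyncio.TimeoutError:
        # The worker thread is not interrupted; only the wait is abandoned
        return fallback


def slow_lookup(flight_number: str) -> str:
    """Stand-in for a proxied scrape that takes a while."""
    time.sleep(0.2)
    return f"Flight {flight_number} is on time."


in_time = asyncio.run(lookup_with_deadline(slow_lookup, "DL123", deadline_s=1.0))
too_late = asyncio.run(
    lookup_with_deadline(slow_lookup, "DL123", deadline_s=0.05, fallback="Still checking.")
)
print(in_time)
print(too_late)
```

In the agent, the fallback text would go to TTS immediately while the real answer follows once the lookup finishes.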

Use Cases for LiveKit + IPFLY Voice Agents

1. Travel Voice Assistants

Scrape live flight statuses, hotel availability, and local attraction hours.

Use IPFLY’s regional IPs to access country-specific travel data (e.g., train schedules in Germany, bullet train info in Japan).

2. Financial Voice Bots

Retrieve real-time stock prices, crypto values, and market trends.

Use static residential proxies for consistent access to financial sites like Bloomberg or Yahoo Finance.

3. Retail Voice Agents

Check product availability, prices, and store hours.

Scrape competitor pricing to offer price matches (IPFLY’s dynamic proxies avoid blocks on retail sites).

4. Customer Support Voice Bots

Pull live order tracking data from e-commerce platforms.

Access regional support policies (e.g., refund rules in Canada vs. the UK) with IPFLY’s global IPs.

Optimization Tips for Production

1. Choose the Right IPFLY Proxy Type:

Time-sensitive queries (crypto prices, flight status): Use data center proxies for speed.

Strict sites (airlines, banks): Use dynamic residential proxies for anonymity.

Recurring queries (store hours): Use static residential proxies for consistency.
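The mapping above can be made explicit in code so that every data tool picks its proxy type the same way. A minimal sketch follows; the category labels, ProxyType, and choose_proxy_type are hypothetical names, and the default for unknown categories is an assumption rather than an IPFLY recommendation.

```python
from enum import Enum


class ProxyType(Enum):
    DATACENTER = "datacenter"
    DYNAMIC_RESIDENTIAL = "dynamic_residential"
    STATIC_RESIDENTIAL = "static_residential"


# Hypothetical category labels mirroring the three cases listed above
PROXY_FOR_CATEGORY = {
    "time_sensitive": ProxyType.DATACENTER,        # crypto prices, flight status
    "strict_site": ProxyType.DYNAMIC_RESIDENTIAL,  # airlines, banks
    "recurring": ProxyType.STATIC_RESIDENTIAL,     # store hours
}


def choose_proxy_type(category: str) -> ProxyType:
    """Return the proxy type for a query category."""
    # Assumption: fall back to dynamic residential when the category is unknown
    return PROXY_FOR_CATEGORY.get(category, ProxyType.DYNAMIC_RESIDENTIAL)


print(choose_proxy_type("time_sensitive").value)
```

Each ProxyType can then be tied to its own endpoint variable in .env, if your IPFLY plan provides separate endpoints per type.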

2. Add NLP for Intent Recognition:

Replace the simplified intent extraction with tools like spaCy or OpenAI GPT to handle complex queries (e.g., “Is my 3 PM United flight to Chicago delayed?”).
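Before reaching for spaCy or GPT, a rule-based extractor already improves on "last word is the flight number". The sketch below is that lighter stand-in, not an NLP model: it uses two regular expressions and a small airline table. AIRLINE_CODES and extract_flight are hypothetical names, and the table covers only three example carriers.

```python
import re

# Hypothetical lookup table; extend it with the carriers your agent supports
AIRLINE_CODES = {"delta": "DL", "united": "UA", "american": "AA"}

# Matches "DL123" or "DL 123" for known carrier codes only
CODE_PATTERN = re.compile(
    r"\b(" + "|".join(AIRLINE_CODES.values()) + r")\s?(\d{1,4})\b", re.IGNORECASE
)
# Matches "Delta flight 123", "Delta flight number 123", or "Delta 123"
NAME_PATTERN = re.compile(
    r"\b(" + "|".join(AIRLINE_CODES) + r")\s+(?:flight\s+)?(?:number\s+)?(\d{1,4})\b",
    re.IGNORECASE,
)


def extract_flight(query: str):
    """Return (airline_code, flight_number), or None if no flight number is found."""
    match = CODE_PATTERN.search(query)
    if match:
        return match.group(1).upper(), match.group(2)
    match = NAME_PATTERN.search(query)
    if match:
        return AIRLINE_CODES[match.group(1).lower()], match.group(2)
    return None


print(extract_flight("Is my Delta flight DL123 delayed?"))
print(extract_flight("Is my 3 PM United flight to Chicago delayed?"))
```

A None result is the cue for the agent to ask for the flight number, which is exactly the case (a query with a time and destination but no number) where an LLM or a flight-search lookup becomes worthwhile.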

3. Cache Frequent Queries:

Reduce proxy usage and latency by caching frequently requested data (e.g., popular flight routes) for 5–10 minutes.
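A time-to-live cache is enough for this. The sketch below keeps results in memory for a fixed number of seconds; TTLCache is a hypothetical class, and the injectable clock exists only so the expiry logic can be exercised without waiting.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only when it is missing or stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self._ttl, value)
        return value


# Demonstration with a fake clock and a counting fetch function
fake_now = [0.0]
fetch_count = [0]


def fetch_status():
    fetch_count[0] += 1
    return f"status-{fetch_count[0]}"


cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])
first = cache.get_or_fetch("DL123", fetch_status)
fake_now[0] = 299
second = cache.get_or_fetch("DL123", fetch_status)  # still fresh, no new fetch
fake_now[0] = 301
third = cache.get_or_fetch("DL123", fetch_status)   # expired, fetched again
print(first, second, third)
```

In the agent, the fetch argument would be something like lambda: get_flight_status(airline, flight_number). Since that function returns error messages as ordinary strings, it is worth skipping the cache for failed lookups so an error is not replayed for five minutes.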

4. Scale with LiveKit Cloud:

For enterprise deployments, use LiveKit Cloud to handle thousands of concurrent voice sessions—IPFLY’s unlimited concurrency scales with you.

5. Monitor Proxy Performance:

Use IPFLY’s dashboard to track success rates, latency, and IP usage—optimize proxy types based on performance data.
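Alongside the dashboard, the agent can keep its own client-side numbers, which show what users actually experience. This is a sketch of local bookkeeping only and does not call any IPFLY API; ProxyStats and timed_call are hypothetical names.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ProxyStats:
    """Running success and latency figures for one proxy type."""
    successes: int = 0
    failures: int = 0
    latencies: list = field(default_factory=list)

    def record(self, ok: bool, seconds: float) -> None:
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        self.latencies.append(seconds)

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.0

    @property
    def average_latency(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0


def timed_call(stats: ProxyStats, func, *args, **kwargs):
    """Call func, record its duration and outcome in stats, and re-raise any error."""
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
    except Exception:
        stats.record(False, time.perf_counter() - start)
        raise
    stats.record(True, time.perf_counter() - start)
    return result


def failing_request():
    raise RuntimeError("blocked")


stats = ProxyStats()
timed_call(stats, lambda: "ok")
try:
    timed_call(stats, failing_request)
except RuntimeError:
    pass
print(f"success rate: {stats.success_rate:.0%}")
```

Wrapping the requests.get call inside the scraper with timed_call gives one ProxyStats per proxy type, which makes it easy to compare them before switching.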

Building real-time voice agents that deliver value requires two critical components: low-latency voice streaming (LiveKit) and unrestricted access to live web data (IPFLY). With LiveKit handling the voice infrastructure and IPFLY solving data access challenges, you can create voice agents that feel natural, helpful, and reliable.

IPFLY’s 90M+ global IPs, anti-block technology, and 99.9% uptime ensure your voice agent always has the data it needs—whether users ask about flight statuses, stock prices, or product availability. Pair this with LiveKit’s scalability and Whisper’s accurate voice-to-text, and you have an enterprise-grade voice AI solution that stands out from generic, static voice bots.

Ready to build your own voice agent? Start with IPFLY’s free trial, LiveKit’s free tier, and the code from this guide—unlock the power of real-time web data for voice AI.