Integrate Langfuse into an IPFLY-Powered AI Agent for Enterprise Observability


Langfuse is an open-source LLM engineering platform that delivers observability, tracing, and monitoring for AI agents—critical for enterprise use cases like compliance tracking, where reliability and transparency are non-negotiable. When building a compliance-focused AI agent with LangChain, the biggest challenge is securing unrestricted access to authoritative web data (e.g., regulatory updates, government guidelines).

IPFLY’s premium proxy solutions (90M+ global IPs across 190+ countries, static/dynamic residential, and data center proxies) solve this: multi-layer IP filtering bypasses anti-scraping measures, global coverage unlocks region-specific compliance data, and 99.9% uptime ensures consistent data ingestion. This guide walks you through building a LangChain compliance AI agent, integrating IPFLY for web data collection, and using Langfuse to trace every step—from prompt inputs to proxy-powered web scraping.


Introduction to Langfuse, AI Agent Observability & IPFLY’s Role

Enterprise AI agents (especially compliance-tracking ones) rely on two pillars: accurate web data (to stay updated on regulations like GDPR, CCPA) and full observability (to validate decisions, track costs, and ensure compliance).

Langfuse: Provides end-to-end tracing, metrics, and debugging for LLM applications—letting teams monitor every step of an AI agent’s workflow (prompts, tool calls, responses).

LangChain: Orchestrates AI agent logic, connecting LLMs to external tools (like web scrapers) for data retrieval.

IPFLY: Eliminates web data access bottlenecks with proxies designed for AI: dynamic residential proxies mimic real users to avoid blocks, static residential proxies ensure consistent access to trusted regulatory sites, and data center proxies handle large-scale scraping—all with global coverage for regional compliance data.

Together, these tools create an enterprise-ready stack: IPFLY fuels the agent with high-quality web data, LangChain manages the workflow, and Langfuse ensures full visibility into performance and reliability.

What Is Langfuse?

Langfuse is an open-source, cloud-native platform for LLM application development and monitoring. It empowers teams to:

Trace workflows: Track every step of AI agent runs (prompt inputs, tool calls, LLM outputs, latency, and costs).

Manage prompts: Version-control prompts collaboratively without editing code.

Evaluate performance: Collect human feedback, run automated tests, and score agent accuracy.

Collaborate: Annotate traces, add comments, and share insights across teams.

Deploy flexibly: Use the hosted cloud service (free tier available) or self-host for full data control.

For compliance AI agents, Langfuse’s tracing is invaluable—it creates an audit trail of how the agent sourced web data (via IPFLY) and arrived at regulatory insights, simplifying compliance with internal governance and external regulations.

Why Integrate Langfuse Into Your AI Agent

AI agents for compliance interact with sensitive documents, external web data, and complex regulatory rules—blind spots here can lead to costly mistakes or non-compliance. Langfuse solves this by:

Providing end-to-end tracing: Monitor every tool call (e.g., IPFLY web scrapes) and data source to validate insights.

Tracking key metrics: Measure latency, LLM costs, and web scraping success rates (critical for optimizing IPFLY proxy usage).

Enabling fast debugging: Identify failed scrapes, outdated prompts, or LLM hallucinations with detailed logs.

Supporting compliance: Create immutable records of agent behavior for audits.

When paired with IPFLY, Langfuse ensures not just that your agent works—but that you can prove it works reliably and lawfully.
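The audit guarantees above boil down to one idea: every agent step becomes a tamper-evident record. As a stdlib-only illustration (not Langfuse's internal format; the `audit_fingerprint` helper and the record shape are invented for this sketch), a content hash makes any later edit to a recorded step detectable:

```python
import hashlib
import json

def audit_fingerprint(step: dict) -> str:
    """Return a SHA-256 fingerprint of an agent step record.

    Serializing with sorted keys makes the hash deterministic, so any
    later change to the record produces a different fingerprint.
    """
    canonical = json.dumps(step, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

record = {
    "tool": "ipfly_web_scraper",
    "input": "https://gdpr.eu/data-retention/",
    "status": "success",
}
fp = audit_fingerprint(record)

# The same record always yields the same fingerprint...
assert fp == audit_fingerprint(dict(record))
# ...while any edit changes it.
assert fp != audit_fingerprint({**record, "status": "failed"})
```

Storing such fingerprints alongside traces gives auditors a cheap way to verify that logged agent behavior was not altered after the fact.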

How to Use Langfuse to Trace a Compliance-Tracking AI Agent (LangChain + IPFLY)

We’ll build an enterprise-grade compliance AI agent that:

1. Loads internal PDF documents (e.g., data processing workflows).

2. Analyzes the PDF to identify privacy/regulatory risks.

3. Uses IPFLY proxies to search for updated regulations (SERP data) and scrape authoritative sources (government sites).

4. Generates a compliance report with citations from internal docs and web data.

5. Integrates Langfuse for full workflow tracing.

Prerequisites

Before starting, ensure you have:

Python 3.10 or higher.

An OpenAI API key (or other LLM provider API key).

An IPFLY account (with API key and access to dynamic residential proxies).

A Langfuse account (public/secret API keys configured).

Basic familiarity with LangChain and Python.

Step #1: Set Up Your LangChain AI Agent Project

Create a project folder and virtual environment:

mkdir compliance-ai-agent-ipfly-langfuse
cd compliance-ai-agent-ipfly-langfuse
python -m venv .venv
# Activate: macOS/Linux → source .venv/bin/activate; Windows → .venv\Scripts\activate
pip install langchain langchain-openai langgraph langchain-community pypdf python-dotenv langfuse requests beautifulsoup4

Create two files: agent.py (core logic) and .env (credentials):

compliance-ai-agent-ipfly-langfuse/
├── .venv/
├── agent.py
└── .env

Step #2: Configure Environment Variables

In agent.py, load environment variables to store sensitive credentials securely:

import os

from dotenv import load_dotenv

load_dotenv()  # Loads variables from the .env file

Add credentials to your .env file (we’ll populate IPFLY/Langfuse/OpenAI keys in later steps):

OPENAI_API_KEY="<YOUR_OPENAI_API_KEY>"
IPFLY_API_KEY="<YOUR_IPFLY_API_KEY>"
IPFLY_PROXY_ENDPOINT="http://[USERNAME]:[PASSWORD]@proxy.ipfly.com:8080"
LANGFUSE_SECRET_KEY="<YOUR_LANGFUSE_SECRET_KEY>"
LANGFUSE_PUBLIC_KEY="<YOUR_LANGFUSE_PUBLIC_KEY>"
LANGFUSE_BASE_URL="<YOUR_LANGFUSE_BASE_URL>"
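Since a missing credential would otherwise surface later as an opaque failure deep inside the agent, it can be worth failing fast at startup. A minimal sketch (the `check_env` helper is a hypothetical addition; the variable names mirror the `.env` file above):

```python
import os

REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "IPFLY_API_KEY",
    "IPFLY_PROXY_ENDPOINT",
    "LANGFUSE_SECRET_KEY",
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_BASE_URL",
]

def check_env(env=os.environ) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# With a toy mapping, everything except OPENAI_API_KEY is reported missing
missing = check_env({"OPENAI_API_KEY": "sk-test"})
assert missing == REQUIRED_VARS[1:]
```

Calling `check_env()` at the top of agent.py and raising on a non-empty result turns a cryptic mid-run proxy or auth error into an immediate, readable one.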

Step #3: Prepare Your IPFLY Account

IPFLY powers the agent’s web data collection (SERP searches + regulatory site scraping). Here’s how to configure it:

1. Log into your IPFLY account and generate an API key (under “Account Settings”).

2. Note your proxy endpoint (provided in IPFLY’s dashboard); it includes your username, password, and port.

3. For compliance use cases, select dynamic residential proxies (to avoid blocks on government/regulatory sites) or static residential proxies (for consistent access to trusted sources).

IPFLY’s key benefits for this agent:

90M+ real-user IPs: Mimic human browsing to bypass anti-scraping tools (e.g., CAPTCHAs on GDPR.eu).

190+ country coverage: Scrape region-specific regulations (e.g., CCPA for California, GDPR for EU).

Multi-layer IP filtering: Ensures no blacklisted IPs are used, maintaining compliance with data collection rules.

99.9% uptime: Guarantees consistent access to critical regulatory data.
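Two small, proxy-agnostic helpers can make the endpoint easier to work with: one builds the proxies mapping that `requests` expects, the other redacts the embedded credentials so the endpoint can be logged (or attached to a Langfuse trace) safely. The helper names (`build_proxies`, `mask_credentials`) are illustrative, not part of any IPFLY SDK, and the endpoint below is the placeholder format from the `.env` example:

```python
from urllib.parse import urlsplit, urlunsplit

def build_proxies(endpoint: str) -> dict:
    """Build the proxies mapping that requests expects from one endpoint URL."""
    return {"http": endpoint, "https": endpoint}

def mask_credentials(endpoint: str) -> str:
    """Redact username:password so the endpoint is safe to log or trace."""
    parts = urlsplit(endpoint)
    if parts.username:
        netloc = f"***:***@{parts.hostname}"
        if parts.port:
            netloc += f":{parts.port}"
        return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return endpoint

endpoint = "http://user:secret@proxy.ipfly.com:8080"
assert build_proxies(endpoint)["https"] == endpoint
assert mask_credentials(endpoint) == "http://***:***@proxy.ipfly.com:8080"
```

Redacting credentials before logging matters here because Langfuse traces are meant to be shared across teams and kept for audits.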

Step #4: Build IPFLY Tools for LangChain

Create custom LangChain tools to handle SERP searches and web scraping using IPFLY proxies. Add these to agent.py:

import requests
from bs4 import BeautifulSoup
from langchain.tools import Tool

import os
import json

import requests
from bs4 import BeautifulSoup
from langchain.tools import Tool

# Shared proxy configuration (read once from .env)
PROXY = os.getenv("IPFLY_PROXY_ENDPOINT")
PROXIES = {"http": PROXY, "https": PROXY}
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def ipfly_serp_search(query: str) -> str:
    """Run a Google SERP search for regulatory keywords through an IPFLY proxy."""
    params = {"q": query, "hl": "en", "gl": "us"}  # Customize "gl" for regional regulations (e.g., an EU code for GDPR)
    try:
        response = requests.get(
            "https://www.google.com/search",
            params=params,
            proxies=PROXIES,
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        results = []
        # Extract the top 5 organic results, prioritizing government sites
        for g in soup.find_all("div", class_="g")[:5]:
            title = g.find("h3").get_text(strip=True) if g.find("h3") else None
            url = g.find("a")["href"] if g.find("a") else None
            if title and url and ("gov" in url or "regulatory" in url):
                results.append({"title": title, "url": url})
        return json.dumps(results, indent=2)
    except Exception as e:
        return f"SERP search failed: {e}"


def ipfly_scrape_page(url: str) -> str:
    """Scrape a regulatory page through an IPFLY proxy; return Markdown-style text."""
    try:
        response = requests.get(
            url,
            proxies=PROXIES,
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        # Remove scripts, styles, and navigation chrome before extracting text
        for tag in soup(["script", "style", "nav", "aside", "footer"]):
            tag.decompose()
        text = soup.get_text(strip=True, separator="\n")
        lines = [line.strip() for line in text.split("\n") if line.strip()]
        markdown = "\n\n".join(lines[:50])  # Limit to 50 lines for the LLM context
        return f"Source: {url}\n\n{markdown}"
    except Exception as e:
        return f"Web scraping failed: {e}"


# Wrap the functions as LangChain tools
ipfly_serp_tool = Tool(
    name="ipfly_serp_search",
    description=(
        "Searches Google for regulatory keywords (e.g., 'GDPR data retention') "
        "using IPFLY proxies. Returns the top 5 results, prioritizing government sites."
    ),
    func=ipfly_serp_search,
)
ipfly_scraper_tool = Tool(
    name="ipfly_web_scraper",
    description=(
        "Scrapes content from regulatory websites using IPFLY proxies. "
        "Returns clean, Markdown-formatted text for LLM analysis."
    ),
    func=ipfly_scrape_page,
)
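Because the scraper's cleanup step (dropping script, style, and navigation tags before extracting text) is pure parsing, it can be exercised offline without proxies or network access. A stdlib-only sketch with the same skip list, using `html.parser` instead of BeautifulSoup (the `MainTextExtractor` class is invented for this illustration):

```python
from html.parser import HTMLParser

class MainTextExtractor(HTMLParser):
    """Collect visible text while skipping script/style/nav/aside/footer subtrees."""

    SKIP = {"script", "style", "nav", "aside", "footer"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

page = """
<html><body>
  <nav>Home | About</nav>
  <h1>Data Retention Policy</h1>
  <p>Records are kept for 24 months.</p>
  <script>trackPageView();</script>
  <footer>Example Gov</footer>
</body></html>
"""
parser = MainTextExtractor()
parser.feed(page)
assert parser.chunks == ["Data Retention Policy", "Records are kept for 24 months."]
```

Testing the cleanup logic in isolation like this keeps proxy failures and parsing bugs from masking each other during debugging.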

Step #5: Integrate the LLM

Add OpenAI (or your preferred LLM) to agent.py to power the agent’s analysis:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-5-mini",  # Replace with your preferred model (e.g., gpt-4o)
    api_key=os.getenv("OPENAI_API_KEY"),
)

Step #6: Define the Compliance AI Agent

Combine the LLM, IPFLY tools, and a system prompt to create the agent. Add this to agent.py:

from langchain.agents import create_agent
from langchain_core.prompts import PromptTemplate

# System prompt for compliance tracking
system_prompt = """
You are a compliance-tracking expert. Your role is to analyze internal documents for privacy/regulatory risks and validate findings with updated web data (via IPFLY proxies).
Follow these rules:
1. Analyze the input PDF to identify key regulatory aspects (e.g., data retention, deletion).
2. Generate 2-3 concise SERP queries (max 5 words) to find updated regulations.
3. Use ipfly_serp_search to get top regulatory sites (prioritize government sources).
4. Use ipfly_web_scraper to extract content from those sites.
5. Create a report with:
   - Quotes from the internal PDF.
   - Insights from scraped web data.
   - Clear compliance recommendations.
6. Only use data from IPFLY-scraped sources and the input PDF—never make up information.
"""# List of tools (IPFLY + LLM)
tools = [ipfly_serp_tool, ipfly_scraper_tool]# Create the agent (LangGraph-powered)
agent = create_agent(
    llm=llm,
    tools=tools,
    system_prompt=system_prompt
)

Step #7: Launch the Agent (Load PDF & Create Prompt)

Add logic to load internal PDF documents and generate a prompt for the agent. Add this to agent.py:

from langchain_community.document_loaders import PyPDFDirectoryLoader

# Create input folder for PDFs
# Create input folder for PDFs
os.makedirs("./input", exist_ok=True)

# Load PDF documents
loader = PyPDFDirectoryLoader("./input")
docs = loader.load()
internal_doc = "\n\n".join(doc.page_content for doc in docs)

# Prompt template for the agent
prompt_template = PromptTemplate.from_template("""
Analyze the following internal document for compliance risks and validate with updated web data:

PDF CONTENT:
{pdf}

Generate a concise compliance report with quotes from the PDF and scraped regulatory insights.
""")

# Create the final prompt
prompt = prompt_template.format(pdf=internal_doc)
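One practical caveat: a large PDF can blow past the LLM's context window once it is interpolated into the prompt. A rough character-budget guard might look like the sketch below (the `truncate_for_context` helper and its 12,000-character default are assumptions, and characters are only a crude proxy for tokens):

```python
def truncate_for_context(
    text: str,
    max_chars: int = 12_000,
    marker: str = "\n\n[...document truncated...]",
) -> str:
    """Trim document text to a character budget, cutting at a paragraph break when possible."""
    if len(text) <= max_chars:
        return text
    # Prefer cutting at the last paragraph boundary inside the budget
    cut = text.rfind("\n\n", 0, max_chars)
    if cut == -1:
        cut = max_chars
    return text[:cut] + marker

short = "only a page of text"
assert truncate_for_context(short) == short

long_doc = "paragraph\n\n" * 3000
trimmed = truncate_for_context(long_doc)
assert len(trimmed) < len(long_doc)
assert trimmed.endswith("[...document truncated...]")
```

Applying such a guard to `internal_doc` before formatting the prompt keeps costs predictable; for precise budgets you would count tokens with your provider's tokenizer instead.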

Step #8: Set Up Langfuse for Observability

1. Create a Langfuse account (free tier available) and navigate to “Project Settings” → “API Keys.”

2. Generate public/secret keys and add them to your .env file (as shown in Step #2).

Step #9: Integrate Langfuse Tracking

Add Langfuse tracing to the agent to monitor every step (tool calls, LLM outputs, latency). Update agent.py:

from langfuse import Langfuse
from langfuse.langchain import CallbackHandler

# Initialize the Langfuse client (keys come from .env)
langfuse = Langfuse(
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    host=os.getenv("LANGFUSE_BASE_URL", "https://cloud.langfuse.com"),
)

# The callback handler forwards LangChain events to Langfuse
langfuse_handler = CallbackHandler()

Step #10: Final Code

Your complete agent.py file will look like this:

import os
import json

import requests
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from langchain.tools import Tool
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langchain_core.prompts import PromptTemplate
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langfuse import Langfuse
from langfuse.langchain import CallbackHandler

# Load environment variables
load_dotenv()

# ------------------------------
# Langfuse Setup
# ------------------------------
langfuse = Langfuse(
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    host=os.getenv("LANGFUSE_BASE_URL", "https://cloud.langfuse.com"),
)
langfuse_handler = CallbackHandler()

# ------------------------------
# IPFLY Tools for LangChain
# ------------------------------
PROXY = os.getenv("IPFLY_PROXY_ENDPOINT")
PROXIES = {"http": PROXY, "https": PROXY}
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def ipfly_serp_search(query: str) -> str:
    """Run a Google SERP search for regulatory keywords through an IPFLY proxy."""
    params = {"q": query, "hl": "en", "gl": "us"}
    try:
        response = requests.get(
            "https://www.google.com/search",
            params=params,
            proxies=PROXIES,
            headers=HEADERS,
            timeout=30,
        )
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        results = []
        for g in soup.find_all("div", class_="g")[:5]:
            title = g.find("h3").get_text(strip=True) if g.find("h3") else None
            url = g.find("a")["href"] if g.find("a") else None
            if title and url and ("gov" in url or "regulatory" in url):
                results.append({"title": title, "url": url})
        return json.dumps(results, indent=2)
    except Exception as e:
        return f"SERP search failed: {e}"


def ipfly_scrape_page(url: str) -> str:
    """Scrape a regulatory page through an IPFLY proxy."""
    try:
        response = requests.get(url, proxies=PROXIES, headers=HEADERS, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        for tag in soup(["script", "style", "nav", "aside", "footer"]):
            tag.decompose()
        text = soup.get_text(strip=True, separator="\n")
        lines = [line.strip() for line in text.split("\n") if line.strip()]
        markdown = "\n\n".join(lines[:50])
        return f"Source: {url}\n\n{markdown}"
    except Exception as e:
        return f"Web scraping failed: {e}"


ipfly_serp_tool = Tool(
    name="ipfly_serp_search",
    description="Searches Google for regulatory keywords using IPFLY proxies. Returns top 5 government/regulatory sites.",
    func=ipfly_serp_search,
)
ipfly_scraper_tool = Tool(
    name="ipfly_web_scraper",
    description="Scrapes regulatory sites with IPFLY proxies. Returns Markdown-formatted content.",
    func=ipfly_scrape_page,
)

# ------------------------------
# LLM Integration
# ------------------------------
llm = ChatOpenAI(
    model="gpt-5-mini",
    api_key=os.getenv("OPENAI_API_KEY"),
)

# ------------------------------
# Compliance AI Agent Definition
# ------------------------------
system_prompt = """
You are a compliance-tracking expert. Analyze internal PDFs for regulatory risks and validate with IPFLY-scraped web data.
1. Identify key privacy/regulatory aspects from the PDF.
2. Generate 2-3 concise SERP queries (max 5 words).
3. Use ipfly_serp_search to find top government/regulatory sites.
4. Use ipfly_web_scraper to extract content from those sites.
5. Create a report with PDF quotes, web insights, and compliance recommendations.
Only use PDF and IPFLY-scraped data—no made-up information.
"""

tools = [ipfly_serp_tool, ipfly_scraper_tool]
agent = create_agent(model=llm, tools=tools, system_prompt=system_prompt)

# ------------------------------
# Load PDF & Create Prompt
# ------------------------------
os.makedirs("./input", exist_ok=True)
loader = PyPDFDirectoryLoader("./input")
docs = loader.load()
internal_doc = "\n\n".join(doc.page_content for doc in docs)

prompt_template = PromptTemplate.from_template("""
Analyze this internal document for compliance risks and validate with web data:

PDF CONTENT:
{pdf}

Generate a compliance report with PDF quotes and scraped regulatory insights.
""")
prompt = prompt_template.format(pdf=internal_doc)

# ------------------------------
# Run Agent with Langfuse Tracing
# ------------------------------
if __name__ == "__main__":
    print("Running compliance AI agent with Langfuse tracing...")
    for step in agent.stream(
        {"messages": [{"role": "user", "content": prompt}]},
        stream_mode="values",
        config={"callbacks": [langfuse_handler]},
    ):
        step["messages"][-1].pretty_print()

Step #11: Run the Agent

1. Place a compliance-related PDF (e.g., data-processing-workflow.pdf) in the ./input folder.

2. Execute the agent:

python agent.py

The agent will:

Analyze the PDF to identify regulatory risks (e.g., “data retention”).

Use IPFLY’s SERP tool to search for updated rules (e.g., “GDPR data retention”).

Scrape top government sites (e.g., europa.eu) with IPFLY’s web scraper.

Generate a compliance report with citations.

Langfuse will automatically track every step—from IPFLY proxy calls to LLM outputs.

Step #12: Inspect Agent Traces in Langfuse

1. Log into your Langfuse dashboard.

2. Navigate to the “Tracing” tab—you’ll see a new trace for your agent run.

3. Click the trace to explore:

  1. Tool Calls: View IPFLY SERP/scraper requests, including proxy usage and response data.
  2. LLM Interactions: Inspect prompts, outputs, and latency.
  3. Metrics: Track scraping success rates, LLM costs, and total runtime.

Key insights from Langfuse:

Verify IPFLY proxy performance (e.g., 100% success rate for scraping government sites).

Identify bottlenecks (e.g., latency in SERP searches—adjust IPFLY proxy type to data center for speed).

Audit compliance (e.g., confirm the agent only used IPFLY-scraped government data).
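As a concrete example of the metrics above, the scraping success rate can be computed from trace-style tool-call records. The record shape below is a simplified stand-in for illustration, not the Langfuse SDK's actual trace schema, and `scrape_success_rate` is a hypothetical helper:

```python
def scrape_success_rate(tool_calls: list) -> float:
    """Fraction of ipfly_web_scraper calls that succeeded, from trace-style records."""
    scrapes = [c for c in tool_calls if c["name"] == "ipfly_web_scraper"]
    if not scrapes:
        return 0.0
    ok = sum(1 for c in scrapes if c["status"] == "success")
    return ok / len(scrapes)

calls = [
    {"name": "ipfly_serp_search", "status": "success"},
    {"name": "ipfly_web_scraper", "status": "success"},
    {"name": "ipfly_web_scraper", "status": "error"},
    {"name": "ipfly_web_scraper", "status": "success"},
]
rate = scrape_success_rate(calls)
assert abs(rate - 2 / 3) < 1e-9
```

A sustained drop in this rate is usually the signal to switch IPFLY proxy types (e.g., from data center to residential) for the affected sites.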

Next Steps to Enhance the Agent

1. Prompt Management: Use Langfuse’s prompt library to version-control compliance prompts.

2. Custom Langfuse Dashboards: Track IPFLY proxy success rates, LLM costs, and compliance report quality.

3. IPFLY Proxy Optimization: Use static residential proxies for recurring scrapes (e.g., monthly GDPR updates) to improve consistency.

4. Report Export: Add logic to save compliance reports as PDFs for audits.

5. Multi-Region Support: Use IPFLY’s regional IPs to scrape regulations for multiple countries (e.g., CCPA for US, PIPEDA for Canada).
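For the report-export idea, a first step short of full PDF generation is writing each report to a timestamped Markdown file (the `save_report` helper and the `./reports` default are assumptions for this sketch; converting to PDF would need an extra library):

```python
from datetime import datetime, timezone
from pathlib import Path

def save_report(report_text: str, out_dir: str = "./reports") -> Path:
    """Write a compliance report to a timestamped Markdown file and return its path."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    path = folder / f"compliance-report-{stamp}.md"
    path.write_text(report_text, encoding="utf-8")
    return path

saved = save_report("# Compliance Report\n\nNo critical risks found.", out_dir="./reports-demo")
assert saved.exists()
assert saved.read_text(encoding="utf-8").startswith("# Compliance Report")
```

Timestamped filenames keep every run's output for audits instead of overwriting the previous report.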

Conclusion

Integrating Langfuse with a LangChain AI agent powered by IPFLY proxies delivers enterprise-grade observability and reliability—critical for compliance use cases. Langfuse provides the transparency to track every agent action, while IPFLY ensures unrestricted access to high-quality regulatory data.

Together, these tools solve the biggest challenges of enterprise AI agents:

Data Access: IPFLY’s 90M+ global proxies bypass blocks and geo-restrictions.

Observability: Langfuse traces every step for audits and optimization.

Compliance: Immutable records of data sources and agent logic.

Whether you’re building compliance agents, market research tools, or customer support bots, IPFLY + Langfuse + LangChain creates a stack that’s powerful, transparent, and scalable.

Ready to build your own observable AI agent? Start with IPFLY’s free trial, Langfuse’s free tier, and the code from this guide—unlock the full potential of web data for enterprise AI.
