
RELAI: Simulate → Evaluate → Optimize AI Agents

RELAI is an SDK for building reliable AI agents. It streamlines the hardest parts of agent development—simulation, evaluation, and optimization—so you can iterate quickly with confidence.

What you get

  • Agent Simulation — Create full/partial environments, define LLM personas, mock MCP servers & tools, and generate synthetic data. Optionally condition simulation on real samples to better match production.

  • Agent Evaluation — Mix code-based and LLM-based custom evaluators or use RELAI platform evaluators. Turn human reviews into benchmarks you can re-run.

  • Agent Optimization (Maestro) — Holistic optimizer that uses evaluator signals & feedback to improve prompts/configs and suggest graph-level changes. Also selects best model/tool/graph based on observed performance.

Get up and running with RELAI in minutes:

  • Getting Started - Installation, setup, and a code walkthrough
  • Tutorials - Step-by-step guides for accomplishing a specific task or using a particular feature
  • Examples - Self-contained examples showing how to use the SDK in common scenarios
  • Notebooks - Jupyter notebooks showing how to use the SDK in common scenarios
  • API Reference - Detailed reference for the SDK API

Getting Started

Installation

You can install the RELAI SDK using your favorite Python package manager (requires Python 3.9+):

pip install relai
# or
uv add relai

Setting the RELAI API key

A RELAI API key is required to use features from the RELAI platform. You can get one from your RELAI enterprise dashboard. After you copy the key, assign it to the RELAI_API_KEY environment variable:

export RELAI_API_KEY="relai-..."
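
The SDK reads the key from the environment at runtime. As an optional guard, a few lines of standard-library Python can fail fast when the variable is unset. The `require_api_key` helper below is illustrative, not part of the SDK; the `relai-` prefix check is based on the key format shown above:

```python
import os

def require_api_key(env=os.environ) -> str:
    """Return the RELAI API key, failing fast if it is missing or malformed."""
    key = env.get("RELAI_API_KEY", "")
    if not key.startswith("relai-"):
        raise RuntimeError("RELAI_API_KEY is missing; export it before using the SDK.")
    return key

# Demonstrated with a placeholder key; never hard-code real keys.
print(require_api_key({"RELAI_API_KEY": "relai-example"}))
```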

Building Reliable AI Agents with RELAI SDK

Step 1 — Decorate inputs/tools that will be simulated

from relai.mocker import Persona, MockTool
from relai.simulator import simulated
from agents import function_tool

AGENT_NAME = "Stock Chatbot"
MODEL = "gpt-5-mini"


# Decorate functions to be mocked in the simulation
@simulated
async def get_user_query() -> str:
    """Get user's query about stock prices."""
    return "What is the current price of AAPL stock?"


@function_tool
@simulated
async def retriever(query: str) -> list[str]:
    """
    A retriever tool that returns relevant financial data for a given query about stock prices.
    """
    return []

Step 2 — Register params to be optimized and define your agent

from agents import Agent, Runner
from relai.maestro import params, register_param

register_param(
    "prompt",
    type="prompt",
    init_value="You are a helpful assistant for stock price questions.",
    desc="system prompt for the agent",
)

async def stock_price_chatbot(question: str) -> dict[str, str]:
    agent = Agent(
        name=AGENT_NAME,
        instructions=params.prompt,  # access registered parameter
        model=MODEL,
        tools=[retriever],
    )
    result = await Runner.run(agent, question)
    return {"answer": result.final_output}
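
The point of registering parameters is that Maestro can rewrite them without touching the agent code, since the agent reads `params.prompt` at call time. A toy pure-Python analog of this pattern (not RELAI's implementation) shows the idea:

```python
from types import SimpleNamespace

# Toy analog of the register_param/params pattern (illustrative only):
# parameters live in a shared namespace, so an optimizer can overwrite them
# while agent code keeps reading the latest value by attribute access.
params = SimpleNamespace()

def register_param(name: str, init_value: str, **meta) -> None:
    setattr(params, name, init_value)

register_param("prompt", init_value="You are a helpful assistant.")
print(params.prompt)  # the agent reads this at call time

# An optimizer can later overwrite the value in place:
setattr(params, "prompt", "You are a precise stock-price assistant.")
print(params.prompt)
```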

Step 3 — Wrap agent for simulation traces

from relai import AgentOutputs, SimulationTape

async def agent_fn(tape: SimulationTape) -> AgentOutputs:
    question = await get_user_query()
    tape.agent_inputs["question"] = question  # trace inputs for later auditing
    return await stock_price_chatbot(question)
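
To see why the wrapper records inputs on the tape, here is a toy stand-in (plain Python, not RELAI's `SimulationTape`): anything written to the tape during the run remains available afterwards, which is what lets evaluators and reports inspect the inputs alongside the outputs.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class ToyTape:
    """Stand-in for SimulationTape: a dict that records agent inputs."""
    agent_inputs: dict = field(default_factory=dict)

async def toy_agent_fn(tape: ToyTape) -> dict[str, str]:
    question = "What is the current price of AAPL stock?"
    tape.agent_inputs["question"] = question  # same recording pattern as agent_fn
    return {"answer": f"(stub) answering: {question}"}

tape = ToyTape()
outputs = asyncio.run(toy_agent_fn(tape))
print(tape.agent_inputs["question"])  # recorded input, available after the run
```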

Step 4 — Define evaluators

import re
from relai import AgentLog, EvaluatorLog
from relai.critico.evaluate import Evaluator

class PriceFormatEvaluator(Evaluator):
    """Checks for correct price formats ($… with exactly two decimals)."""

    def __init__(self) -> None:
        super().__init__(name="PriceFormatEvaluator", required_fields=["answer"])

    async def compute_evaluator_result(self, agent_log: AgentLog) -> EvaluatorLog:
        # flag $-prices that are NOT like $1,234.56 or $1234.56
        bad_pattern = r"\$(?!\d{1,3}(?:,\d{3})+|\d+\.\d{2}\b)\S+"
        bad_prices = re.findall(bad_pattern, agent_log.agent_outputs["answer"])
        score = 0.0 if bad_prices else 1.0
        feedback = (
            ("Incorrect price formats found: " + ", ".join(bad_prices))
            if bad_prices else
            "Price formats look good."
        )
        return EvaluatorLog(
            evaluator_id=self.uid,
            name=self.name,
            outputs={"score": score, "feedback": feedback},
        )
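
To sanity-check the pattern outside the evaluator, you can exercise the regex directly; the sample string below is invented for illustration:

```python
import re

bad_pattern = r"\$(?!\d{1,3}(?:,\d{3})+|\d+\.\d{2}\b)\S+"

# $189.84 is well formed; $189.8 and $100 lack the two decimal places.
text = "AAPL trades at $189.84; it was $189.8 last week and $100 a year ago"
print(re.findall(bad_pattern, text))  # → ['$189.8', '$100']
```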

Step 5 — Orchestrate: simulate → evaluate → optimize

import asyncio

from relai import AsyncRELAI, AsyncSimulator, random_env_generator
from relai.critico import Critico
from relai.maestro import Maestro, params
from relai.mocker import Persona, MockTool  # (already imported in Step 1 if single file)

async def main() -> None:
    # 5.1 — Set up your simulation environment
    env_generator = random_env_generator(
        config_set={
            "__main__.get_user_query": [Persona(user_persona="A polite and curious user.")],
            "__main__.retriever": [MockTool(model=MODEL)],
        }
    )

    async with AsyncRELAI() as client:
        # 5.2 — SIMULATE
        simulator = AsyncSimulator(agent_fn=agent_fn, env_generator=env_generator, client=client)
        agent_logs = await simulator.run(num_runs=1)

        # 5.3 — EVALUATE
        critico = Critico(client=client)
        critico.add_evaluators({PriceFormatEvaluator(): 1.0})
        critico_logs = await critico.evaluate(agent_logs)

        # Publish evaluation report to the RELAI platform
        await critico.report(critico_logs)

        # 5.4 — OPTIMIZE with Maestro
        maestro = Maestro(client=client, agent_fn=agent_fn, log_to_platform=True, name=AGENT_NAME)
        maestro.add_setup(simulator=simulator, critico=critico)

        # 5.4.1 — Optimize agent configurations
        # params.load("saved_config.json")  # load previous params if available
        await maestro.optimize_config(
            total_rollouts=50,
            batch_size=2,
            explore_radius=5,
            explore_factor=0.5,
            verbose=True,
        )
        params.save("saved_config.json")  # save optimized params for future use

        # 5.4.2 — Optimize agent structure
        await maestro.optimize_structure(
            total_rollouts=10,
            code_paths=["agentic-rag.py"],
            verbose=True,
        )

if __name__ == "__main__":
    asyncio.run(main())