Types

relai.data.RELAISample(benchmark_id='default', id=(lambda: uuid4().hex)(), split='All', agent_inputs=dict(), extras=dict(), serialized_simulation_config=dict()) dataclass

Represents a single sample in a RELAI benchmark.

Attributes:

benchmark_id (str): The identifier of the benchmark this sample belongs to.
id (str): The unique identifier for this sample.
split (str): The data split this sample belongs to (e.g., "Train", "Validation", "Test").
agent_inputs (AgentInputs): The inputs provided to the agent from this sample.
extras (Extras): Any additional metadata or information associated with this sample. This field can also hold evaluator-specific inputs.
serialized_simulation_config (dict[str, Any]): The serialized simulation configuration for this sample. It can be used to reconstruct any mockers used in a previous simulation.
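The rendered signature shows defaults such as `id=(lambda: uuid4().hex)()` and `agent_inputs=dict()`. These are most likely how the documentation generator displays `dataclasses.field(default_factory=...)` defaults. The self-contained sketch below mirrors the documented fields with a stand-in class so it runs without the `relai` package; it is not the library's source, and treating `AgentInputs` and `Extras` as `dict[str, Any]` is an assumption. It shows what those defaults imply: every sample gets its own random hex `id` and its own empty dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any
from uuid import uuid4

# Assumption: AgentInputs and Extras behave like plain string-keyed dicts.
AgentInputs = dict[str, Any]
Extras = dict[str, Any]


@dataclass
class RELAISampleSketch:
    """Stand-in mirroring the documented fields of relai.data.RELAISample."""

    benchmark_id: str = "default"
    id: str = field(default_factory=lambda: uuid4().hex)  # fresh hex id per sample
    split: str = "All"
    agent_inputs: AgentInputs = field(default_factory=dict)
    extras: Extras = field(default_factory=dict)
    serialized_simulation_config: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample: agent inputs plus an evaluator-specific reference answer in extras.
sample = RELAISampleSketch(
    benchmark_id="qa-benchmark",
    split="Test",
    agent_inputs={"question": "What is the capital of France?"},
    extras={"reference_answer": "Paris"},
)
other = RELAISampleSketch()  # all defaults

print(len(sample.id))  # uuid4().hex is always 32 hex characters
print(other.benchmark_id, other.split)
print(other.agent_inputs is sample.agent_inputs)  # False: no shared mutable default
```

Using `default_factory` rather than a literal `{}` default is what keeps two samples from silently sharing one dictionary.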

relai.data.SimulationTape(sample=None, id=(lambda: uuid4().hex)(), simulation_config=dict()) dataclass

A simulation tape records the inputs, outputs, and any other relevant data produced during a simulation.

Attributes:

id (str): The unique identifier for this simulation tape.
benchmark_id (str): The identifier of the benchmark from which the sample used to initialize this tape was taken. If no sample was provided, defaults to "default".
sample_id (str): The identifier of the sample used to initialize this tape. If no sample was provided, defaults to the tape's id.
split (str): The data split of the sample used to initialize this tape. If no sample was provided, defaults to "All".
agent_inputs (AgentInputs): The inputs provided to the agent from the sample used to initialize this tape. If no sample was provided, defaults to an empty dictionary.
extras (Extras): Any additional metadata or information associated with this tape. If no sample was provided, defaults to an empty dictionary.
evaluator_group (str): The evaluator group associated with this tape, typically set to the benchmark_id of the sample. If no sample was provided, defaults to "default". It can be modified inside agent_fn.
simulation_config (SimulationConfigT): The simulation configuration used during this simulation: a mapping from qualified function names to their respective mocker instances (e.g., Persona, MockTool).
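The fallback rules listed above (benchmark "default", sample id equal to the tape's own id, split "All", empty inputs and extras, evaluator group taken from the benchmark id) can be summarized in a short sketch. It uses hypothetical stand-in classes and a `__post_init__` hook; the actual `relai` implementation may derive these fields differently, and the minimal `Sample` class is an assumption that carries only the fields the tape reads.

```python
from dataclasses import dataclass, field
from typing import Any, Optional
from uuid import uuid4


@dataclass
class Sample:
    """Minimal stand-in for relai.data.RELAISample (only the fields the tape reads)."""

    benchmark_id: str = "default"
    id: str = field(default_factory=lambda: uuid4().hex)
    split: str = "All"
    agent_inputs: dict[str, Any] = field(default_factory=dict)
    extras: dict[str, Any] = field(default_factory=dict)


@dataclass
class SimulationTapeSketch:
    """Stand-in reproducing the documented defaults of relai.data.SimulationTape."""

    sample: Optional[Sample] = None
    id: str = field(default_factory=lambda: uuid4().hex)
    simulation_config: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.sample is None:
            # No sample: fall back to the documented defaults.
            self.benchmark_id = "default"
            self.sample_id = self.id
            self.split = "All"
            self.agent_inputs = {}
            self.extras = {}
        else:
            # Copy identifying metadata and inputs from the sample.
            self.benchmark_id = self.sample.benchmark_id
            self.sample_id = self.sample.id
            self.split = self.sample.split
            self.agent_inputs = self.sample.agent_inputs
            self.extras = self.sample.extras
        # Typically the benchmark id; an agent function may overwrite it later.
        self.evaluator_group = self.benchmark_id


blank = SimulationTapeSketch()
print(blank.sample_id == blank.id, blank.benchmark_id, blank.split)  # True default All

tape = SimulationTapeSketch(sample=Sample(benchmark_id="qa-benchmark", split="Test"))
print(tape.evaluator_group)  # qa-benchmark
```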

relai.data.AgentLog(simulation_tape, agent_outputs=dict(), trace_id=None) dataclass

Log of a single agent simulation run.

Attributes:

simulation_tape (SimulationTape): The simulation tape containing inputs and metadata.
agent_outputs (AgentOutputs): The outputs generated by the agent during the simulation.
trace_id (str | None): An optional trace identifier for the simulation run.
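An `AgentLog` pairs a tape with whatever the agent produced. The sketch below uses simplified stand-ins plus a hypothetical `echo_agent` function and `run_agent` helper (neither is part of the documented API) to show the relationship: the agent reads its inputs from the tape, may change `evaluator_group` as the SimulationTape entry allows, and its outputs are stored next to the tape. Treating `AgentOutputs` as a plain dict is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
from uuid import uuid4


@dataclass
class TapeStub:
    """Simplified stand-in for relai.data.SimulationTape."""

    agent_inputs: dict[str, Any] = field(default_factory=dict)
    benchmark_id: str = "default"
    evaluator_group: str = "default"
    id: str = field(default_factory=lambda: uuid4().hex)


@dataclass
class AgentLogSketch:
    """Stand-in mirroring the documented fields of relai.data.AgentLog."""

    simulation_tape: TapeStub
    agent_outputs: dict[str, Any] = field(default_factory=dict)
    trace_id: Optional[str] = None


def echo_agent(tape: TapeStub) -> dict[str, Any]:
    """Hypothetical agent: echoes the question and routes to a custom evaluator group."""
    tape.evaluator_group = "echo-checks"
    return {"answer": tape.agent_inputs.get("question", "")}


def run_agent(
    agent_fn: Callable[[TapeStub], dict[str, Any]], tape: TapeStub
) -> AgentLogSketch:
    """Hypothetical helper: run the agent on a tape and wrap the result in a log."""
    return AgentLogSketch(simulation_tape=tape, agent_outputs=agent_fn(tape))


log = run_agent(echo_agent, TapeStub(agent_inputs={"question": "ping"}))
print(log.agent_outputs)                    # {'answer': 'ping'}
print(log.simulation_tape.evaluator_group)  # echo-checks
print(log.trace_id)                         # None
```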

relai.data.EvaluatorLog(evaluator_id, name, outputs, config=dict()) dataclass

Log of a single evaluator run.

Attributes:

evaluator_id (str): The ID of the evaluator.
name (str): The name of the evaluator.
outputs (EvaluatorOutputs): The outputs generated by the evaluator.
config (dict[str, Any]): The configuration settings used for the evaluator.
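Because these types are documented as dataclasses, the standard library's `dataclasses.asdict` should be able to turn a log into plain dictionaries for storage or inspection. The sketch below demonstrates this on a stand-in class that mirrors the documented `EvaluatorLog` fields; the evaluator name, ID, and the `score`/`feedback` keys in `outputs` are hypothetical, and treating `EvaluatorOutputs` as a plain dict is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class EvaluatorLogSketch:
    """Stand-in mirroring the documented fields of relai.data.EvaluatorLog."""

    evaluator_id: str
    name: str
    outputs: dict[str, Any]  # assumption: EvaluatorOutputs is a plain dict
    config: dict[str, Any] = field(default_factory=dict)


# Hypothetical evaluator run: an exact-match check with a score and feedback.
ev_log = EvaluatorLogSketch(
    evaluator_id="eval-123",
    name="exact_match",
    outputs={"score": 1.0, "feedback": "Answer matches the reference."},
    config={"case_sensitive": False},
)

record = asdict(ev_log)  # recursively converts the dataclass into plain dicts
print(json.dumps(record, sort_keys=True))
```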

relai.data.CriticoLog(agent_log, evaluator_logs=list(), aggregate_score=0.0, aggregate_feedback='', trace_id=None) dataclass

Log of a Critico evaluation run.

Attributes:

agent_log (AgentLog): The log of the agent simulation run.
evaluator_logs (list[EvaluatorLog]): A list of logs from individual evaluators.
aggregate_score (float): The aggregate score computed from all the evaluator logs.
aggregate_feedback (str): The aggregate feedback compiled from all the evaluator logs.
trace_id (str | None): An optional trace identifier for the corresponding agent simulation run.
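The entries above do not say how Critico combines evaluator results into `aggregate_score` and `aggregate_feedback`. The sketch below only illustrates the shape of a `CriticoLog` and one plausible aggregation rule: a hypothetical `aggregate` helper that averages per-evaluator `"score"` values and joins `"feedback"` strings. The stand-in classes, the output keys, and the averaging rule are all assumptions, not the library's behavior.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Optional


@dataclass
class EvaluatorLogStub:
    """Simplified stand-in for relai.data.EvaluatorLog."""

    evaluator_id: str
    name: str
    outputs: dict[str, Any]


@dataclass
class CriticoLogSketch:
    """Stand-in mirroring the documented fields of relai.data.CriticoLog."""

    agent_log: Any  # an AgentLog in the real API
    evaluator_logs: list[EvaluatorLogStub] = field(default_factory=list)
    aggregate_score: float = 0.0
    aggregate_feedback: str = ""
    trace_id: Optional[str] = None


def aggregate(log: CriticoLogSketch) -> CriticoLogSketch:
    """Hypothetical rule: mean of "score" values and newline-joined "feedback"."""
    scores = [ev.outputs["score"] for ev in log.evaluator_logs if "score" in ev.outputs]
    log.aggregate_score = mean(scores) if scores else 0.0
    log.aggregate_feedback = "\n".join(
        f"{ev.name}: {ev.outputs['feedback']}"
        for ev in log.evaluator_logs
        if "feedback" in ev.outputs
    )
    return log


critico = aggregate(
    CriticoLogSketch(
        agent_log=None,  # placeholder; a real CriticoLog wraps an AgentLog
        evaluator_logs=[
            EvaluatorLogStub("e1", "exact_match", {"score": 1.0, "feedback": "Exact match."}),
            EvaluatorLogStub("e2", "tone", {"score": 0.5, "feedback": "Slightly terse."}),
        ],
    )
)
print(critico.aggregate_score)  # 0.75
print(critico.aggregate_feedback)
```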