Types

`relai.data.RELAISample(benchmark_id='default', id=(lambda: uuid4().hex)(), split='All', agent_inputs=dict(), extras=dict(), serialized_simulation_config=dict())` `dataclass`

Represents a single sample in a RELAI benchmark.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `benchmark_id` | `str` | The identifier of the benchmark this sample belongs to. |
| `id` | `str` | The unique identifier for this sample. |
| `split` | `str` | The data split this sample belongs to (e.g., "Train", "Validation", "Test"). |
| `agent_inputs` | `AgentInputs` | The inputs provided to the agent from this sample. |
| `extras` | `Extras` | Any additional metadata or information associated with this sample. This field can also hold evaluator-specific inputs. |
| `serialized_simulation_config` | `dict[str, Any]` | The serialized simulation configuration for this sample. Can be used to reconstruct any mockers used in a previous simulation. |
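The rendered signature above (`id=(lambda: uuid4().hex)()`, `agent_inputs=dict()`) suggests that the generated and mutable defaults are created per instance, as with `dataclasses.field(default_factory=...)`. The sketch below mirrors the documented fields under that assumption. `SampleSketch` is a hypothetical stand-in, not the library's class; `AgentInputs` and `Extras` are approximated as `dict[str, Any]`, and the benchmark name and `reference_answer` key are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any
from uuid import uuid4


@dataclass
class SampleSketch:
    """Hypothetical stand-in mirroring the documented RELAISample fields.

    Assumption: AgentInputs and Extras behave like plain dicts.
    """

    benchmark_id: str = "default"
    # Assumed default_factory: each instance gets a fresh 32-character hex id.
    id: str = field(default_factory=lambda: uuid4().hex)
    split: str = "All"
    # default_factory=dict gives every instance its own empty dict.
    agent_inputs: dict[str, Any] = field(default_factory=dict)
    extras: dict[str, Any] = field(default_factory=dict)
    serialized_simulation_config: dict[str, Any] = field(default_factory=dict)


a = SampleSketch(
    benchmark_id="faq-bot",  # hypothetical benchmark name
    split="Test",
    agent_inputs={"question": "How do I reset my password?"},
    # Evaluator-specific input kept in extras (hypothetical key).
    extras={"reference_answer": "Use the 'Forgot password' link."},
)
b = SampleSketch()
print(a.id != b.id)  # ids are generated per instance
print(b.agent_inputs is not SampleSketch().agent_inputs)  # no shared dict
```

Using factories rather than literal defaults is what keeps two samples from accidentally sharing one `agent_inputs` dict or one `id`.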

A simulation tape records the inputs, outputs, and other relevant data produced during a simulation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `id` | `str` | The unique identifier for this simulation tape. |
| `benchmark_id` | `str` | The identifier of the benchmark from which the sample used to initialize this tape was taken. If no sample was provided, defaults to "default". |
| `sample_id` | `str` | The identifier of the sample used to initialize this tape. If no sample was provided, defaults to the tape's `id`. |
| `split` | `str` | The data split of the sample used to initialize this tape. If no sample was provided, defaults to "All". |
| `agent_inputs` | `AgentInputs` | The inputs provided to the agent from the sample used to initialize this tape. If no sample was provided, defaults to an empty dictionary. |
| `extras` | `Extras` | Any additional metadata or information associated with this tape. If no sample was provided, defaults to an empty dictionary. |
| `evaluator_group` | `str` | The evaluator group associated with this tape, typically set to the `benchmark_id` of the sample. If no sample was provided, defaults to "default". Can be modified in the `agent_fn`. |
| `simulation_config` | `SimulationConfigT` | The simulation configuration used during this simulation: a mapping from qualified function names to their respective mocker instances (e.g., `Persona`, `MockTool`). |
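The defaults in the table amount to a small construction rule: copy the sample's fields when a sample is given, otherwise fall back to fixed values. The sketch below encodes that rule with hypothetical stand-ins (`SampleSketch`, `TapeSketch`) and a hypothetical helper, `tape_from_sample`; none of these are RELAI APIs. It assumes `evaluator_group` starts out as the sample's `benchmark_id`, as described above, and models `simulation_config` as a plain dict from qualified function names to arbitrary mocker objects.

```python
from dataclasses import dataclass, field
from typing import Any, Optional
from uuid import uuid4


@dataclass
class SampleSketch:
    """Hypothetical stand-in with the sample fields a tape copies."""
    benchmark_id: str = "default"
    id: str = field(default_factory=lambda: uuid4().hex)
    split: str = "All"
    agent_inputs: dict[str, Any] = field(default_factory=dict)
    extras: dict[str, Any] = field(default_factory=dict)


@dataclass
class TapeSketch:
    """Hypothetical stand-in mirroring the documented SimulationTape fields."""
    id: str
    benchmark_id: str
    sample_id: str
    split: str
    agent_inputs: dict[str, Any]
    extras: dict[str, Any]
    evaluator_group: str
    # Qualified function name -> mocker instance (e.g. a persona or mock tool).
    simulation_config: dict[str, Any] = field(default_factory=dict)


def tape_from_sample(sample: Optional[SampleSketch] = None) -> TapeSketch:
    """Apply the documented defaults (hypothetical helper, not a RELAI API)."""
    tape_id = uuid4().hex
    if sample is None:
        return TapeSketch(
            id=tape_id,
            benchmark_id="default",
            sample_id=tape_id,  # falls back to the tape's own id
            split="All",
            agent_inputs={},
            extras={},
            evaluator_group="default",
        )
    return TapeSketch(
        id=tape_id,
        benchmark_id=sample.benchmark_id,
        sample_id=sample.id,
        split=sample.split,
        agent_inputs=sample.agent_inputs,
        extras=sample.extras,
        evaluator_group=sample.benchmark_id,  # may later be changed in agent_fn
    )


bare = tape_from_sample()
print(bare.sample_id == bare.id)

tape = tape_from_sample(SampleSketch(benchmark_id="faq-bot", split="Test"))
print(tape.evaluator_group)
```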

Log of a single agent simulation run.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `simulation_tape` | `SimulationTape` | The simulation tape containing inputs and metadata. |
| `agent_outputs` | `AgentOutputs` | The outputs generated by the agent during the simulation. |
| `trace_id` | `str \| None` | An optional trace identifier for the simulation run. |
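Because an agent log pairs the tape (what the agent received) with `agent_outputs` (what it produced), one natural use is flattening it into a single record for inspection or export. The sketch below uses hypothetical stand-ins (`TapeStub`, `AgentLogSketch`) and a hypothetical `to_record` helper; it assumes `AgentOutputs` is dict-like and touches only fields listed in the tables above.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TapeStub:
    """Hypothetical, trimmed-down stand-in for SimulationTape."""
    sample_id: str
    split: str
    agent_inputs: dict[str, Any]


@dataclass
class AgentLogSketch:
    """Hypothetical stand-in mirroring the documented AgentLog fields."""
    simulation_tape: TapeStub
    agent_outputs: dict[str, Any]  # assumption: AgentOutputs is dict-like
    trace_id: Optional[str] = None


def to_record(log: AgentLogSketch) -> dict[str, Any]:
    """Flatten one agent log into a single row (hypothetical helper)."""
    tape = log.simulation_tape
    return {
        "sample_id": tape.sample_id,
        "split": tape.split,
        "inputs": tape.agent_inputs,
        "outputs": log.agent_outputs,
        "trace_id": log.trace_id,  # None when no trace was recorded
    }


log = AgentLogSketch(
    simulation_tape=TapeStub("s1", "Test", {"question": "2 + 2?"}),
    agent_outputs={"answer": "4"},
)
print(to_record(log))
```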

Log of a single evaluator run.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `evaluator_id` | `str` | The ID of the evaluator. |
| `name` | `str` | The name of the evaluator. |
| `outputs` | `EvaluatorOutputs` | The outputs generated by the evaluator. |
| `config` | `dict[str, Any]` | The configuration settings used for the evaluator. |

Log of a Critico evaluation run.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `agent_log` | `AgentLog` | The log of the agent simulation run. |
| `evaluator_logs` | `list[EvaluatorLog]` | A list of logs from individual evaluators. |
| `aggregate_score` | `float` | The aggregate score computed from all the evaluator logs. |
| `aggregate_feedback` | `str` | The aggregate feedback compiled from all the evaluator logs. |
| `trace_id` | `str \| None` | An optional trace identifier for the corresponding agent simulation run. |
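A Critico log nests one entry per evaluator alongside the aggregate fields. The sketch below shows that nesting with hypothetical stand-ins (`EvaluatorLogSketch`, `CriticoLogSketch`). The aggregation is an assumption made purely for illustration: it supposes each evaluator's outputs carry hypothetical `score` and `feedback` keys and combines them with an unweighted mean and a simple join, which is not necessarily how Critico computes `aggregate_score` or `aggregate_feedback`.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Optional


@dataclass
class EvaluatorLogSketch:
    """Hypothetical stand-in mirroring the documented EvaluatorLog fields."""
    evaluator_id: str
    name: str
    outputs: dict[str, Any]  # assumption: EvaluatorOutputs is dict-like
    config: dict[str, Any] = field(default_factory=dict)


@dataclass
class CriticoLogSketch:
    """Hypothetical stand-in; agent_log is left untyped to keep this short."""
    agent_log: Any
    evaluator_logs: list[EvaluatorLogSketch]
    aggregate_score: float
    aggregate_feedback: str
    trace_id: Optional[str] = None


def mean_score(logs: list[EvaluatorLogSketch]) -> float:
    """Unweighted mean of a hypothetical per-evaluator "score" output.

    Illustration only: not necessarily how Critico aggregates scores.
    """
    return mean(float(log.outputs["score"]) for log in logs)


evaluator_logs = [
    EvaluatorLogSketch("ev-1", "correctness", {"score": 1.0, "feedback": "Correct."}),
    EvaluatorLogSketch("ev-2", "tone", {"score": 0.5, "feedback": "Too terse."}),
]
critico_log = CriticoLogSketch(
    agent_log=None,  # placeholder; a real log would hold an AgentLog
    evaluator_logs=evaluator_logs,
    aggregate_score=mean_score(evaluator_logs),
    aggregate_feedback=" ".join(log.outputs["feedback"] for log in evaluator_logs),
)

# Look up one evaluator's outputs by name.
by_name = {log.name: log.outputs for log in critico_log.evaluator_logs}
print(critico_log.aggregate_score)  # 0.75
print(by_name["tone"]["feedback"])  # Too terse.
```

Keeping the per-evaluator logs next to the aggregate means a low overall score can always be traced back to the evaluator that produced it.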