Types
relai.data.RELAISample(benchmark_id='default', id=(lambda: uuid4().hex)(), split='All', agent_inputs=dict(), extras=dict(), serialized_simulation_config=dict())
dataclass
Represents a single sample in a RELAI benchmark.
Attributes:
| Name | Type | Description |
|---|---|---|
| benchmark_id | str | The identifier of the benchmark this sample belongs to. |
| id | str | The unique identifier for this sample. |
| split | str | The data split this sample belongs to (e.g., "Train", "Validation", "Test"). |
| agent_inputs | AgentInputs | The inputs this sample provides to the agent. |
| extras | Extras | Any additional metadata or information associated with this sample. Also use this field to store evaluator-specific inputs. |
| serialized_simulation_config | dict[str, Any] | The serialized simulation configuration for this sample. Can be used to reconstruct any mockers used in a previous simulation. |
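
The `id=(lambda: uuid4().hex)()` default in the signature most likely reflects a per-instance default factory, so each sample receives its own fresh identifier. The sketch below is a hypothetical stand-in (`RELAISampleSketch`, not the `relai` source) that mirrors the documented fields with the standard-library `dataclasses` module, assuming `AgentInputs` and `Extras` behave like plain dictionaries. It also shows `extras` carrying an evaluator-specific input.

```python
from dataclasses import dataclass, field
from typing import Any
from uuid import uuid4

# Assumption: AgentInputs and Extras behave like plain dicts.
AgentInputs = dict[str, Any]
Extras = dict[str, Any]


@dataclass
class RELAISampleSketch:
    """Hypothetical stand-in mirroring the documented RELAISample fields."""

    benchmark_id: str = "default"
    # default_factory runs once per instance, so every sample gets a fresh 32-char hex id.
    id: str = field(default_factory=lambda: uuid4().hex)
    split: str = "All"
    # Mutable defaults also use default_factory so instances never share one dict.
    agent_inputs: AgentInputs = field(default_factory=dict)
    extras: Extras = field(default_factory=dict)
    serialized_simulation_config: dict[str, Any] = field(default_factory=dict)


a = RELAISampleSketch(
    benchmark_id="qa-bench",
    split="Test",
    agent_inputs={"question": "What is 2 + 2?"},
    extras={"expected_answer": "4"},  # an evaluator-specific input
)
b = RELAISampleSketch()
print(a.id != b.id, len(a.id))  # → True 32
```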
relai.data.SimulationTape(sample=None, id=(lambda: uuid4().hex)(), simulation_config=dict())
dataclass
A simulation tape records the inputs, outputs, and any other relevant data produced during a simulation.
Attributes:
| Name | Type | Description |
|---|---|---|
| id | str | The unique identifier for this simulation tape. |
| benchmark_id | str | The identifier of the benchmark from which the sample used to initialize this tape was taken. If no sample was provided, defaults to "default". |
| sample_id | str | The identifier of the sample used to initialize this tape. If no sample was provided, defaults to the tape's id. |
| split | str | The data split of the sample used to initialize this tape. If no sample was provided, defaults to "All". |
| agent_inputs | AgentInputs | The inputs provided to the agent from the sample used to initialize this tape. If no sample was provided, defaults to an empty dictionary. |
| extras | Extras | Any additional metadata or information associated with this tape. If no sample was provided, defaults to an empty dictionary. |
| evaluator_group | str | The evaluator group associated with this tape, typically set to the benchmark_id of the sample. If no sample was provided, defaults to "default". Can be modified in the |
| simulation_config | SimulationConfigT | The simulation configuration used during this simulation: a mapping from qualified function names to their respective mocker instances (e.g., Persona, MockTool). |
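
The fallback rules in the table can be illustrated with a hypothetical stand-in (`SimulationTapeSketch` and `SampleSketch` are illustrative names, not `relai` classes). It assumes the derived attributes are filled in once at construction time; copying the sample's dictionaries is a choice of this sketch, not documented behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Optional
from uuid import uuid4


@dataclass
class SampleSketch:
    """Hypothetical stand-in for RELAISample (only the fields the tape reads)."""
    benchmark_id: str = "default"
    id: str = field(default_factory=lambda: uuid4().hex)
    split: str = "All"
    agent_inputs: dict[str, Any] = field(default_factory=dict)
    extras: dict[str, Any] = field(default_factory=dict)


@dataclass
class SimulationTapeSketch:
    """Hypothetical stand-in applying the documented fallbacks when no sample is given."""
    sample: Optional[SampleSketch] = None
    id: str = field(default_factory=lambda: uuid4().hex)
    simulation_config: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        s = self.sample
        has_sample = s is not None
        self.benchmark_id = s.benchmark_id if has_sample else "default"
        # Without a sample, the tape's own id doubles as the sample id.
        self.sample_id = s.id if has_sample else self.id
        self.split = s.split if has_sample else "All"
        # Sketch choice (assumption): copy so edits on the tape don't leak into the sample.
        self.agent_inputs = dict(s.agent_inputs) if has_sample else {}
        self.extras = dict(s.extras) if has_sample else {}
        # Typically the sample's benchmark_id; a plain attribute, so it can be reassigned.
        self.evaluator_group = self.benchmark_id


bare = SimulationTapeSketch()
print(bare.benchmark_id, bare.split, bare.sample_id == bare.id)  # → default All True

tape = SimulationTapeSketch(sample=SampleSketch(benchmark_id="qa-bench", split="Test"))
print(tape.evaluator_group)  # → qa-bench
```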
relai.data.AgentLog(simulation_tape, agent_outputs=dict(), trace_id=None)
dataclass
Log of a single agent simulation run.
Attributes:
| Name | Type | Description |
|---|---|---|
| simulation_tape | SimulationTape | The simulation tape containing inputs and metadata. |
| agent_outputs | AgentOutputs | The outputs generated by the agent during the simulation. |
| trace_id | str \| None | An optional trace identifier for the simulation run. |
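
To show how an `AgentLog` pairs a tape with the agent's outputs, here is a minimal sketch built on hypothetical stand-ins. `TapeSketch`, `AgentLogSketch`, `run_agent`, and `echo_agent` are illustrative names, not part of `relai`, and `AgentOutputs` is assumed to behave like a dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class TapeSketch:
    """Hypothetical stand-in: only the agent_inputs field of SimulationTape."""
    agent_inputs: dict[str, Any] = field(default_factory=dict)


@dataclass
class AgentLogSketch:
    """Hypothetical stand-in mirroring the documented AgentLog fields."""
    simulation_tape: TapeSketch
    agent_outputs: dict[str, Any] = field(default_factory=dict)  # assumed dict-like
    trace_id: Optional[str] = None


def run_agent(
    agent: Callable[[dict[str, Any]], dict[str, Any]], tape: TapeSketch
) -> AgentLogSketch:
    """Hypothetical helper: feed the tape's inputs to an agent and log its outputs."""
    outputs = agent(tape.agent_inputs)
    return AgentLogSketch(simulation_tape=tape, agent_outputs=outputs)


def echo_agent(inputs: dict[str, Any]) -> dict[str, Any]:
    # Toy agent that upper-cases the question it receives.
    return {"answer": inputs["question"].upper()}


log = run_agent(echo_agent, TapeSketch(agent_inputs={"question": "hi"}))
print(log.agent_outputs, log.trace_id)  # → {'answer': 'HI'} None
```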
relai.data.EvaluatorLog(evaluator_id, name, outputs, config=dict())
dataclass
Log of a single evaluator run.
Attributes:
| Name | Type | Description |
|---|---|---|
| evaluator_id | str | The ID of the evaluator. |
| name | str | The name of the evaluator. |
| outputs | EvaluatorOutputs | The outputs generated by the evaluator. |
| config | dict[str, Any] | The configuration settings used for the evaluator. |
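
The sketch below shows why an `EvaluatorLog` keeps `config` next to `outputs`: the same evaluator can score differently under different settings. `EvaluatorLogSketch` and `exact_match` are hypothetical, and the `"score"`/`"feedback"` keys are an assumption about what `EvaluatorOutputs` might hold, not the documented schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EvaluatorLogSketch:
    """Hypothetical stand-in mirroring the documented EvaluatorLog fields."""
    evaluator_id: str
    name: str
    outputs: dict[str, Any]  # assumption: EvaluatorOutputs behaves like a dict
    config: dict[str, Any] = field(default_factory=dict)


def exact_match(answer: str, expected: str, config: dict[str, Any]) -> EvaluatorLogSketch:
    """Hypothetical evaluator that records the config it ran with alongside its outputs."""
    if not config.get("case_sensitive", True):
        answer, expected = answer.lower(), expected.lower()
    score = 1.0 if answer == expected else 0.0
    return EvaluatorLogSketch(
        evaluator_id="exact-match-001",
        name="ExactMatch",
        outputs={"score": score, "feedback": "match" if score else "mismatch"},
        config=config,
    )


log = exact_match("Paris", "paris", {"case_sensitive": False})
print(log.outputs["score"])  # → 1.0
```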
relai.data.CriticoLog(agent_log, evaluator_logs=list(), aggregate_score=0.0, aggregate_feedback='', trace_id=None)
dataclass
Log of a Critico evaluation run.
Attributes:
| Name | Type | Description |
|---|---|---|
| agent_log | AgentLog | The log of the agent simulation run. |
| evaluator_logs | list[EvaluatorLog] | A list of logs from individual evaluators. |
| aggregate_score | float | The aggregate score computed from all the evaluator logs. |
| aggregate_feedback | str | The aggregate feedback compiled from all the evaluator logs. |
| trace_id | str \| None | An optional trace identifier for the corresponding agent simulation run. |
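
The table does not say how `aggregate_score` and `aggregate_feedback` are derived from the evaluator logs. The sketch below assumes one plausible rule, the mean of each evaluator's `"score"` output plus newline-joined feedback, purely for illustration. `CriticoLogSketch`, `EvaluatorLogSketch`, `aggregate`, and the output keys are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Any, Optional


@dataclass
class EvaluatorLogSketch:
    """Hypothetical stand-in for EvaluatorLog."""
    evaluator_id: str
    name: str
    outputs: dict[str, Any]
    config: dict[str, Any] = field(default_factory=dict)


@dataclass
class CriticoLogSketch:
    """Hypothetical stand-in mirroring the documented CriticoLog fields."""
    agent_log: Any  # an AgentLog in relai; left untyped in this sketch
    evaluator_logs: list[EvaluatorLogSketch] = field(default_factory=list)
    aggregate_score: float = 0.0
    aggregate_feedback: str = ""
    trace_id: Optional[str] = None


def aggregate(log: CriticoLogSketch) -> CriticoLogSketch:
    """Assumed rule: mean of per-evaluator scores, feedback joined one line per evaluator."""
    if log.evaluator_logs:  # keep the 0.0 / "" defaults when there are no evaluators
        log.aggregate_score = mean(e.outputs["score"] for e in log.evaluator_logs)
        log.aggregate_feedback = "\n".join(
            f"{e.name}: {e.outputs.get('feedback', '')}" for e in log.evaluator_logs
        )
    return log


critico = aggregate(CriticoLogSketch(
    agent_log=None,
    evaluator_logs=[
        EvaluatorLogSketch("e1", "ExactMatch", {"score": 1.0, "feedback": "match"}),
        EvaluatorLogSketch("e2", "Style", {"score": 0.5, "feedback": "too terse"}),
    ],
))
print(critico.aggregate_score)  # → 0.75
```

A weighted mean or a minimum would be equally valid choices; the mean is used here only because it is the simplest rule to read.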