Evaluator

relai.critico.evaluate.Evaluator(name, required_fields=None, transform=None, **hyperparameters)
Bases: ABC
Abstract base class for defining and implementing evaluators for a benchmark.
Evaluators are responsible for assessing an AI agent's response to a specific
benchmark sample. They can define required input fields from the AgentLog
necessary for their evaluation logic and may incorporate customizable hyperparameters
to tune their behavior.
Subclasses must implement the compute_evaluator_result method to define
their specific evaluation logic, which produces an EvaluatorResponse object.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the evaluator, used for identification. |
| `required_fields` | `list[str]` | A list of field names (keys) that must be present in either the sample or the agent outputs of the `AgentLog` for the evaluation to proceed. |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. |
| `hyperparameters` | `dict[str, Any]` | A dictionary of arbitrary keyword arguments passed during initialization, allowing for custom configuration of the evaluator's behavior. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The display name of the evaluator, used to identify the evaluator in evaluation results. | required |
| `required_fields` | `list[str]` | A list of field names (keys) that must be present in either the sample or the agent outputs of the `AgentLog` for the evaluation to proceed. | `None` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
| `**hyperparameters` | `dict[str, Any]` | A dictionary of arbitrary keyword arguments passed during initialization, allowing for custom configuration of the evaluator's behavior. | `{}` |
uid
cached property

Generates a unique identifier for this specific evaluator instance. The UID is constructed from the evaluator's class name combined with a JSON-serialized representation of its hyperparameters.
Returns:

| Name | Type | Description |
|---|---|---|
| `str` | `str` | A unique identifier for the evaluator. |
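The UID construction described above can be sketched as a standalone helper (illustrative only; the exact separator and serialization details used by relai may differ):

```python
import json

def make_uid(class_name: str, hyperparameters: dict) -> str:
    # Class name plus a deterministic JSON dump of the hyperparameters;
    # sorting keys keeps the identifier stable across dict orderings.
    return f"{class_name}:{json.dumps(hyperparameters, sort_keys=True)}"

uid = make_uid("RELAILengthEvaluator", {"measure": "words", "slope": 1.0})
```

Two instances of the same class with identical hyperparameters thus share a UID, which is what makes caching the property and hashing by `uid` behave consistently.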
__call__(agent_log)
async

Executes the evaluation process for a given AI agent log.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `agent_log` | `AgentLog` | The response from the AI agent to be evaluated. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluatorResponse` | `EvaluatorLog` | The structured result of the evaluation, including any computed scores. |

Raises:

| Type | Description |
|---|---|
| `TypeError` | If `agent_log` is not an `AgentLog` instance. |
| `ValueError` | If any of the `required_fields` are missing from the `agent_log`. |
__hash__()

Computes the hash value for the evaluator based on its unique identifier (uid).

Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | The hash value of the evaluator's unique identifier. |
compute_evaluator_result(agent_log)
abstractmethod async

Abstract method: computes the evaluation result for an agent log.

Concrete subclasses must implement this method to define their unique evaluation logic. It should process the AgentLog by accessing its sample and agent_outputs to derive the evaluation outcome, which is then encapsulated in an EvaluatorResponse object.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `agent_log` | `AgentLog` | The comprehensive response from the AI agent, including the original sample and the agent's outputs. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluatorResponse` | `EvaluatorLog` | An instance of `EvaluatorResponse` encapsulating the evaluation outcome. |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If a concrete subclass does not override this method. |
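The subclassing pattern can be sketched as follows. This is a minimal, self-contained illustration: `AgentLog` and `EvaluatorResponse` are stand-in dataclasses here (in real code they come from the relai package, and the class would subclass `relai.critico.evaluate.Evaluator`), and `ExactMatchEvaluator` is a hypothetical example, not part of the library:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class AgentLog:              # stand-in for relai's AgentLog
    sample: dict
    agent_outputs: dict

@dataclass
class EvaluatorResponse:     # stand-in for relai's EvaluatorResponse
    score: float
    details: dict = field(default_factory=dict)

class ExactMatchEvaluator:
    """Scores 1.0 when the agent's answer equals the sample's gold answer."""
    name = "exact_match"
    required_fields = ["answer"]

    async def compute_evaluator_result(self, agent_log: AgentLog) -> EvaluatorResponse:
        gold = agent_log.sample.get("answer")
        predicted = agent_log.agent_outputs.get("answer")
        return EvaluatorResponse(score=float(gold == predicted))

log = AgentLog(sample={"answer": "42"}, agent_outputs={"answer": "42"})
result = asyncio.run(ExactMatchEvaluator().compute_evaluator_result(log))
```

The method reads from both `sample` (ground truth) and `agent_outputs` (the agent's response), which is why `required_fields` may refer to keys in either.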
relai.critico.evaluate.RELAIEvaluator(client, relai_evaluator_name, name, required_fields=None, transform=None, **hyperparameters)
Bases: Evaluator
Base class for all RELAI evaluators that use the RELAI API to evaluate responses on a benchmark.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the specific evaluator to be invoked on the RELAI platform. |
Initializes a new RELAIEvaluator instance.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `relai_evaluator_name` | `str` | The name of the RELAI evaluator to be used for evaluation. | required |
| `name` | `str` | The display name of the evaluator, used to identify the evaluator in evaluation results. | required |
| `required_fields` | `list[str]` | A list of field names that must be present in the `AgentLog` for the evaluation to proceed. | `None` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
| `**hyperparameters` | `Any` | Arbitrary keyword arguments passed to the base `Evaluator`, allowing for custom configuration of the evaluator's behavior. | `{}` |
compute_evaluator_result(agent_log)
async

Computes the structured evaluation result by invoking a RELAI evaluator.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `agent_log` | `AgentLog` | The response from the AI agent. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluatorResponse` | `EvaluatorLog` | A structured evaluation result containing the evaluator's unique ID, the original agent log, and an optional score. |
relai.critico.evaluate.RELAILengthEvaluator(client, measure='words', use_ratio=False, acceptable_range=None, target_ratio=None, slope=1.0, temperature=1.0, transform=None)

Bases: RELAIEvaluator

Evaluator to assess the length of generated text (e.g., summaries) using a RELAI evaluator. Length can be measured in characters, words, or sentences, or evaluated via the summary-to-source compression ratio.
Required fields:
- `source`: The original text or document from which the summary is derived.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `measure` | `str` | The unit for length calculation; one of 'characters' (count every character), 'words' (split on whitespace), or 'sentences' (split on sentence-ending punctuation: ., !, ?). Defaults to 'words'. | `'words'` |
| `use_ratio` | `bool` | If True, evaluate the summary-to-source length ratio against `target_ratio` instead of comparing the absolute length to `acceptable_range`. Defaults to False. | `False` |
| `acceptable_range` | `tuple[int, int]` | A two-element tuple `(lower, upper)` specifying the acceptable length range. Required if `use_ratio` is False. | `None` |
| `target_ratio` | `float` | The desired summary-to-source length ratio (between 0 and 1). Required if `use_ratio` is True. | `None` |
| `slope` | `float` | A factor in [0, 1] controlling the penalty slope for summaries shorter than the lower bound. A slope of 1.0 yields a linear ramp from 0 at zero length to 1.0 at the lower bound. Defaults to 1.0. | `1.0` |
| `temperature` | `float` | A positive scaling factor that smooths the exponential penalty for summaries exceeding the upper bound. Higher values make the penalty curve flatter. Defaults to 1.0. | `1.0` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If any of the parameters are invalid. |
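The `slope` and `temperature` semantics can be illustrated with a standalone scoring sketch. This is a reconstruction for intuition only, assuming a score of 1.0 inside the acceptable range, a linear ramp below it, and an exponential decay above it; the actual scoring runs on the RELAI platform and may differ in detail:

```python
import math

def length_score(length: int, lower: int, upper: int,
                 slope: float = 1.0, temperature: float = 1.0) -> float:
    """Illustrative length score for a summary of `length` units."""
    if lower <= length <= upper:
        return 1.0
    if length < lower:
        # slope=1.0 ramps linearly from 0 at zero length to 1.0 at `lower`;
        # smaller slopes flatten the penalty (score stays closer to 1.0).
        return max(0.0, 1.0 - slope * (lower - length) / lower)
    # Above the upper bound: exponential penalty; a higher temperature
    # makes the decay curve flatter.
    return math.exp(-(length - upper) / (upper * temperature))
```

For example, with an acceptable range of 40-60 words, a 50-word summary scores 1.0, a 20-word summary scores 0.5 under the default slope, and an 80-word summary is penalized less as `temperature` grows.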
relai.critico.evaluate.RELAIContentEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing the factual content of a generated summary against provided key facts, using a RELAI evaluator.
Required fields:
- `key_facts`: A dictionary of key facts with their associated weights, which the summary should cover.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIHallucinationEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for detecting factual inconsistencies or "hallucinations" in generated text (e.g., summaries) relative to a source document, using a RELAI evaluator.
Required fields:
- `source`: The original text or document from which the summary is derived.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIStyleEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing the stylistic adherence of a generated summary based on provided rubrics, using a RELAI evaluator.
Required fields:
- `style_rubrics`: A dictionary of style rubrics with their associated weights,
which the summary should adhere to.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIFormatEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing the formatting adherence of a generated summary based on provided rubrics, using a RELAI evaluator.
Required fields:
- `format_rubrics`: A dictionary of format rubrics with their associated weights,
which the summary should adhere to.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIRubricBasedEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for performing a detailed, rubric-driven assessment of an AI agent's response to a query using an LLM-based evaluator on the RELAI platform.
Required fields:
- `question`: The question or prompt that the AI agent was asked to respond to.
- `answer`: The AI agent's generated response to the question.
- `rubrics`: A dictionary of evaluation criteria with their associated weights,
which the answer should satisfy.
- `std_answer`: The standard or expected answer against which the AI agent's response is evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIAnnotationEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing agent logs based on past human preference annotations provided through the RELAI platform.
Required fields:
- `all_inputs`: The full set of inputs originally supplied to the agent.
- `previous_outputs`: Prior agent output(s) shown to the human annotator.
- `desired_outputs`: Human-preferred or target outputs provided by the annotator.
- `feedback`: Free-text human feedback or rationale provided by the annotator.
- `liked`: A flag indicating whether the annotator liked the agent's output.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAICustomEvaluator(evaluator_id, model_name='gpt-5-mini', transform=None)
Bases: Evaluator
Evaluator for assessing agent logs based on the custom evaluator prompt, input and output formats defined on the RELAI platform.
Required fields:
- Any fields specified in the custom evaluator's input format (on the platform).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `evaluator_id` | `str` | The unique identifier of the custom evaluator defined on the RELAI platform. | required |
| `model_name` | `str` | The name of the model to use for the evaluator. Defaults to "gpt-5-mini". | `'gpt-5-mini'` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |