Evaluator

relai.critico.evaluate.Evaluator(name, required_fields=None, transform=None, **hyperparameters)

Bases: ABC

Abstract base class for defining and implementing evaluators for a benchmark.

Evaluators are responsible for assessing an AI agent's response to a specific benchmark sample. They can define required input fields from the AgentLog necessary for their evaluation logic and may incorporate customizable hyperparameters to tune their behavior.

Subclasses must implement the compute_evaluator_result method to define their specific evaluation logic, which produces an EvaluatorResponse object.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the evaluator, used for identification. |
| `required_fields` | `list[str]` | A list of field names (keys) that must be present in either `agent_inputs` (of the sample), `eval_inputs` (of the sample), or `agent_outputs` (of the agent log). |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. Defaults to `None`. |
| `hyperparameters` | `dict[str, Any]` | A dictionary of arbitrary keyword arguments passed during initialization, allowing custom configuration of the evaluator's behavior. |

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The display name of the evaluator, used to identify the evaluator in evaluation results. | *required* |
| `required_fields` | `list[str]` | A list of field names (keys) that must be present in either `agent_inputs` (of the sample), `eval_inputs` (of the sample), or `agent_outputs` (of the agent log). | `None` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
| `hyperparameters` | `dict[str, Any]` | A dictionary of arbitrary keyword arguments passed during initialization, allowing custom configuration of the evaluator's behavior. | `{}` |

uid cached property

Generates a unique identifier for this specific evaluator instance. The UID is constructed from the evaluator's class name combined with a JSON-serialized representation of its hyperparameters.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `str` | `str` | A unique identifier for the evaluator. |

__call__(agent_log) async

Executes the evaluation process for a given AI agent log.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `agent_log` | `AgentLog` | The response from the AI agent to be evaluated. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `EvaluatorResponse` | `EvaluatorLog` | The structured result of the evaluation, including any computed score and feedback, as defined by the concrete evaluator. |

Raises:

| Type | Description |
| --- | --- |
| `TypeError` | If `agent_log` is not an instance of `AgentLog`, or if its `agent_outputs` (after applying `transform`) is not a `dict`. |
| `ValueError` | If any of the `required_fields` are missing from the `agent_log`. |
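In practice, a concrete evaluator instance is awaited directly on an agent log. A minimal sketch (the result attribute names are assumptions based on the description above, and the call must run inside an async context):

```python
# `evaluator` is any concrete Evaluator; `agent_log` is an AgentLog.
result = await evaluator(agent_log)

# Attribute names assumed from "computed score and feedback" above.
print(result.score, result.feedback)
```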

__hash__()

Computes the hash value for the evaluator based on its unique identifier (uid).

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `int` | `int` | The hash value of the evaluator's unique identifier. |

compute_evaluator_result(agent_log) abstractmethod async

Abstract method: Computes the evaluation result for an agent log.

Concrete subclasses must implement this method to define their unique evaluation logic. This method should process the AgentLog by accessing its sample and agent_outputs to derive the evaluation outcome, which is then encapsulated in an EvaluatorResponse object.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `agent_log` | `AgentLog` | The comprehensive response from the AI agent, including the original sample and the agent's outputs. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `EvaluatorResponse` | `EvaluatorLog` | An instance of `EvaluatorResponse` containing the evaluation outcome, typically including a score and/or feedback, along with `evaluator_name`, `evaluator_configuration`, and the original `agent_log`. |

Raises:

| Type | Description |
| --- | --- |
| `NotImplementedError` | If a concrete subclass does not override this method. |
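For illustration, a minimal concrete subclass might look like the sketch below. The class name, the `expected` and `answer` field names, the `agent_log.sample.eval_inputs` access path, and the `EvaluatorResponse` constructor arguments are assumptions for the example, not documented API:

```python
from relai.critico.evaluate import Evaluator, EvaluatorResponse  # EvaluatorResponse import path assumed


class ExactMatchEvaluator(Evaluator):
    """Hypothetical evaluator: scores 1.0 when the agent's answer exactly
    matches the expected answer stored with the sample."""

    def __init__(self, **hyperparameters):
        super().__init__(
            name="exact_match",
            # Fields this evaluator needs from the sample / agent outputs.
            required_fields=["expected", "answer"],
            **hyperparameters,
        )

    async def compute_evaluator_result(self, agent_log):
        # Access paths assumed from the descriptions above
        # ("eval_inputs (of the sample)", "agent_outputs (of the agent log)").
        expected = agent_log.sample.eval_inputs["expected"]
        answer = agent_log.agent_outputs["answer"]
        score = 1.0 if answer.strip() == expected.strip() else 0.0
        return EvaluatorResponse(
            evaluator_name=self.name,
            score=score,
            feedback=None,
            agent_log=agent_log,
        )
```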

relai.critico.evaluate.RELAIEvaluator(client, relai_evaluator_name, name, required_fields=None, transform=None, **hyperparameters)

Bases: Evaluator

Base class for all RELAI evaluators that use the RELAI API to evaluate responses on a benchmark.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the specific evaluator to be invoked on the RELAI platform. |

Initializes a new RELAIEvaluator instance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `relai_evaluator_name` | `str` | The name of the RELAI evaluator to be used for evaluation. | *required* |
| `name` | `str` | The display name of the evaluator, used to identify the evaluator in evaluation results. | *required* |
| `required_fields` | `list[str]` | A list of field names that must be present in the AgentLog (across agent inputs, eval inputs, or agent outputs). | `None` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
| `**hyperparameters` | `Any` | Arbitrary keyword arguments passed to the base Evaluator class and also forwarded to the RELAI evaluator. | `{}` |

compute_evaluator_result(agent_log) async

Computes the structured evaluation result by invoking a RELAI evaluator.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `agent_log` | `AgentLog` | The response from the AI agent. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `EvaluatorResponse` | `EvaluatorLog` | A structured evaluation result containing the evaluator's unique ID, the original agent log, and an optional score and feedback computed by the RELAI evaluator. |
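A sketch of direct instantiation follows; the platform evaluator name is a placeholder, and the AsyncRELAI constructor arguments (e.g., credentials) are elided:

```python
from relai import AsyncRELAI  # import path assumed
from relai.critico.evaluate import RELAIEvaluator

client = AsyncRELAI()  # constructor arguments elided

evaluator = RELAIEvaluator(
    client=client,
    relai_evaluator_name="summary_quality",  # placeholder platform name
    name="summary-quality",
    required_fields=["source", "summary"],
)
```

The subclasses below wrap specific platform evaluators, so this wiring is usually unnecessary; in the snippets that follow, `client` refers to the AsyncRELAI instance created here.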

relai.critico.evaluate.RELAILengthEvaluator(client, measure='words', use_ratio=False, acceptable_range=None, target_ratio=None, slope=1.0, temperature=1.0, transform=None)

Bases: RELAIEvaluator

Evaluator for assessing the length of generated text (e.g., summaries) using a RELAI evaluator. Length can be measured in characters, words, or sentences, or evaluated via the compression ratio.

Required fields:

- `source`: The original text or document from which the summary is derived.
- `summary`: The generated summary to be evaluated.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `measure` | `str` | The unit for length calculation; one of `'characters'` (count every character), `'words'` (split on whitespace), or `'sentences'` (split on sentence-ending punctuation: `.`, `!`, `?`). | `'words'` |
| `use_ratio` | `bool` | If True, ignore `acceptable_range` and instead evaluate the length based on the compression ratio `1 - (summary_length / source_length)` relative to `target_ratio`. | `False` |
| `acceptable_range` | `tuple[int, int]` | A two-element tuple `(min_len, max_len)` specifying the inclusive bounds for the length of the summary under the chosen measure. Required if `use_ratio` is False; ignored if `use_ratio` is True. | `None` |
| `target_ratio` | `float` | The desired summary-to-source length ratio (between 0 and 1). Required if `use_ratio` is True. | `None` |
| `slope` | `float` | A factor in [0, 1] controlling the penalty slope for summaries shorter than the lower bound. A slope of 1.0 yields a linear ramp from 0 at zero length to 1.0 at `min_len`. | `1.0` |
| `temperature` | `float` | A positive scaling factor that smooths the exponential penalty for summaries exceeding the upper bound; higher values make the penalty curve flatter. | `1.0` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If any of the parameters are invalid. |
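Both length modes might be configured as follows (a sketch; `client` is the AsyncRELAI instance from the earlier sketch):

```python
from relai.critico.evaluate import RELAILengthEvaluator

# Accept summaries of 50-150 words (inclusive bounds).
length_by_range = RELAILengthEvaluator(
    client=client,
    measure="words",
    acceptable_range=(50, 150),
)

# Target a compression ratio of 0.9, i.e. a summary roughly 10% of the
# source length (ratio = 1 - summary_length / source_length).
length_by_ratio = RELAILengthEvaluator(
    client=client,
    use_ratio=True,
    target_ratio=0.9,
    temperature=2.0,  # flatter penalty for overlong summaries
)
```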

relai.critico.evaluate.RELAIContentEvaluator(client, transform=None)

Bases: RELAIEvaluator

Evaluator for assessing the factual content of a generated summary against provided key facts, using a RELAI evaluator.

Required fields:

- `key_facts`: A dictionary of key facts with their associated weights, which the summary should cover.
- `summary`: The generated summary to be evaluated.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
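As an illustration, `key_facts` maps each fact to a weight; the exact dictionary shape shown is an assumption based on the description above:

```python
from relai.critico.evaluate import RELAIContentEvaluator

content_eval = RELAIContentEvaluator(client=client)  # client: AsyncRELAI, as above

# Hypothetical weighted key facts a benchmark sample might carry.
key_facts = {
    "Revenue grew 12% year over year": 0.5,
    "Two new offices were opened": 0.3,
    "A new CEO was appointed in March": 0.2,
}
```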

relai.critico.evaluate.RELAIHallucinationEvaluator(client, transform=None)

Bases: RELAIEvaluator

Evaluator for detecting factual inconsistencies or "hallucinations" in generated text (e.g., summaries) relative to a source document, using a RELAI evaluator.

Required fields:

- `source`: The original text or document from which the summary is derived.
- `summary`: The generated summary to be evaluated.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
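A sketch of instantiation, including a hypothetical `transform` that maps an agent output key onto the required `summary` field (the `generated_text` key is an assumption about the agent's output shape):

```python
from relai.critico.evaluate import RELAIHallucinationEvaluator

hallucination_eval = RELAIHallucinationEvaluator(
    client=client,  # AsyncRELAI instance, as above
    # Hypothetical pre-processor: expose the agent's text as "summary".
    transform=lambda outputs: {**outputs, "summary": outputs["generated_text"]},
)
```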

relai.critico.evaluate.RELAIStyleEvaluator(client, transform=None)

Bases: RELAIEvaluator

Evaluator for assessing the stylistic adherence of a generated summary based on provided rubrics, using a RELAI evaluator.

Required fields:

- `style_rubrics`: A dictionary of style rubrics with their associated weights,
    which the summary should adhere to.
- `summary`: The generated summary to be evaluated.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
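For illustration, `style_rubrics` maps each rubric to a weight; the shape shown is an assumption based on the description above:

```python
from relai.critico.evaluate import RELAIStyleEvaluator

style_eval = RELAIStyleEvaluator(client=client)  # client: AsyncRELAI, as above

# Hypothetical weighted style rubrics a benchmark sample might carry.
style_rubrics = {
    "Uses a neutral, third-person voice": 0.6,
    "Avoids unexplained jargon and abbreviations": 0.4,
}
```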

relai.critico.evaluate.RELAIFormatEvaluator(client, transform=None)

Bases: RELAIEvaluator

Evaluator for assessing the formatting adherence of a generated summary based on provided rubrics, using a RELAI evaluator.

Required fields:

- `format_rubrics`: A dictionary of format rubrics with their associated weights,
    which the summary should adhere to.
- `summary`: The generated summary to be evaluated.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
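Instantiation follows the same pattern (a sketch; the rubric dictionary shape is an assumption):

```python
from relai.critico.evaluate import RELAIFormatEvaluator

format_eval = RELAIFormatEvaluator(client=client)  # client: AsyncRELAI, as above

# Hypothetical weighted format rubrics.
format_rubrics = {
    "Opens with a one-sentence TL;DR": 0.5,
    "Uses at most three bullet points": 0.5,
}
```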

relai.critico.evaluate.RELAIRubricBasedEvaluator(client, transform=None)

Bases: RELAIEvaluator

Evaluator for performing a detailed, rubric-driven assessment of an AI agent's response to a query using an LLM-based evaluator on the RELAI platform.

Required fields:

- `question`: The question or prompt that the AI agent was asked to respond to.
- `answer`: The AI agent's generated response to the question.
- `rubrics`: A dictionary of evaluation criteria with their associated weights,
    which the answer should satisfy.
- `std_answer`: The standard or expected answer against which the AI agent's response is evaluated.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
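A sketch of the required fields for a single sample (the field values are invented for illustration; `answer` would come from the agent's outputs):

```python
from relai.critico.evaluate import RELAIRubricBasedEvaluator

rubric_eval = RELAIRubricBasedEvaluator(client=client)  # client: AsyncRELAI, as above

# Hypothetical sample-side fields.
question = "What causes ocean tides?"
rubrics = {
    "Attributes tides mainly to the Moon's gravity": 0.7,
    "Mentions the Sun's smaller contribution": 0.3,
}
std_answer = (
    "Tides are caused chiefly by the Moon's gravitational pull, "
    "with a smaller contribution from the Sun."
)
```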

relai.critico.evaluate.RELAIAnnotationEvaluator(client, transform=None)

Bases: RELAIEvaluator

Evaluator for assessing agent logs based on past human preference annotations provided through the RELAI platform.

Required fields:

- `all_inputs`: The full set of inputs originally supplied to the agent.
- `previous_outputs`: Prior agent output(s) shown to the human annotator.
- `desired_outputs`: Human-preferred or target outputs provided by the annotator.
- `feedback`: Free-text human feedback or rationale provided by the annotator.
- `liked`: A flag indicating whether the annotator liked the agent's output.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | *required* |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
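Instantiation is minimal (a sketch); the annotation fields listed above must be present on the evaluated AgentLog, per the required-fields contract described for the base class:

```python
from relai.critico.evaluate import RELAIAnnotationEvaluator

annotation_eval = RELAIAnnotationEvaluator(client=client)  # client: AsyncRELAI, as above
```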

relai.critico.evaluate.RELAICustomEvaluator(evaluator_id, model_name='gpt-5-mini', transform=None)

Bases: Evaluator

Evaluator for assessing agent logs based on a custom evaluator's prompt and the input and output formats defined on the RELAI platform.

Required fields:

- Any fields specified in the custom evaluator's input format (on the platform).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `evaluator_id` | `str` | The unique identifier of the custom evaluator defined on the RELAI platform. | *required* |
| `model_name` | `str` | The name of the model to use for the evaluator. | `'gpt-5-mini'` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `agent_outputs` of the agent response for the evaluator. | `None` |
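A sketch of instantiation; the `evaluator_id` value is a placeholder for an ID created on the platform:

```python
from relai.critico.evaluate import RELAICustomEvaluator

custom_eval = RELAICustomEvaluator(
    evaluator_id="my-custom-evaluator-id",  # placeholder platform ID
    model_name="gpt-5-mini",
)
```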