Evaluator

relai.critico.evaluate.Evaluator(name, required_fields=None, transform=None, **hyperparameters)
Bases: ABC
Abstract base class for defining and implementing evaluators for a benchmark.
Evaluators are responsible for assessing an AI agent's response to a specific
benchmark sample. They can define required input fields from the AgentLog
necessary for their evaluation logic and may incorporate customizable hyperparameters
to tune their behavior.
Subclasses must implement the compute_evaluator_result method to define
their specific evaluation logic, which produces an EvaluatorResponse object.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the evaluator, used for identification. |
| `required_fields` | `list[str]` | A list of field names (keys) that must be present in either the sample or the agent outputs of the `AgentLog` for the evaluation to proceed. |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. |
| `hyperparameters` | `dict[str, Any]` | A dictionary of arbitrary keyword arguments passed during initialization, allowing for custom configuration of the evaluator's behavior. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The display name of the evaluator, used to identify the evaluator in evaluation results. | required |
| `required_fields` | `list[str]` | A list of field names (keys) that must be present in either the sample or the agent outputs of the `AgentLog` for the evaluation to proceed. | `None` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
| `**hyperparameters` | `dict[str, Any]` | A dictionary of arbitrary keyword arguments passed during initialization, allowing for custom configuration of the evaluator's behavior. | `{}` |
uid
cached property

Generates a unique identifier for this specific evaluator instance. The UID is constructed from the evaluator's class name combined with a JSON-serialized representation of its hyperparameters.
Returns:

| Name | Type | Description |
|---|---|---|
| `str` | `str` | A unique identifier for the evaluator. |
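The UID construction described above can be sketched as a standalone helper (illustrative only; the exact separator and serialization details used by relai may differ):

```python
import json

def make_uid(class_name: str, hyperparameters: dict) -> str:
    # Class name plus a deterministic JSON dump of the hyperparameters;
    # sorting keys keeps the identifier stable across dict orderings.
    return f"{class_name}:{json.dumps(hyperparameters, sort_keys=True)}"

uid = make_uid("RELAILengthEvaluator", {"measure": "words", "slope": 1.0})
```

Two instances of the same class with identical hyperparameters thus share a UID, which is what makes caching the property and hashing by `uid` behave consistently.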
__call__(agent_log)
async

Executes the evaluation process for a given AI agent log.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `agent_log` | `AgentLog` | The response from the AI agent to be evaluated. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluatorResponse` | `EvaluatorLog` | The structured result of the evaluation, including any computed scores. |

Raises:

| Type | Description |
|---|---|
| `TypeError` | If `agent_log` is not an `AgentLog` instance. |
| `ValueError` | If any of the `required_fields` are missing from the `agent_log`. |
__hash__()

Computes the hash value for the evaluator based on its unique identifier (uid).

Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | The hash value of the evaluator's unique identifier. |
compute_evaluator_result(agent_log)
abstractmethod async

Abstract method: computes the evaluation result for an agent log.

Concrete subclasses must implement this method to define their unique evaluation logic. It should process the AgentLog by accessing its sample and agent_outputs to derive the evaluation outcome, which is then encapsulated in an EvaluatorResponse object.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `agent_log` | `AgentLog` | The comprehensive response from the AI agent, including the original sample and the agent's outputs. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluatorResponse` | `EvaluatorLog` | An instance of `EvaluatorResponse` encapsulating the evaluation outcome. |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | If a concrete subclass does not override this method. |
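The subclassing pattern can be sketched as follows. This is a minimal, self-contained illustration: `AgentLog` and `EvaluatorResponse` are stand-in dataclasses here (in real code they come from the relai package, and the class would subclass `relai.critico.evaluate.Evaluator`), and `ExactMatchEvaluator` is a hypothetical example, not part of the library:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class AgentLog:              # stand-in for relai's AgentLog
    sample: dict
    agent_outputs: dict

@dataclass
class EvaluatorResponse:     # stand-in for relai's EvaluatorResponse
    score: float
    details: dict = field(default_factory=dict)

class ExactMatchEvaluator:
    """Scores 1.0 when the agent's answer equals the sample's gold answer."""
    name = "exact_match"
    required_fields = ["answer"]

    async def compute_evaluator_result(self, agent_log: AgentLog) -> EvaluatorResponse:
        gold = agent_log.sample.get("answer")
        predicted = agent_log.agent_outputs.get("answer")
        return EvaluatorResponse(score=float(gold == predicted))

log = AgentLog(sample={"answer": "42"}, agent_outputs={"answer": "42"})
result = asyncio.run(ExactMatchEvaluator().compute_evaluator_result(log))
```

The method reads from both `sample` (ground truth) and `agent_outputs` (the agent's response), which is why `required_fields` may refer to keys in either.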
relai.critico.evaluate.RELAIEvaluator(client, relai_evaluator_name, name, required_fields=None, transform=None, **hyperparameters)
Bases: Evaluator
Base class for all RELAI evaluators that use the RELAI API to evaluate responses on a benchmark.
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the specific evaluator to be invoked on the RELAI platform. |
Initializes a new RELAIEvaluator instance.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `relai_evaluator_name` | `str` | The name of the RELAI evaluator to be used for evaluation. | required |
| `name` | `str` | The display name of the evaluator, used to identify the evaluator in evaluation results. | required |
| `required_fields` | `list[str]` | A list of field names that must be present in the `AgentLog` for the evaluation to proceed. | `None` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
| `**hyperparameters` | `Any` | Arbitrary keyword arguments passed to the base `Evaluator`, allowing for custom configuration of the evaluator's behavior. | `{}` |
compute_evaluator_result(agent_log)
async

Computes the structured evaluation result by invoking a RELAI evaluator.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `agent_log` | `AgentLog` | The response from the AI agent. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluatorResponse` | `EvaluatorLog` | A structured evaluation result containing the evaluator's unique ID, the original agent log, and an optional score. |
relai.critico.evaluate.RELAILengthEvaluator(client, measure='words', use_ratio=False, acceptable_range=None, target_ratio=None, slope=1.0, temperature=1.0, transform=None)

Bases: RELAIEvaluator

Evaluator to assess the length of generated text (e.g., summaries) using a RELAI evaluator. Length can be measured in characters, words, or sentences, or evaluated via the summary-to-source compression ratio.
Required fields:
- `source`: The original text or document from which the summary is derived.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `measure` | `str` | The unit for length calculation; one of 'characters' (count every character), 'words' (split on whitespace), or 'sentences' (split on sentence-ending punctuation: ., !, ?). Defaults to 'words'. | `'words'` |
| `use_ratio` | `bool` | If True, evaluate the summary-to-source length ratio against `target_ratio` instead of comparing the absolute length to `acceptable_range`. Defaults to False. | `False` |
| `acceptable_range` | `tuple[int, int]` | A two-element tuple `(lower, upper)` specifying the acceptable length range. Required if `use_ratio` is False. | `None` |
| `target_ratio` | `float` | The desired summary-to-source length ratio (between 0 and 1). Required if `use_ratio` is True. | `None` |
| `slope` | `float` | A factor in [0, 1] controlling the penalty slope for summaries shorter than the lower bound. A slope of 1.0 yields a linear ramp from 0 at zero length to 1.0 at the lower bound. Defaults to 1.0. | `1.0` |
| `temperature` | `float` | A positive scaling factor that smooths the exponential penalty for summaries exceeding the upper bound. Higher values make the penalty curve flatter. Defaults to 1.0. | `1.0` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If any of the parameters are invalid. |
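The `slope` and `temperature` semantics can be illustrated with a standalone scoring sketch. This is a reconstruction for intuition only, assuming a score of 1.0 inside the acceptable range, a linear ramp below it, and an exponential decay above it; the actual scoring runs on the RELAI platform and may differ in detail:

```python
import math

def length_score(length: int, lower: int, upper: int,
                 slope: float = 1.0, temperature: float = 1.0) -> float:
    """Illustrative length score for a summary of `length` units."""
    if lower <= length <= upper:
        return 1.0
    if length < lower:
        # slope=1.0 ramps linearly from 0 at zero length to 1.0 at `lower`;
        # smaller slopes flatten the penalty (score stays closer to 1.0).
        return max(0.0, 1.0 - slope * (lower - length) / lower)
    # Above the upper bound: exponential penalty; a higher temperature
    # makes the decay curve flatter.
    return math.exp(-(length - upper) / (upper * temperature))
```

For example, with an acceptable range of 40-60 words, a 50-word summary scores 1.0, a 20-word summary scores 0.5 under the default slope, and an 80-word summary is penalized less as `temperature` grows.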
relai.critico.evaluate.RELAIContentEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing the factual content of a generated summary against provided key facts, using a RELAI evaluator.
Required fields:
- `key_facts`: A dictionary of key facts with their associated weights, which the summary should cover.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIHallucinationEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for detecting factual inconsistencies or "hallucinations" in generated text (e.g., summaries) relative to a source document, using a RELAI evaluator.
Required fields:
- `source`: The original text or document from which the summary is derived.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIStyleEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing the stylistic adherence of a generated summary based on provided rubrics, using a RELAI evaluator.
Required fields:
- `style_rubrics`: A dictionary of style rubrics with their associated weights,
which the summary should adhere to.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIFormatEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing the formatting adherence of a generated summary based on provided rubrics, using a RELAI evaluator.
Required fields:
- `format_rubrics`: A dictionary of format rubrics with their associated weights,
which the summary should adhere to.
- `summary`: The generated summary to be evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIRubricBasedEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for performing a detailed, rubric-driven assessment of an AI agent's response to a query using an LLM-based evaluator on the RELAI platform.
Required fields:
- `question`: The question or prompt that the AI agent was asked to respond to.
- `answer`: The AI agent's generated response to the question.
- `rubrics`: A dictionary of evaluation criteria with their associated weights,
which the answer should satisfy.
- `std_answer`: The standard or expected answer against which the AI agent's response is evaluated.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAIAnnotationEvaluator(client, transform=None)
Bases: RELAIEvaluator
Evaluator for assessing agent logs based on past human preference annotations provided through the RELAI platform.
Required fields:
- `all_inputs`: The full set of inputs originally supplied to the agent.
- `previous_outputs`: Prior agent output(s) shown to the human annotator.
- `desired_outputs`: Human-preferred or target outputs provided by the annotator.
- `feedback`: Free-text human feedback or rationale provided by the annotator.
- `liked`: A flag indicating whether the annotator liked the agent's output.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `AsyncRELAI` | An instance of the AsyncRELAI client to interact with the RELAI platform. | required |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |
relai.critico.evaluate.RELAICustomEvaluator(evaluator_id, model_name='gpt-5-mini', transform=None)
Bases: Evaluator
Evaluator for assessing agent logs based on the custom evaluator prompt, input and output formats defined on the RELAI platform.
Required fields:
- Any fields specified in the custom evaluator's input format (on the platform).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `evaluator_id` | `str` | The unique identifier of the custom evaluator defined on the RELAI platform. | required |
| `model_name` | `str` | The name of the model to use for the evaluator. Defaults to "gpt-5-mini". | `'gpt-5-mini'` |
| `transform` | `Callable` | An optional callable to transform (pre-process) the `AgentLog` before evaluation. | `None` |