Benchmark
relai.benchmark.Benchmark(benchmark_id, samples=None)
Bases: ABC
Abstract base class for defining and managing benchmarks.
This class provides a foundational structure for benchmarks, enabling the download and iteration of samples. It ensures that all concrete benchmark implementations have a unique identifier and a collection of samples to be used as inputs for AI agents and evaluators.
Attributes:

| Name | Type | Description |
|---|---|---|
| `benchmark_id` | `str` | A unique identifier for this specific benchmark. |
| `samples` | `list[RELAISample]` | A list of `RELAISample` objects in the benchmark. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `benchmark_id` | `str` | The unique identifier for the benchmark. | *required* |
| `samples` | `list[RELAISample]` | A list of `RELAISample` objects to populate the benchmark with. | `None` |
__iter__()
Enables iteration over the samples within the benchmark, as illustrated in the sketch below.
Yields:

| Name | Type | Description |
|---|---|---|
| `RELAISample` | `RELAISample` | Each `RELAISample` in the benchmark, one at a time. |
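A minimal usage sketch; `CSVBenchmark` is the concrete subclass documented further below, and `qa.csv` is a hypothetical file with a `question` column:

```python
from relai.benchmark import CSVBenchmark  # concrete Benchmark subclass (see below)

# "qa.csv" is a hypothetical CSV file with a "question" column.
benchmark = CSVBenchmark("qa.csv", agent_input_columns=["question"])

print(len(benchmark))     # __len__: total number of samples
for sample in benchmark:  # __iter__: yields one sample at a time
    print(sample)
```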
__len__()
Returns the number of samples currently in the benchmark.
Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | The total count of `RELAISample` objects in the benchmark. |
sample(n=1)
Returns `n` random samples from the benchmark, drawn with replacement. Because sampling is with replacement, the returned list may contain duplicates; if `n` exceeds the total number of samples, duplicates are guaranteed.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | The number of random samples to retrieve. Must be a positive integer. Defaults to 1. | `1` |

Returns:

| Type | Description |
|---|---|
| `list[RELAISample]` | A list containing `n` randomly drawn samples. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If `n` is not a positive integer. |
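Continuing the sketch above; the exact `ValueError` message is not documented here:

```python
subset = benchmark.sample(n=5)  # 5 draws with replacement; entries may repeat

try:
    benchmark.sample(n=0)       # not a positive integer
except ValueError as err:
    print(err)
```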
relai.benchmark.RELAIBenchmark(benchmark_id, field_name_mapping=None, field_value_transform=None, agent_input_fields=None, extra_fields=None)
Bases: Benchmark
A concrete implementation of Benchmark that downloads samples from the RELAI platform.
Attributes:

| Name | Type | Description |
|---|---|---|
| `benchmark_id` | `str` | The unique identifier (ID) of the RELAI benchmark to be loaded from the platform. You can find the benchmark ID in the metadata of the benchmark. |
| `samples` | `list[RELAISample]` | A list of `RELAISample` objects downloaded from the RELAI platform. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `benchmark_id` | `str` | The unique identifier for the RELAI benchmark. This ID is used to fetch the benchmark data from the RELAI platform. | *required* |
| `field_name_mapping` | `dict[str, str]` | A mapping from field names returned by the RELAI API to standardized field names expected by the evaluators. If a field name is not present in this mapping, it is used as-is. Defaults to an empty dictionary. | `None` |
| `field_value_transform` | `dict[str, Callable]` | A mapping from field names to transformation functions that convert field values from the RELAI API into the desired format. If a field name is not present in this mapping, the identity function is used (i.e., no transformation). Defaults to an empty dictionary. | `None` |
| `agent_input_fields` | `list[str]` | A list of field names to extract from each sample for the sample's `agent_inputs`. | `None` |
| `extra_fields` | `list[str]` | A list of field names to extract from each sample for the sample's `extras`. | `None` |
fetch_samples()
Downloads samples from the RELAI platform and populates the `samples` attribute.
This method fetches the benchmark data using the RELAI client and processes each sample to create `RELAISample` objects. The `samples` attribute is then updated with the newly fetched samples.
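A hedged construction sketch; the benchmark ID and all field names are placeholders, and whether the constructor fetches samples automatically is not stated here, so `fetch_samples()` is called explicitly:

```python
from relai.benchmark import RELAIBenchmark

benchmark = RELAIBenchmark(
    benchmark_id="bench_123",                        # placeholder; see benchmark metadata
    field_name_mapping={"query": "question"},        # rename API field -> evaluator field
    field_value_transform={"reference": str.strip},  # clean up a raw API value
    agent_input_fields=["question"],                 # goes into each sample's agent_inputs
    extra_fields=["reference"],                      # goes into each sample's extras
)
benchmark.fetch_samples()  # download and populate `samples`
print(len(benchmark))
```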
relai.benchmark.RELAIQuestionAnsweringBenchmark(benchmark_id)
Bases: RELAIBenchmark
A concrete implementation of RELAIBenchmark for question-answering tasks.
All samples in this benchmark have the following fields:

- `agent_inputs`:
    - `question`: The question to be answered by the AI agent.
- `extras`:
    - `rubrics`: A dictionary of rubrics for evaluating the answer.
    - `std_answer`: The standard answer to the question.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `benchmark_id` | `str` | The unique identifier for the RELAI question-answering benchmark. This ID is used to fetch the benchmark data from the RELAI platform. | *required* |
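A short consumption sketch; dictionary-style access to `agent_inputs` and `extras` is an assumption about `RELAISample`, and `my_agent` stands in for the agent under test:

```python
from relai.benchmark import RELAIQuestionAnsweringBenchmark

qa_bench = RELAIQuestionAnsweringBenchmark(benchmark_id="qa_bench_123")  # placeholder ID

for sample in qa_bench:
    question = sample.agent_inputs["question"]  # assumed access pattern
    answer = my_agent(question)                 # your agent under test
    rubrics = sample.extras["rubrics"]          # hand these to an evaluator
    std_answer = sample.extras["std_answer"]
```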
relai.benchmark.RELAISummarizationBenchmark(benchmark_id)
Bases: RELAIBenchmark
A concrete implementation of RELAIBenchmark for summarization tasks.
All samples in this benchmark have the following fields:

- `agent_inputs`:
    - `source`: The text to be summarized.
- `extras`:
    - `key_facts`: A list of key facts extracted from the source.
    - `style_rubrics`: A dictionary of rubrics for evaluating the style of the summary.
    - `format_rubrics`: A dictionary of rubrics for evaluating the format of the summary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `benchmark_id` | `str` | The unique identifier for the RELAI summarization benchmark. This ID is used to fetch the benchmark data from the RELAI platform. | *required* |
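Under the same assumed access pattern as above, a naive check of how many key facts a summary mentions (real scoring would go through an evaluator):

```python
from relai.benchmark import RELAISummarizationBenchmark

sum_bench = RELAISummarizationBenchmark(benchmark_id="sum_bench_123")  # placeholder ID

for sample in sum_bench:
    summary = my_agent(sample.agent_inputs["source"])  # agent under test
    key_facts = sample.extras["key_facts"]
    covered = [fact for fact in key_facts if fact in summary]  # crude substring check
    print(f"{len(covered)}/{len(key_facts)} key facts mentioned")
```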
relai.benchmark.RELAIAnnotationBenchmark(benchmark_id)
Bases: RELAIBenchmark
A concrete implementation of RELAIBenchmark for benchmarks created from user annotations.
All samples in this benchmark have the following fields:

- `agent_inputs`: The input(s) provided to the agent being evaluated.
- `extras`:
    - `previous_outputs`: The previous outputs produced by the agent.
    - `desired_outputs`: The desired outputs as specified by the user.
    - `feedback`: The user feedback provided for the previous outputs.
    - `liked`: A boolean indicating whether the user liked the previous outputs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `benchmark_id` | `str` | The unique identifier for the RELAI annotation benchmark. This ID is used to fetch the benchmark data from the RELAI platform. | *required* |
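A sketch of splitting annotation samples by user reaction, again assuming dictionary-style access to `extras`:

```python
from relai.benchmark import RELAIAnnotationBenchmark

ann_bench = RELAIAnnotationBenchmark(benchmark_id="ann_bench_123")  # placeholder ID

disliked = [s for s in ann_bench if not s.extras["liked"]]  # assumed access pattern
for s in disliked:
    # Pair the user's feedback with what they wanted instead.
    print(s.extras["feedback"], "->", s.extras["desired_outputs"])
```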
relai.benchmark.CSVBenchmark(csv_file, agent_input_columns=None, extra_columns=None, benchmark_id=None)
Bases: Benchmark
A concrete implementation of Benchmark that loads samples from a CSV file.
Attributes:

| Name | Type | Description |
|---|---|---|
| `benchmark_id` | `str` | The unique identifier (ID) of the benchmark loaded from the CSV file. Defaults to the CSV file name. |
| `samples` | `list[Sample]` | A list of `Sample` objects loaded from the CSV file. |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `csv_file` | `str` | The path to the CSV file containing benchmark samples. | *required* |
| `agent_input_columns` | `list[str]` | A list of column names in the CSV file that should be used as inputs for the AI agent. Defaults to an empty list. | `None` |
| `extra_columns` | `list[str]` | A list of column names in the CSV file that could be used as inputs for evaluators. Defaults to an empty list. | `None` |
| `benchmark_id` | `str` | A unique identifier for the benchmark. If not provided, it defaults to the name of the CSV file. | `None` |
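A self-contained sketch that writes a tiny CSV and loads it; only the constructor documented above is relied on, and the column names are arbitrary:

```python
import csv

from relai.benchmark import CSVBenchmark

# Build a tiny CSV purely for illustration.
rows = [
    {"question": "What is 2 + 2?", "reference": "4"},
    {"question": "What is the capital of France?", "reference": "Paris"},
]
with open("toy.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "reference"])
    writer.writeheader()
    writer.writerows(rows)

benchmark = CSVBenchmark(
    "toy.csv",
    agent_input_columns=["question"],  # fed to the agent
    extra_columns=["reference"],       # available to evaluators
)  # benchmark_id defaults to the file name
print(len(benchmark))  # expected: 2
```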