protomotions.agents.evaluators.base_evaluator module

Base evaluator for agent evaluation and metrics computation.

This module provides the base evaluation infrastructure for computing performance metrics during training and evaluation. Evaluators run periodic assessments of agent performance and compute task-specific metrics.

Key Classes:
  • BaseEvaluator: Base class for all evaluators

  • SmoothnessMetricPlugin: Plugin for computing motion smoothness metrics

Key Features:
  • Periodic evaluation during training

  • Motion quality metrics computation

  • Episode statistics aggregation

  • Smoothness and jerk analysis

  • Distributed evaluation support
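
A minimal subclassing sketch for orientation. It assumes only the hooks documented below (process_eval_results in particular); the TrackingEvaluator name, the "eval/success_rate" key, and the compute_success_rate helper are hypothetical and not part of this module:

>>> class TrackingEvaluator(BaseEvaluator):  # hypothetical subclass
...     def process_eval_results(self, metrics, eval_context):
...         to_log, score = super().process_eval_results(metrics, eval_context)
...         # Add a task-specific metric; the key and helper are illustrative only.
...         to_log["eval/success_rate"] = compute_success_rate(metrics)
...         return to_log, score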

class protomotions.agents.evaluators.base_evaluator.SmoothnessMetricPlugin(evaluator, window_sec=0.4, high_jerk_threshold=6500.0)[source]

Bases: object

Plugin for computing smoothness metrics from motion data.

__init__(evaluator, window_sec=0.4, high_jerk_threshold=6500.0)[source]

Initialize the smoothness metric plugin.

Parameters:
  • evaluator – The parent evaluator instance

  • window_sec (float) – Window size in seconds for smoothness computation

  • high_jerk_threshold (float) – Threshold for classifying high jerk frames
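
For intuition only, and not this plugin's actual implementation: jerk is the third time derivative of position, and a frame can be flagged as "high jerk" when its jerk magnitude exceeds high_jerk_threshold. A rough sketch under those assumptions, with positions as a (T, D) array sampled at 1/dt Hz:

>>> import numpy as np
>>> def fraction_high_jerk(positions, dt, threshold=6500.0):  # illustrative helper, not part of the module
...     jerk = np.diff(positions, n=3, axis=0) / dt**3  # third finite difference approximates jerk
...     jerk_mag = np.linalg.norm(jerk, axis=-1)  # per-frame jerk magnitude
...     return float(np.mean(jerk_mag > threshold))  # fraction of frames above the threshold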

compute(metrics)[source]

Compute smoothness metrics from collected motion data.

Parameters:

metrics (Dict[str, MotionMetrics]) – Dictionary of MotionMetrics

Returns:

Dictionary of smoothness metrics with “eval/” prefix

Return type:

Dict[str, float]
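
A short usage sketch. The evaluator and motion_metrics objects are assumed to exist already; motion_metrics stands in for the Dict[str, MotionMetrics] collected during an evaluation pass:

>>> plugin = SmoothnessMetricPlugin(evaluator, window_sec=0.4, high_jerk_threshold=6500.0)
>>> smoothness = plugin.compute(motion_metrics)
>>> all(key.startswith("eval/") for key in smoothness)  # returned keys carry the "eval/" prefix
True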

class protomotions.agents.evaluators.base_evaluator.BaseEvaluator(agent, fabric, config)[source]

Bases: object

Base class for agent evaluation and metrics computation.

Runs periodic evaluations during training to assess agent performance. Collects episode statistics, computes task-specific metrics, and provides feedback for checkpoint selection (best model saving).

Parameters:
  • agent (Any) – The agent being evaluated.

  • fabric (Fabric) – Lightning Fabric instance for distributed evaluation.

  • config (EvaluatorConfig) – Evaluator configuration specifying eval frequency and length.

Example

>>> evaluator = BaseEvaluator(agent, fabric, config)
>>> metrics, score = evaluator.evaluate()
__init__(agent, fabric, config)[source]

Initialize the evaluator.

Parameters:
  • agent (Any) – The agent to evaluate

  • fabric (Fabric) – Lightning Fabric instance for distributed training

  • config (EvaluatorConfig) – Evaluator configuration specifying eval frequency and length

property device: torch.device

Device for computations (from fabric).

property env: BaseEnv

Environment instance (from agent).

property root_dir

Root directory for saving outputs (from agent).

initialize_eval()[source]

Initialize the metrics dictionary with required keys and prepare the evaluation context.

Returns:

Tuple containing metrics dict and evaluation context dict

Return type:

Tuple[Dict, Dict]

run_evaluation(metrics)[source]

Run the evaluation process and collect metrics.

Parameters:

metrics (Dict) – Dictionary to collect evaluation metrics

process_eval_results(metrics, eval_context)[source]

Process collected metrics and prepare for logging.

Parameters:
  • metrics (Dict) – Dictionary of collected metrics

  • eval_context (Dict) – Dictionary containing evaluation context

Returns:

  • Dict of processed metrics for logging

  • Optional score value for determining best model

Return type:

Tuple[Dict, Optional[float]]

cleanup_after_evaluation()[source]

Clean up after evaluation (reset environment state, etc.).
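
The hooks above are meant to be used together; a sketch of one evaluation pass, assuming evaluate() (see the class Example) composes them roughly in this order:

>>> metrics, eval_context = evaluator.initialize_eval()
>>> evaluator.run_evaluation(metrics)  # fills the metrics dict in place
>>> to_log, score = evaluator.process_eval_results(metrics, eval_context)
>>> evaluator.cleanup_after_evaluation()  # reset environment state before training resumes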

simple_test_policy(collect_metrics=False)[source]

Simple evaluation loop for testing the policy.

Parameters:

collect_metrics (bool) – Whether to collect metrics during evaluation.
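
A minimal call sketch; what, if anything, is returned when collect_metrics=True is not documented here, so only the invocation is shown:

>>> evaluator.simple_test_policy(collect_metrics=False)  # roll out the policy without collecting metrics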