protomotions.agents.base_agent.config module

Configuration classes for base agent.

This module defines the configuration dataclasses used by the base agent and all derived agents. These configurations cover training parameters, optimizer settings, and evaluation options.

Key Classes:
  • BaseAgentConfig: Main agent configuration
  • BaseModelConfig: Model architecture configuration
  • OptimizerConfig: Optimizer parameters
  • MaxEpisodeLengthManagerConfig: Episode length curriculum

class protomotions.agents.base_agent.config.MaxEpisodeLengthManagerConfig(start_length=5, end_length=300, transition_epochs=100000)[source]

Bases: ConfigBuilder

Configuration for managing max episode length during training.

start_length: int = 5
end_length: int = 300
transition_epochs: int = 100000
current_max_episode_length(current_epoch)[source]

Returns the current max episode length, linearly interpolated from start_length to end_length over transition_epochs epochs.

Parameters:
  current_epoch – Current training epoch

Returns:
  Interpolated max episode length

Return type:
  int

__init__(start_length=5, end_length=300, transition_epochs=100000)
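
A minimal usage sketch of the curriculum. That the schedule clamps at end_length once transition_epochs is reached is an assumption based on the linear-interpolation description:

    from protomotions.agents.base_agent.config import MaxEpisodeLengthManagerConfig

    # Episodes grow linearly from 5 to 300 steps over the first 100k epochs.
    schedule = MaxEpisodeLengthManagerConfig(
        start_length=5, end_length=300, transition_epochs=100_000
    )

    for epoch in (0, 50_000, 100_000, 200_000):
        # Expected roughly: 5, ~152, 300, 300 (the last assuming the
        # schedule clamps at end_length after the transition).
        print(epoch, schedule.current_max_episode_length(epoch))
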
class protomotions.agents.base_agent.config.OptimizerConfig(_target_='torch.optim.Adam', lr=0.0001, weight_decay=0.0, eps=1e-08, betas=<factory>)[source]

Bases: ConfigBuilder

Configuration for the optimizer. _target_ selects the optimizer class to instantiate (default torch.optim.Adam).

lr: float = 0.0001
weight_decay: float = 0.0
eps: float = 1e-08
betas: tuple
__init__(_target_='torch.optim.Adam', lr=0.0001, weight_decay=0.0, eps=1e-08, betas=<factory>)
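
A sketch of turning the config into a real torch optimizer. Resolving _target_ by hand with importlib stands in for whatever instantiation mechanism the codebase actually uses (e.g. Hydra-style _target_ resolution), and the betas factory default is assumed to match Adam's (0.9, 0.999):

    import importlib

    import torch

    from protomotions.agents.base_agent.config import OptimizerConfig

    model = torch.nn.Linear(8, 2)
    opt_cfg = OptimizerConfig(lr=3e-4, weight_decay=1e-5)

    # Resolve the dotted _target_ path (default "torch.optim.Adam") to a class.
    module_path, cls_name = opt_cfg._target_.rsplit(".", 1)
    optimizer_cls = getattr(importlib.import_module(module_path), cls_name)

    optimizer = optimizer_cls(
        model.parameters(),
        lr=opt_cfg.lr,
        weight_decay=opt_cfg.weight_decay,
        eps=opt_cfg.eps,
        betas=opt_cfg.betas,
    )
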
class protomotions.agents.base_agent.config.BaseModelConfig(_target_='protomotions.agents.base_agent.model.BaseModel', in_keys=<factory>, out_keys=<factory>)[source]

Bases: ConfigBuilder

Configuration for the base model (actor-critic).

in_keys: List[str]
out_keys: List[str]
__init__(_target_='protomotions.agents.base_agent.model.BaseModel', in_keys=<factory>, out_keys=<factory>)
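
A construction sketch; the key names below are hypothetical. in_keys and out_keys declare which dictionary entries the model reads and writes, and the actual names depend on the environment and agent:

    from protomotions.agents.base_agent.config import BaseModelConfig

    # Hypothetical keys: the model reads "self_obs" and writes "action".
    model_cfg = BaseModelConfig(
        in_keys=["self_obs"],
        out_keys=["action"],
    )
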
class protomotions.agents.base_agent.config.BaseAgentConfig(batch_size, training_max_steps, _target_='protomotions.agents.base_agent.agent.BaseAgent', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0)[source]

Bases: ConfigBuilder

Main configuration class for the base agent.

batch_size: int
training_max_steps: int
model: BaseModelConfig
num_steps: int = 32
gradient_clip_val: float = 0.0
fail_on_bad_grads: bool = False
check_grad_mag: bool = True
gamma: float = 0.99
bounds_loss_coef: float = 0.0
task_reward_w: float = 1.0
num_mini_epochs: int = 1
training_early_termination: int | None = None
save_epoch_checkpoint_every: int | None = 1000
save_last_checkpoint_every: int = 10
max_episode_length_manager: MaxEpisodeLengthManagerConfig | None = None
evaluator: EvaluatorConfig
normalize_rewards: bool = True
normalized_reward_clamp_value: float = 5.0
__init__(batch_size, training_max_steps, _target_='protomotions.agents.base_agent.agent.BaseAgent', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0)
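
Putting it together, a sketch of a full agent configuration. All values are illustrative; evaluator is left to its factory default, and the key names inside BaseModelConfig are hypothetical:

    from protomotions.agents.base_agent.config import (
        BaseAgentConfig,
        BaseModelConfig,
        MaxEpisodeLengthManagerConfig,
    )

    agent_cfg = BaseAgentConfig(
        batch_size=2048,               # required field
        training_max_steps=1_000_000,  # required field
        model=BaseModelConfig(in_keys=["self_obs"], out_keys=["action"]),
        num_steps=32,
        gamma=0.99,
        # Optional curriculum: grow episodes from 5 to 300 steps over 100k epochs.
        max_episode_length_manager=MaxEpisodeLengthManagerConfig(
            start_length=5, end_length=300, transition_epochs=100_000
        ),
    )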