protomotions.agents.ppo.config module

Configuration classes for the PPO agent.

This module defines all configuration dataclasses for the Proximal Policy Optimization (PPO) algorithm, including actor-critic architecture parameters, optimization settings, and training hyperparameters.

Key Classes:
  • PPOAgentConfig: Main PPO agent configuration
  • PPOModelConfig: PPO model (actor-critic) configuration
  • PPOActorConfig: Policy network configuration
  • AdvantageNormalizationConfig: Advantage normalization settings

class protomotions.agents.ppo.config.PPOActorConfig(mu_key, in_keys=<factory>, out_keys=<factory>, _target_='protomotions.agents.ppo.model.PPOActor', mu_model=<factory>, num_out=None, actor_logstd=-2.9)[source]

Bases: ConfigBuilder

Configuration for PPO Actor network.

mu_key: str
in_keys: List[str]
out_keys: List[str]
mu_model: SequentialModuleConfig
num_out: int | None = None
actor_logstd: float = -2.9
__init__(mu_key, in_keys=<factory>, out_keys=<factory>, _target_='protomotions.agents.ppo.model.PPOActor', mu_model=<factory>, num_out=None, actor_logstd=-2.9)
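
A minimal construction sketch. The key name and num_out value below are illustrative assumptions, not library defaults; mu_model, in_keys, and out_keys keep their factory defaults:

   from protomotions.agents.ppo.config import PPOActorConfig

   # Sketch only: mu_key and num_out are example values, not defaults.
   actor_cfg = PPOActorConfig(
       mu_key="mu",          # key the mean network writes to (assumed name)
       num_out=12,           # e.g. a 12-dimensional action space (illustrative)
       actor_logstd=-2.9,    # initial log-std of the Gaussian policy (signature default)
   )
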
class protomotions.agents.ppo.config.PPOModelConfig(_target_='protomotions.agents.ppo.model.PPOModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>)[source]

Bases: BaseModelConfig

Configuration for PPO Model (Actor-Critic).

out_keys: List[str]
actor: PPOActorConfig
critic: SequentialModuleConfig
actor_optimizer: OptimizerConfig
critic_optimizer: OptimizerConfig
__init__(_target_='protomotions.agents.ppo.model.PPOModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>)
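
As a hedged sketch, the model config nests the actor config alongside a critic and per-network optimizers; below, only the actor is customized and the critic and optimizer fields keep their factory defaults (actor values are illustrative):

   from protomotions.agents.ppo.config import PPOActorConfig, PPOModelConfig

   # Sketch only: critic, actor_optimizer, and critic_optimizer keep factory defaults.
   model_cfg = PPOModelConfig(
       actor=PPOActorConfig(mu_key="mu", num_out=12),  # illustrative actor values
   )
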
class protomotions.agents.ppo.config.AdvantageNormalizationConfig(enabled=True, shift_mean=True, use_ema=True, ema_alpha=0.05, min_std=0.02, clamp_range=4.0)[source]

Bases: ConfigBuilder

Configuration for advantage normalization.

enabled: bool = True
shift_mean: bool = True
use_ema: bool = True
ema_alpha: float = 0.05
min_std: float = 0.02
clamp_range: float = 4.0
__init__(enabled=True, shift_mean=True, use_ema=True, ema_alpha=0.05, min_std=0.02, clamp_range=4.0)
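
The field comments below reflect the usual reading of EMA-based advantage normalization (shift by a running mean, scale by a floored running std, then clamp); they are assumptions about intent, not verified against the implementation:

   from protomotions.agents.ppo.config import AdvantageNormalizationConfig

   adv_norm_cfg = AdvantageNormalizationConfig(
       enabled=True,
       shift_mean=True,    # subtract the running mean before scaling (assumed meaning)
       use_ema=True,       # track mean/std as exponential moving averages (assumed)
       ema_alpha=0.05,     # EMA update rate for the running statistics (assumed)
       min_std=0.02,       # floor on the std used for scaling, to avoid blow-ups (assumed)
       clamp_range=4.0,    # clamp normalized advantages to [-4.0, 4.0] (assumed)
   )

Under that reading, the normalized advantage would be clamp((A - mean) / max(std, min_std), -clamp_range, clamp_range) when both enabled and shift_mean are set.
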
class protomotions.agents.ppo.config.PPOAgentConfig(batch_size, training_max_steps, _target_='protomotions.agents.ppo.agent.PPO', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>)[source]

Bases: BaseAgentConfig

Main configuration class for PPO Agent.

model: PPOModelConfig
tau: float = 0.95
e_clip: float = 0.2
clip_critic_loss: bool = True
actor_clip_frac_threshold: float | None = 0.6
advantage_normalization: AdvantageNormalizationConfig
__init__(batch_size, training_max_steps, _target_='protomotions.agents.ppo.agent.PPO', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>)
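
A minimal construction sketch: batch_size and training_max_steps are the only required arguments (the values below are illustrative), and the remaining fields repeat the signature defaults. The hyperparameter comments follow standard PPO usage (gamma as discount, tau as the GAE lambda, e_clip as the surrogate clipping range), which is an assumption not confirmed on this page:

   from protomotions.agents.ppo.config import PPOAgentConfig

   # Sketch only: batch_size and training_max_steps are example values;
   # the other fields repeat the signature defaults.
   agent_cfg = PPOAgentConfig(
       batch_size=4096,
       training_max_steps=100_000_000,
       num_steps=32,     # rollout length collected per update (assumed meaning)
       gamma=0.99,       # reward discount factor
       tau=0.95,         # GAE lambda in the usual PPO convention (assumed)
       e_clip=0.2,       # clipping range of the PPO surrogate objective
   )
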