protomotions.agents.ppo.config module
Configuration classes for the PPO agent.
This module defines all configuration dataclasses for the Proximal Policy Optimization (PPO) algorithm, including actor-critic architecture parameters, optimization settings, and training hyperparameters.
Key classes:

- PPOAgentConfig: Main PPO agent configuration
- PPOModelConfig: PPO model (actor-critic) configuration
- PPOActorConfig: Policy network configuration
- AdvantageNormalizationConfig: Advantage normalization settings
- class protomotions.agents.ppo.config.PPOActorConfig(mu_key, in_keys=<factory>, out_keys=<factory>, _target_='protomotions.agents.ppo.model.PPOActor', mu_model=<factory>, num_out=None, actor_logstd=-2.9)

Bases: ConfigBuilder

Configuration for PPO Actor network.

- mu_model: SequentialModuleConfig

- __init__(mu_key, in_keys=<factory>, out_keys=<factory>, _target_='protomotions.agents.ppo.model.PPOActor', mu_model=<factory>, num_out=None, actor_logstd=-2.9)
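For illustration, a minimal sketch of constructing an actor config, using only the parameters shown in the signature above. The key name "mu" and the action dimensionality are hypothetical placeholders, and the sketch assumes the <factory> default for mu_model is usable as-is:

```python
from protomotions.agents.ppo.config import PPOActorConfig

# Sketch only: "mu" and num_out=69 are placeholder values, and mu_model
# falls back to its default factory (a SequentialModuleConfig).
actor_cfg = PPOActorConfig(
    mu_key="mu",        # hypothetical key under which the action mean is emitted
    num_out=69,         # hypothetical action dimensionality
    actor_logstd=-2.9,  # presumably the initial log std of the Gaussian policy (the documented default)
)
```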
- class protomotions.agents.ppo.config.PPOModelConfig(_target_='protomotions.agents.ppo.model.PPOModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>)

Bases: BaseModelConfig

Configuration for PPO Model (Actor-Critic).

- actor: PPOActorConfig

- critic: SequentialModuleConfig

- actor_optimizer: OptimizerConfig

- critic_optimizer: OptimizerConfig

- __init__(_target_='protomotions.agents.ppo.model.PPOModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>)
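Similarly, a hedged sketch of composing the actor-critic model config. Only the actor field is overridden here; critic and both optimizer fields keep their default factories, since the schemas of SequentialModuleConfig and OptimizerConfig are not documented on this page:

```python
from protomotions.agents.ppo.config import PPOActorConfig, PPOModelConfig

# Sketch only: nested configs not listed here keep their factory defaults.
model_cfg = PPOModelConfig(
    actor=PPOActorConfig(mu_key="mu"),  # "mu" is a hypothetical key name
)

print(model_cfg._target_)  # 'protomotions.agents.ppo.model.PPOModel'
```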
- class protomotions.agents.ppo.config.AdvantageNormalizationConfig(enabled=True, shift_mean=True, use_ema=True, ema_alpha=0.05, min_std=0.02, clamp_range=4.0)

Bases: ConfigBuilder

Configuration for advantage normalization.

- __init__(enabled=True, shift_mean=True, use_ema=True, ema_alpha=0.05, min_std=0.02, clamp_range=4.0)
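The defaults above suggest an EMA-tracked mean and standard deviation with a floor on the std and clipping of the normalized values. Below is an illustrative sketch of how such settings are commonly applied; it is not necessarily the library's exact implementation:

```python
import torch
from protomotions.agents.ppo.config import AdvantageNormalizationConfig

adv_cfg = AdvantageNormalizationConfig(
    shift_mean=True,   # subtract the (running) mean before scaling
    use_ema=True,      # track mean/std with an exponential moving average
    ema_alpha=0.05,    # EMA update rate
    min_std=0.02,      # floor on the std to avoid division by ~0
    clamp_range=4.0,   # clip normalized advantages to [-4, 4]
)

def normalize_advantages(adv: torch.Tensor, mean: float, std: float,
                         cfg: AdvantageNormalizationConfig) -> torch.Tensor:
    """Illustrative only: one common way these fields are used."""
    if not cfg.enabled:
        return adv
    if cfg.shift_mean:
        adv = adv - mean
    adv = adv / max(std, cfg.min_std)
    return adv.clamp(-cfg.clamp_range, cfg.clamp_range)
```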
- class protomotions.agents.ppo.config.PPOAgentConfig(batch_size, training_max_steps, _target_='protomotions.agents.ppo.agent.PPO', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>)

Bases: BaseAgentConfig

Main configuration class for PPO Agent.

- model: PPOModelConfig

- advantage_normalization: AdvantageNormalizationConfig

- __init__(batch_size, training_max_steps, _target_='protomotions.agents.ppo.agent.PPO', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>)
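Finally, a sketch of a complete agent config. batch_size and training_max_steps are the only required arguments; the other values below are illustrative placeholders that simply restate the documented defaults, with the nested model and advantage_normalization fields left to their factories:

```python
from protomotions.agents.ppo.config import PPOAgentConfig

# Sketch only: numbers are placeholders, not recommended settings.
agent_cfg = PPOAgentConfig(
    batch_size=4096,
    training_max_steps=100_000_000,
    num_steps=32,      # assumed meaning: rollout length per environment
    gamma=0.99,        # discount factor
    tau=0.95,          # presumably the GAE lambda
    e_clip=0.2,        # PPO clipping epsilon
    normalize_rewards=True,
    normalized_reward_clamp_value=5.0,
)

assert agent_cfg._target_ == 'protomotions.agents.ppo.agent.PPO'
```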