protomotions.agents.amp.config module

class protomotions.agents.amp.config.AMPParametersConfig(conditional_discriminator=False, discriminator_reward_w=1.0, discriminator_weight_decay=0.0001, discriminator_logit_weight_decay=0.01, discriminator_batch_size=4096, discriminator_grad_penalty=5.0, discriminator_optimization_ratio=1, discriminator_replay_keep_prob=0.01, discriminator_replay_size=200000, discriminator_reward_threshold=0.05, discriminator_max_cumulative_bad_transitions=10)

Bases: ConfigBuilder

Configuration for hyperparameters specific to AMP (Adversarial Motion Priors).

conditional_discriminator: bool = False
discriminator_reward_w: float = 1.0
discriminator_weight_decay: float = 0.0001
discriminator_logit_weight_decay: float = 0.01
discriminator_batch_size: int = 4096
discriminator_grad_penalty: float = 5.0
discriminator_optimization_ratio: int = 1
discriminator_replay_keep_prob: float = 0.01
discriminator_replay_size: int = 200000
discriminator_reward_threshold: float = 0.05
discriminator_max_cumulative_bad_transitions: int = 10
__init__(conditional_discriminator=False, discriminator_reward_w=1.0, discriminator_weight_decay=0.0001, discriminator_logit_weight_decay=0.01, discriminator_batch_size=4096, discriminator_grad_penalty=5.0, discriminator_optimization_ratio=1, discriminator_replay_keep_prob=0.01, discriminator_replay_size=200000, discriminator_reward_threshold=0.05, discriminator_max_cumulative_bad_transitions=10)
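The fields above map one-to-one onto the constructor keywords, and every one has a default. As a sketch of how those defaults behave, the following uses a plain `dataclasses.dataclass` stand-in named `AMPParametersSketch` (hypothetical, not part of protomotions, and without the `ConfigBuilder` base) that mirrors the documented field names and defaults, then overrides a few of them.

```python
from dataclasses import dataclass, asdict, replace


@dataclass
class AMPParametersSketch:
    """Hypothetical stand-in mirroring the documented AMPParametersConfig defaults."""
    conditional_discriminator: bool = False
    discriminator_reward_w: float = 1.0
    discriminator_weight_decay: float = 0.0001
    discriminator_logit_weight_decay: float = 0.01
    discriminator_batch_size: int = 4096
    discriminator_grad_penalty: float = 5.0
    discriminator_optimization_ratio: int = 1
    discriminator_replay_keep_prob: float = 0.01
    discriminator_replay_size: int = 200000
    discriminator_reward_threshold: float = 0.05
    discriminator_max_cumulative_bad_transitions: int = 10


# Every field has a default, so the no-argument constructor is valid.
defaults = AMPParametersSketch()

# Override selected fields by keyword; unspecified fields keep their defaults.
tuned = AMPParametersSketch(discriminator_batch_size=2048, discriminator_grad_penalty=10.0)

# dataclasses.replace returns a modified copy and leaves the original untouched.
conditional = replace(defaults, conditional_discriminator=True)

print(tuned.discriminator_batch_size, tuned.discriminator_replay_size)  # 2048 200000
```

Overriding by keyword keeps call sites readable when only one or two discriminator settings differ from the defaults.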
class protomotions.agents.amp.config.DiscriminatorConfig(input_models, _target_='protomotions.agents.amp.model.Discriminator', in_keys=<factory>, out_keys=<factory>)

Bases: SequentialModuleConfig

Configuration for the AMP discriminator network.

out_keys: List[str]
__init__(input_models, _target_='protomotions.agents.amp.model.Discriminator', in_keys=<factory>, out_keys=<factory>)
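The `_target_` field stores the class to build as a dotted import path (here `protomotions.agents.amp.model.Discriminator`), in the style used by Hydra's `instantiate`. The sketch below shows one way such a path can be resolved with `importlib`; `resolve_target` is a hypothetical helper, not the library's own instantiation code, and the demo resolves a standard-library class so it runs without protomotions installed.

```python
import importlib
from typing import Any


def resolve_target(target: str) -> Any:
    """Hypothetical helper: turn 'package.module.Name' into the object it names."""
    module_path, _, attr_name = target.rpartition(".")
    if not module_path:
        raise ValueError(f"expected a dotted path, got {target!r}")
    # Import the module part, then look up the final name as an attribute.
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# With protomotions installed, resolve_target(
#     "protomotions.agents.amp.model.Discriminator") would return that class.
cls = resolve_target("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```

Keeping the target as a string lets a config be serialized and edited without importing the model code until it is actually built.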
class protomotions.agents.amp.config.AMPModelConfig(_target_='protomotions.agents.amp.model.AMPModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>, discriminator=<factory>, discriminator_optimizer=<factory>)

Bases: PPOModelConfig

Configuration for the AMP model (an actor-critic with a discriminator).

discriminator: DiscriminatorConfig
discriminator_optimizer: OptimizerConfig
__init__(_target_='protomotions.agents.amp.model.AMPModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>, discriminator=<factory>, discriminator_optimizer=<factory>)
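In these signatures, `<factory>` is how a dataclass field declared with `field(default_factory=...)` renders: each instance gets its own freshly built default instead of sharing one mutable object. A minimal sketch with hypothetical stand-in classes (`OptimizerSketch`, `DiscriminatorSketch`, `ModelSketch`; the `lr` field is an illustrative assumption, not the real `OptimizerConfig` field list) shows the rendering and the per-instance defaults, in the same shape as the `discriminator` and `discriminator_optimizer` fields above.

```python
import inspect
from dataclasses import dataclass, field
from typing import List


@dataclass
class OptimizerSketch:
    lr: float = 1e-4  # hypothetical field for illustration only


@dataclass
class DiscriminatorSketch:
    in_keys: List[str] = field(default_factory=list)
    out_keys: List[str] = field(default_factory=list)


@dataclass
class ModelSketch:
    # Nested configs are built per instance by their factories.
    discriminator: DiscriminatorSketch = field(default_factory=DiscriminatorSketch)
    discriminator_optimizer: OptimizerSketch = field(default_factory=OptimizerSketch)


# default_factory defaults appear as "<factory>" in the generated __init__ signature.
print(inspect.signature(ModelSketch))

a, b = ModelSketch(), ModelSketch()
a.discriminator.in_keys.append("obs")
print(b.discriminator.in_keys)  # [] because each instance built its own list
```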
class protomotions.agents.amp.config.AMPAgentConfig(batch_size, training_max_steps, _target_='protomotions.agents.amp.agent.AMP', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>, amp_parameters=<factory>)

Bases: PPOAgentConfig

Main configuration class for the AMP agent.

model: AMPModelConfig
amp_parameters: AMPParametersConfig
__init__(batch_size, training_max_steps, _target_='protomotions.agents.amp.agent.AMP', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>, amp_parameters=<factory>)
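In the signature above, `batch_size` and `training_max_steps` are the only parameters without defaults, so they must always be passed, while `amp_parameters` is built by a factory. The sketch below uses trimmed, hypothetical stand-ins (`AgentSketch` and `AMPParamsSketch`, keeping only a handful of the documented fields and defaults) to show both behaviors; the real class takes many more parameters and inherits from PPOAgentConfig. The values 8192 and 1_000_000 are arbitrary illustrations, not recommended settings.

```python
from dataclasses import dataclass, field


@dataclass
class AMPParamsSketch:
    # Hypothetical subset of AMPParametersConfig, with the documented defaults.
    discriminator_reward_w: float = 1.0
    discriminator_batch_size: int = 4096


@dataclass
class AgentSketch:
    # Required: no defaults, matching the documented AMPAgentConfig signature.
    batch_size: int
    training_max_steps: int
    # A few of the documented defaults.
    num_steps: int = 32
    gamma: float = 0.99
    task_reward_w: float = 1.0
    amp_parameters: AMPParamsSketch = field(default_factory=AMPParamsSketch)


try:
    AgentSketch()  # omitting the required arguments fails
except TypeError as err:
    print("TypeError:", err)

cfg = AgentSketch(
    batch_size=8192,               # arbitrary example value
    training_max_steps=1_000_000,  # arbitrary example value
    amp_parameters=AMPParamsSketch(discriminator_reward_w=0.5),
)
print(cfg.num_steps, cfg.amp_parameters.discriminator_reward_w)  # 32 0.5
```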