protomotions.agents.amp.config module¶
- class protomotions.agents.amp.config.AMPParametersConfig(conditional_discriminator=False, discriminator_reward_w=1.0, discriminator_weight_decay=0.0001, discriminator_logit_weight_decay=0.01, discriminator_batch_size=4096, discriminator_grad_penalty=5.0, discriminator_optimization_ratio=1, discriminator_replay_keep_prob=0.01, discriminator_replay_size=200000, discriminator_reward_threshold=0.05, discriminator_max_cumulative_bad_transitions=10)[source]¶
Bases: ConfigBuilder

Configuration for AMP-specific hyperparameters.
- __init__(conditional_discriminator=False, discriminator_reward_w=1.0, discriminator_weight_decay=0.0001, discriminator_logit_weight_decay=0.01, discriminator_batch_size=4096, discriminator_grad_penalty=5.0, discriminator_optimization_ratio=1, discriminator_replay_keep_prob=0.01, discriminator_replay_size=200000, discriminator_reward_threshold=0.05, discriminator_max_cumulative_bad_transitions=10)¶
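The signature above is a plain keyword-argument constructor, so overriding a hyperparameter is just a matter of passing it explicitly. The sketch below uses a local stand-in dataclass that mirrors the documented field names and defaults (the real class lives in `protomotions.agents.amp.config` and derives from `ConfigBuilder`, which may add behavior not shown here):

```python
from dataclasses import dataclass


# Illustrative stand-in mirroring the documented AMPParametersConfig
# fields and defaults; NOT the real protomotions class.
@dataclass
class AMPParametersConfigSketch:
    conditional_discriminator: bool = False
    discriminator_reward_w: float = 1.0
    discriminator_weight_decay: float = 1e-4
    discriminator_logit_weight_decay: float = 0.01
    discriminator_batch_size: int = 4096
    discriminator_grad_penalty: float = 5.0
    discriminator_optimization_ratio: int = 1
    discriminator_replay_keep_prob: float = 0.01
    discriminator_replay_size: int = 200_000
    discriminator_reward_threshold: float = 0.05
    discriminator_max_cumulative_bad_transitions: int = 10


# Override only the fields that differ from the defaults.
params = AMPParametersConfigSketch(
    discriminator_reward_w=0.5,
    discriminator_batch_size=2048,
)
```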
- class protomotions.agents.amp.config.DiscriminatorConfig(input_models, _target_='protomotions.agents.amp.model.Discriminator', in_keys=<factory>, out_keys=<factory>)[source]¶
Bases: SequentialModuleConfig

Configuration for AMP Discriminator network.
- __init__(input_models, _target_='protomotions.agents.amp.model.Discriminator', in_keys=<factory>, out_keys=<factory>)¶
- class protomotions.agents.amp.config.AMPModelConfig(_target_='protomotions.agents.amp.model.AMPModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>, discriminator=<factory>, discriminator_optimizer=<factory>)[source]¶
Bases: PPOModelConfig

Configuration for AMP Model (Actor-Critic with Discriminator).
- discriminator: DiscriminatorConfig¶
- discriminator_optimizer: OptimizerConfig¶
- __init__(_target_='protomotions.agents.amp.model.AMPModel', in_keys=<factory>, out_keys=<factory>, actor=<factory>, critic=<factory>, actor_optimizer=<factory>, critic_optimizer=<factory>, discriminator=<factory>, discriminator_optimizer=<factory>)¶
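The `_target_` fields in these configs (e.g. `'protomotions.agents.amp.model.AMPModel'`) hold dotted import paths naming the class to build. A minimal sketch of how such a string is typically resolved into a class, in the Hydra-style instantiation idiom; the actual resolution logic inside protomotions may differ, and a stdlib class is used here since protomotions may not be installed:

```python
import importlib


def resolve_target(target: str):
    """Resolve a dotted `_target_` path like 'pkg.mod.ClassName' to the object."""
    module_path, _, attr_name = target.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstrate with a stdlib class standing in for a config _target_.
OrderedDict = resolve_target("collections.OrderedDict")
od = OrderedDict(a=1)
```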
- class protomotions.agents.amp.config.AMPAgentConfig(batch_size, training_max_steps, _target_='protomotions.agents.amp.agent.AMP', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>, amp_parameters=<factory>)[source]¶
Bases: PPOAgentConfig

Main configuration class for AMP Agent.
- model: AMPModelConfig¶
- amp_parameters: AMPParametersConfig¶
- __init__(batch_size, training_max_steps, _target_='protomotions.agents.amp.agent.AMP', model=<factory>, num_steps=32, gradient_clip_val=0.0, fail_on_bad_grads=False, check_grad_mag=True, gamma=0.99, bounds_loss_coef=0.0, task_reward_w=1.0, num_mini_epochs=1, training_early_termination=None, save_epoch_checkpoint_every=1000, save_last_checkpoint_every=10, max_episode_length_manager=None, evaluator=<factory>, normalize_rewards=True, normalized_reward_clamp_value=5.0, tau=0.95, e_clip=0.2, clip_critic_loss=True, actor_clip_frac_threshold=0.6, advantage_normalization=<factory>, amp_parameters=<factory>)¶
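The agent config carries a `task_reward_w` weight, and `AMPParametersConfig` carries a `discriminator_reward_w` weight. A hedged sketch of how AMP-style agents commonly combine the environment (task) reward with the discriminator (style) reward using two such weights; the exact formula used by `protomotions.agents.amp.agent.AMP` is not shown in this reference and may differ:

```python
def blended_reward(task_reward: float, disc_reward: float,
                   task_reward_w: float = 1.0,
                   discriminator_reward_w: float = 1.0) -> float:
    """Weighted sum of task and discriminator rewards (illustrative only)."""
    return task_reward_w * task_reward + discriminator_reward_w * disc_reward


# With both documented default weights at 1.0, the rewards simply add.
r = blended_reward(0.3, 0.7)
```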