protomotions.agents.ppo.model module¶
PPO model implementation with actor-critic architecture.
This module implements the neural network models for Proximal Policy Optimization. The actor outputs a Gaussian policy distribution, and the critic estimates state values.
- Key Classes:
PPOActor: Policy network with Gaussian action distribution
PPOModel: Complete actor-critic model for PPO
- class protomotions.agents.ppo.model.PPOActor(*args, **kwargs)[source]¶
Bases: TensorDictModuleBase
PPO policy network (actor).
Self-contained policy that computes distribution parameters, samples actions, and computes log probabilities all in a single forward pass.
- Parameters:
config (PPOActorConfig) – Actor configuration including network architecture and initial log std.
- logstd¶
Log standard deviation parameter (typically fixed during training).
- mu¶
Neural network that outputs action means.
- in_keys¶
List of input keys, taken from the mu model.
- out_keys¶
List of output keys (action, mean_action, neglogp).
- forward(tensordict)[source]¶
Forward pass: compute mu/std, sample action, compute neglogp.
This is the actor's only method: distribution parameters, sampling, and log-probability computation all happen in one self-contained pass.
- Parameters:
tensordict (TensorDict) – TensorDict containing observations.
- Returns:
TensorDict with action, mean_action, and neglogp added.
- Return type:
TensorDict
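For orientation, here is a minimal runnable sketch of such a self-contained Gaussian actor built on tensordict. The GaussianActorSketch class, the "obs" key, and the network sizes are illustrative assumptions, not the actual PPOActor implementation:

```python
import torch
from torch import nn
from tensordict import TensorDict
from tensordict.nn import TensorDictModuleBase


class GaussianActorSketch(TensorDictModuleBase):
    """Illustrative Gaussian actor; not the actual PPOActor implementation."""

    in_keys = ["obs"]
    out_keys = ["action", "mean_action", "neglogp"]

    def __init__(self, obs_dim: int, act_dim: int, init_logstd: float = -1.0):
        super().__init__()
        # Mean network: maps observations to action means.
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, act_dim)
        )
        # State-independent log std, kept fixed during training in this sketch.
        self.logstd = nn.Parameter(
            torch.full((act_dim,), init_logstd), requires_grad=False
        )

    def forward(self, tensordict: TensorDict) -> TensorDict:
        mean_action = self.mu(tensordict["obs"])
        std = self.logstd.exp().expand_as(mean_action)
        dist = torch.distributions.Normal(mean_action, std)
        action = dist.sample()
        # Sum log probs over action dimensions: one scalar per batch element.
        neglogp = -dist.log_prob(action).sum(dim=-1)
        tensordict.set("action", action)
        tensordict.set("mean_action", mean_action)
        tensordict.set("neglogp", neglogp)
        return tensordict


# Usage: a batch of 4 observations with 8 features each.
td = TensorDict({"obs": torch.randn(4, 8)}, batch_size=[4])
td = GaussianActorSketch(obs_dim=8, act_dim=3)(td)
print(td["action"].shape, td["neglogp"].shape)  # torch.Size([4, 3]) torch.Size([4])
```

Keeping sampling and log-probability computation inside one forward pass ensures the stored neglogp always matches the distribution the action was actually drawn from.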
- class protomotions.agents.ppo.model.PPOModel(*args, **kwargs)[source]¶
Bases: BaseModel
Complete PPO model with actor and critic networks.
Pure forward function that computes all model outputs in the TensorDict. The forward pass adds action distribution parameters and value estimates.
- Parameters:
config (PPOModelConfig) – Model configuration specifying actor and critic architectures.
- _actor¶
Policy network.
- _critic¶
Value network.
- config: PPOModelConfig¶
- forward(tensordict)[source]¶
Forward pass through actor and critic.
This is the main interface for the model. Computes all outputs:
- action: Sampled action
- mean_action: Deterministic action (mean)
- neglogp: Negative log probability of the sampled action
- value: State value estimate
- Parameters:
tensordict (TensorDict) – TensorDict containing observations.
- Returns:
TensorDict with all model outputs added.
- Return type:
TensorDict
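Putting the two networks together, the sketch below shows how such a combined forward pass might look, reusing GaussianActorSketch from the actor example above; the PPOModelSketch name and the critic MLP are illustrative assumptions rather than the actual PPOModel code:

```python
import torch
from torch import nn
from tensordict import TensorDict


class PPOModelSketch(nn.Module):
    """Illustrative actor-critic wrapper; not the actual PPOModel implementation."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        # Policy network: the sketch defined in the actor example above.
        self._actor = GaussianActorSketch(obs_dim, act_dim)
        # Critic: maps observations to a scalar state-value estimate.
        self._critic = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ELU(), nn.Linear(256, 1)
        )

    def forward(self, tensordict: TensorDict) -> TensorDict:
        # Actor writes action, mean_action, and neglogp into the TensorDict.
        tensordict = self._actor(tensordict)
        # Critic adds the value estimate, squeezed to one scalar per sample.
        tensordict.set("value", self._critic(tensordict["obs"]).squeeze(-1))
        return tensordict


# Usage: one call populates every output the PPO update needs.
td = TensorDict({"obs": torch.randn(4, 8)}, batch_size=[4])
td = PPOModelSketch(obs_dim=8, act_dim=3)(td)
print(sorted(td.keys()))  # ['action', 'mean_action', 'neglogp', 'obs', 'value']
```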