protomotions.agents.ppo.model module

PPO model implementation with actor-critic architecture.

This module implements the neural network models for Proximal Policy Optimization. The actor outputs a Gaussian policy distribution, and the critic estimates state values.

Key Classes:
  • PPOActor: Policy network with Gaussian action distribution

  • PPOModel: Complete actor-critic model for PPO
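The following is a minimal sketch of the diagonal Gaussian policy the module description refers to: the actor predicts a mean per action dimension and keeps a (typically fixed) log standard deviation, from which an action and its negative log probability follow analytically. The batch and action sizes and the random stand-in mean are illustrative assumptions, not values taken from PPOActorConfig.

  import math
  import torch

  act_dim = 8
  mean_action = torch.randn(4, act_dim)      # in the real actor this comes from the mu network
  logstd = torch.zeros(act_dim)              # fixed log standard deviation
  std = logstd.exp()

  # Sample from the diagonal Gaussian N(mean_action, std^2).
  action = mean_action + std * torch.randn_like(mean_action)

  # Analytic negative log probability, summed over action dimensions.
  neglogp = (
      0.5 * (((action - mean_action) / std) ** 2).sum(-1)
      + 0.5 * act_dim * math.log(2 * math.pi)
      + logstd.sum()
  )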

class protomotions.agents.ppo.model.PPOActor(*args, **kwargs)[source]

Bases: TensorDictModuleBase

PPO policy network (actor).

Self-contained policy that computes distribution parameters, samples actions, and computes log probabilities all in a single forward pass.

Parameters:

config (PPOActorConfig) – Actor configuration including network architecture and initial log std.

logstd

Log standard deviation parameter (typically fixed during training).

mu

Neural network that outputs action means.

in_keys

List of input keys, taken from the mu model.

out_keys

List of output keys (action, mean_action, neglogp).

__init__(config)[source]
forward(tensordict)[source]

Forward pass: compute mu/std, sample action, compute neglogp.

The forward pass is self-contained: no separate sampling or log-probability calls are needed.

Parameters:

tensordict (TensorDict) – TensorDict containing observations.

Returns:

TensorDict with action, mean_action, and neglogp added.

Return type:

TensorDict
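As a rough illustration of the in_keys/out_keys contract described above, the sketch below mimics what a forward pass of this kind writes into a TensorDict. It assumes the tensordict package; the observation key "obs" and the single linear layer are stand-ins, not the keys or network defined by the actual mu model.

  import torch
  from torch import nn
  from tensordict import TensorDict

  mu = nn.Linear(32, 8)                      # stand-in for the configured mu network
  logstd = torch.zeros(8)                    # fixed log standard deviation

  td = TensorDict({"obs": torch.randn(4, 32)}, batch_size=[4])

  mean_action = mu(td["obs"])
  dist = torch.distributions.Normal(mean_action, logstd.exp())
  action = dist.sample()

  td["action"] = action                      # stochastic action
  td["mean_action"] = mean_action            # deterministic action (mean)
  td["neglogp"] = -dist.log_prob(action).sum(-1)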

class protomotions.agents.ppo.model.PPOModel(*args, **kwargs)[source]

Bases: BaseModel

Complete PPO model with actor and critic networks.

A pure forward function that computes all model outputs in the TensorDict. The forward pass adds the action distribution parameters and value estimates.

Parameters:

config (PPOModelConfig) – Model configuration specifying actor and critic architectures.

_actor

Policy network.

_critic

Value network.

config: PPOModelConfig
__init__(config)[source]
forward(tensordict)[source]

Forward pass through actor and critic.

This is the main interface for the model. Computes all outputs:

  • action: Sampled action

  • mean_action: Deterministic action (mean)

  • neglogp: Negative log probability of sampled action

  • value: State value estimate

Parameters:

tensordict (TensorDict) – TensorDict containing observations.

Returns:

TensorDict with all model outputs added.

Return type:

TensorDict
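A hedged usage sketch of the model-level forward: on top of the actor outputs shown earlier, the critic contributes a scalar value per sample. Construction of PPOModelConfig is omitted because its fields are not documented here; the critic head, observation key, and pre-filled actor outputs are illustrative stand-ins for what an actual call like td = model(td) would produce.

  import torch
  from torch import nn
  from tensordict import TensorDict

  critic = nn.Linear(32, 1)                  # stand-in for the configured value network

  td = TensorDict(
      {
          "obs": torch.randn(4, 32),
          "action": torch.randn(4, 8),       # pretend the actor half already ran
          "mean_action": torch.randn(4, 8),
          "neglogp": torch.rand(4),
      },
      batch_size=[4],
  )

  td["value"] = critic(td["obs"]).squeeze(-1)  # state value estimate, shape [4]

  # During rollouts, "action" and "neglogp" feed the PPO update, while
  # "mean_action" supports deterministic evaluation.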