protomotions.agents.common.transformer module

Transformer architecture for sequential modeling.

This module implements transformer-based networks for processing temporal information in reinforcement learning. Used primarily in motion tracking and MaskedMimic agents for handling sequential observations.

Key Classes:
  • Transformer: Main transformer model with positional encoding

  • PositionalEncoding: Sinusoidal positional encoding that injects sequence-position information into token embeddings

Key Features:
  • Multi-head self-attention for temporal dependencies

  • Multiple input heads with different encoders

  • Positional encoding for sequence awareness

  • Flexible output heads (single or multi-headed)
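As a reference for the positional-encoding feature above, a minimal sketch of the standard sinusoidal scheme (the class name and constructor arguments here are illustrative, not the exact protomotions API):

```python
import math
import torch
from torch import nn

class SinusoidalPositionalEncoding(nn.Module):
    """Adds fixed sin/cos position information to a sequence of token embeddings."""

    def __init__(self, d_model: int, max_len: int = 512):
        super().__init__()
        position = torch.arange(max_len).unsqueeze(1)  # (max_len, 1)
        # Frequencies decay geometrically across the embedding dimension.
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
        pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
        self.register_buffer("pe", pe)  # not a learnable parameter

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); broadcast-add the first seq_len rows.
        return x + self.pe[: x.size(1)]
```

Because the encoding is deterministic, it is registered as a buffer rather than a parameter, so it moves with the module across devices but is never updated by the optimizer.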

class protomotions.agents.common.transformer.Transformer(*args, **kwargs)[source]

Bases: TensorDictModuleBase

Transformer network for sequential observation processing.

Processes multi-modal sequential inputs through separate encoders, combines them into a sequence of tokens, and applies transformer layers for temporal modeling. Used in motion tracking agents to process future reference poses.
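The encode-then-concatenate pattern described above can be sketched as follows; this is a simplified stand-in (per-stream linear encoders, hypothetical class and argument names), not the actual Transformer implementation:

```python
import torch
from torch import nn

class MultiInputTransformer(nn.Module):
    """Encodes each observation stream with its own encoder, concatenates
    the resulting tokens into one sequence, and applies transformer layers."""

    def __init__(self, input_dims: dict, d_model: int = 64, num_layers: int = 2):
        super().__init__()
        # One encoder per observation type, keyed by observation name.
        self.input_models = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in input_dims.items()}
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, obs: dict) -> torch.Tensor:
        # Each encoder maps (batch, seq_i, dim_i) -> (batch, seq_i, d_model);
        # tokens from all streams are concatenated along the sequence axis.
        tokens = torch.cat(
            [self.input_models[k](v) for k, v in obs.items()], dim=1
        )
        return self.encoder(tokens)  # (batch, total_seq, d_model)
```

In the motion-tracking use case, one stream would carry the future reference poses (a multi-step sequence) while others carry single-token observations such as the current state.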

Parameters:

config (TransformerConfig) – Transformer configuration specifying architecture parameters.

input_models

Dictionary of input encoders for different observation types.

sequence_pos_encoder

Positional encoding layer.

seqTransEncoder

Stack of transformer encoder layers.

in_keys

List of input keys collected from all input models.

out_keys

List containing the single output key.

Example

>>> config = TransformerConfig()
>>> model = Transformer(config)
>>> output_td = model(tensordict)
__init__(config)[source]
forward(tensordict)[source]

Forward pass through transformer.

Parameters:

tensordict (MockTensorDict) – TensorDict containing all input observations.

Returns:

TensorDict with transformer output added at self.out_keys[0].

Return type:

MockTensorDict
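To illustrate the in_keys/out_keys contract of the forward pass, here is a minimal stand-in that uses a plain dict in place of a TensorDict (the class name, the concatenation of inputs, and the wrapped network are all assumptions for illustration):

```python
import torch
from torch import nn

class DictModule(nn.Module):
    """Minimal sketch of the TensorDict-style interface: reads tensors at
    in_keys from a dict and writes the result back under out_keys[0]."""

    def __init__(self, net: nn.Module, in_keys: list, out_keys: list):
        super().__init__()
        self.net = net
        self.in_keys = in_keys
        self.out_keys = out_keys

    def forward(self, td: dict) -> dict:
        # Gather the declared inputs, run the network, and store the output
        # in-place so downstream modules can read it by key.
        inputs = torch.cat([td[k] for k in self.in_keys], dim=-1)
        td[self.out_keys[0]] = self.net(inputs)
        return td
```

The returned dict still contains the original observations alongside the new output, mirroring how the Transformer adds its result at self.out_keys[0] without discarding the inputs.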