protomotions.agents.utils.normalization module

Running mean and standard deviation computation for normalization.

This module provides efficient online computation of mean and variance statistics for observation and reward normalization in reinforcement learning. It uses Welford’s algorithm, extended to aggregate statistics across processes in distributed training.

Key Classes:
  • RunningMeanStd: Computes running statistics with optional clamping

  • RewardRunningMeanStd: Specialized for reward normalization with discount factor

Key Features:
  • Online updates (no need to store all data)

  • Distributed training support (aggregates across processes)

  • Optional value clamping for stability

  • State dict support for checkpointing
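
To illustrate what "online updates" means here, the following is a minimal scalar sketch of Welford's single-sample update. The class name WelfordAccumulator and its attributes are hypothetical and chosen for illustration only; the module's own classes operate on PyTorch tensors and may organize the update differently.

```python
class WelfordAccumulator:
    """Hypothetical scalar accumulator; not the module's RunningMeanStd."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Welford's update: adjust the mean, then accumulate the squared deviation
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def var(self):
        # Population variance; 0.0 until at least one sample has been seen
        return self.m2 / self.count if self.count > 0 else 0.0


acc = WelfordAccumulator()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    acc.update(value)

print(acc.mean, acc.var, acc.count)
```

Only three numbers are kept per tracked quantity, so the data stream itself never needs to be stored.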

class protomotions.agents.utils.normalization.RunningMeanStd(fabric, shape=None, epsilon=1e-05, device='cuda:0', clamp_value=None)[source]


Running mean and standard deviation computation.

Computes and maintains running statistics (mean, variance, count) for data streams. Uses Welford’s online algorithm extended for parallel/distributed computation. Commonly used for normalizing observations and rewards in RL.

Parameters:
  • fabric (MockFabric) – Lightning Fabric instance for distributed aggregation.

  • shape (Tuple[int, ...] | None) – Shape of the data being normalized.

  • epsilon (float) – Small constant for numerical stability.

  • device – PyTorch device for tensors.

  • clamp_value (float | None) – Optional clipping value for normalized outputs.

mean

Running mean (float64 for precision).

var

Running variance (float64 for precision).

count

Number of samples seen.

Example

>>> rms = RunningMeanStd(fabric, shape=(128,), device="cuda")
>>> rms.record_moments(observations)
>>> normalized_obs = rms.normalize(new_observations)

References

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm

__init__(fabric, shape=None, epsilon=1e-05, device='cuda:0', clamp_value=None)[source]

Initialize running statistics tracker with optional lazy initialization.

Parameters:
  • fabric (MockFabric) – Lightning Fabric for distributed training.

  • shape (Tuple[int, ...] | None) – Shape of data to normalize. If None, will be inferred on first forward pass.

  • epsilon (float) – Numerical stability constant.

  • device – PyTorch device.

  • clamp_value (float | None) – Optional value for clamping normalized outputs.

to(device)[source]
maybe_clamp(x)[source]
normalize(arr, un_norm=False)[source]
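
The methods above are listed without descriptions. As a rough guide, the sketch below shows one common way a normalize / un-normalize pair with optional clamping can behave. It is an assumption based on the parameter names (epsilon, clamp_value, un_norm), not a statement of what this module does; the function normalize_sketch is hypothetical and works on plain floats rather than tensors.

```python
import math


def normalize_sketch(x, mean, var, epsilon=1e-5, clamp_value=None, un_norm=False):
    """Hypothetical scalar version of a normalize / un-normalize round trip."""
    std = math.sqrt(var + epsilon)
    if un_norm:
        # Assumed inverse transform: map a normalized value back to data scale
        return x * std + mean
    y = (x - mean) / std
    if clamp_value is not None:
        # Clip the normalized value to [-clamp_value, clamp_value]
        y = max(-clamp_value, min(clamp_value, y))
    return y


z = normalize_sketch(7.0, mean=5.0, var=4.0)
x_back = normalize_sketch(z, mean=5.0, var=4.0, un_norm=True)
clipped = normalize_sketch(100.0, mean=5.0, var=4.0, clamp_value=5.0)
```

In this sketch the round trip recovers the input only when no clamping was applied, since clipping discards information.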
protomotions.agents.utils.normalization.combine_moments(means, vars, counts)[source]

Combine moments from multiple processes robustly using a pairwise algorithm.
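
For intuition, the parallel-variance formula from the Wikipedia reference cited for RunningMeanStd can be applied one pair at a time to merge per-process statistics. The sketch below is a scalar illustration under that assumption; combine_moments_sketch is a hypothetical name, and the real function may differ in how it orders the merges, handles tensors, or treats empty inputs.

```python
def combine_moments_sketch(means, variances, counts):
    """Hypothetical scalar merge of per-process (mean, var, count) triples."""
    mean, var, count = means[0], variances[0], counts[0]
    for m_b, v_b, n_b in zip(means[1:], variances[1:], counts[1:]):
        total = count + n_b
        if total == 0:
            continue  # nothing to merge yet
        delta = m_b - mean
        # Merge sums of squared deviations, plus a correction for the mean gap
        m2 = var * count + v_b * n_b + delta * delta * count * n_b / total
        mean = mean + delta * n_b / total
        var = m2 / total
        count = total
    return mean, var, count


# Two "processes" that saw [2, 4, 4, 4] and [5, 5, 7, 9] respectively
mean, var, count = combine_moments_sketch([3.5, 6.5], [0.75, 2.75], [4, 4])
```

The merged result matches the statistics of the concatenated data, without any process having to share its raw samples.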

class protomotions.agents.utils.normalization.RewardRunningMeanStd(fabric, shape, gamma, epsilon=1e-05, clamp_value=None, device='cuda:0')[source]

Bases: RunningMeanStd

__init__(fabric, shape, gamma, epsilon=1e-05, clamp_value=None, device='cuda:0')[source]

Initialize running statistics tracker with optional lazy initialization.

Parameters:
  • fabric (MockFabric) – Lightning Fabric for distributed training.

  • shape (Tuple[int, ...]) – Shape of data to normalize. If None, will be inferred on first forward pass.

  • gamma – Discount factor used for reward normalization.

  • epsilon (float) – Numerical stability constant.

  • device (str) – PyTorch device.

  • clamp_value (float | None) – Optional value for clamping normalized outputs.

record_reward(reward, terminated)[source]
normalize(arr, un_norm=False)[source]
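
Neither method is described above. A widely used scheme for reward normalization with a discount factor tracks a running discounted return and scales rewards by that return's standard deviation. The sketch below illustrates that scheme for a single environment with plain floats; it is an assumption about the general approach, not a description of this class. RewardNormalizerSketch and its attributes are hypothetical names.

```python
import math


class RewardNormalizerSketch:
    """Hypothetical scalar reward normalizer; not the module's implementation."""

    def __init__(self, gamma, epsilon=1e-5):
        self.gamma = gamma
        self.epsilon = epsilon
        self.ret = 0.0  # running discounted return for one environment
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations of the tracked returns

    def record_reward(self, reward, terminated):
        # Accumulate the discounted return and fold it into the statistics
        self.ret = self.ret * self.gamma + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        if terminated:
            self.ret = 0.0  # start a fresh return at episode boundaries

    def normalize(self, reward):
        # Scale by the return's standard deviation; the mean is not subtracted
        var = self.m2 / self.count if self.count > 0 else 1.0
        return reward / math.sqrt(var + self.epsilon)


norm = RewardNormalizerSketch(gamma=0.5)
for r, done in [(1.0, False), (1.0, False), (1.0, True), (1.0, False)]:
    norm.record_reward(r, done)
scaled = norm.normalize(1.0)
```

Dividing by the standard deviation without subtracting the mean keeps the sign of each reward intact, which is one reason this scheme is common for rewards.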