
sac_learner

SACLearner(actuator_ids, sensor_ids, agent_objective, *, replay_size=int(1e6), fc_dims=(256, 256), activation='torch.nn.ReLU', gamma=0.99, polyak=0.995, lr=0.001, batch_size=100, update_after=1000, update_every=50)

Bases: ActiveLearner

Learner class for the palaestrAI SAC agent.

Initialize the SAC learner.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| actuator_ids | list[str] | The IDs of actuators the learner should use to interact with the environment. | required |
| sensor_ids | list[str] | The IDs of sensors the learner should be able to see from the environment. | required |
| agent_objective | Objective | The objective function that takes environment rewards and converts them to an objective for the agent. | required |
| replay_size | int | Maximum length of the replay buffer. | int(1e6) |
| fc_dims | Sequence[int] | Dimensions of the hidden layers of the agent's actor and critic networks. "fc" stands for "fully connected". | (256, 256) |
| activation | str | Activation function to use. | 'torch.nn.ReLU' |
| gamma | float | Discount factor. (Always between 0 and 1.) | 0.99 |
| polyak | float | Interpolation factor in Polyak averaging for target networks. Target networks are updated towards main networks according to \(\theta_{\text{targ}} \leftarrow \rho\,\theta_{\text{targ}} + (1-\rho)\,\theta\), where \(\rho\) is polyak. (Always between 0 and 1, usually close to 1.) | 0.995 |
| lr | float | Learning rate (used for both policy and value learning). | 0.001 |
| batch_size | int | Minibatch size for SGD. | 100 |
| update_after | int | Number of environment interactions to collect before starting gradient descent updates. Ensures the replay buffer is full enough for useful updates. | 1000 |
| update_every | int | Number of environment interactions that should elapse between gradient descent updates. Note: regardless of how long you wait between updates, the ratio of environment interactions to gradient steps is locked to 1. | 50 |
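A minimal construction sketch, not taken from the library docs: the sensor and actuator IDs and the `MyObjective` class are hypothetical placeholders, and only the import path follows the source location shown below.

```python
# Hypothetical sketch of constructing the learner. `MyObjective` stands in for
# any concrete palaestrAI Objective implementation; the sensor/actuator IDs
# depend on your environment and are placeholders.
from flowcean.palaestrai.sac_learner import SACLearner

learner = SACLearner(
    actuator_ids=["myenv.Actuator-0"],
    sensor_ids=["myenv.Sensor-0", "myenv.Sensor-1"],
    agent_objective=MyObjective(),  # hypothetical Objective subclass
    fc_dims=(128, 128),             # two hidden layers with 128 units each
    lr=3e-4,                        # override the default learning rate
)
```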
Source code in src/flowcean/palaestrai/sac_learner.py
def __init__(
    self,
    actuator_ids: list[str],
    sensor_ids: list[str],
    agent_objective: Objective,
    *,
    replay_size: int = int(1e6),
    fc_dims: Sequence[int] = (256, 256),
    activation: str = "torch.nn.ReLU",
    gamma: float = 0.99,
    polyak: float = 0.995,
    lr: float = 1e-3,
    batch_size: int = 100,
    update_after: int = 1000,
    update_every: int = 50,
) -> None:
    r"""Initialize the SAC learner.

    Args:
        actuator_ids: The IDs of actuators the learner should use to
            interact with the environment.
        sensor_ids: The IDs of sensors the learner should be able to see
            from the environment.
        agent_objective: The objective function that takes environment
            rewards and converts them to an objective for the agent.
        replay_size: Maximum length of replay buffer.
        fc_dims: Dimensions of the hidden layers of the agent's actor and
            critic networks. "fc" stands for "fully connected".
        activation: Activation function to use.
        gamma: Discount factor. (Always between 0 and 1.)
        polyak: Interpolation factor in polyak averaging for target
            networks. Target networks are updated towards main networks
            according to: $$\theta_{\text{targ}} \leftarrow \rho \theta_{
            \text{targ}} + (1-\rho) \theta,$$ where $\rho$ is polyak.
            (Always between 0 and 1, usually close to 1.)
        lr: Learning rate (used for both policy and value learning).
        batch_size: Minibatch size for SGD.
        update_after: Number of env interactions to collect before starting
            to do gradient descent updates. Ensures replay buffer is full
            enough for useful updates.
        update_every: Number of env interactions that should elapse between
            gradient descent updates.
            Note: Regardless of how long you wait between updates, the
            ratio of environment interactions to gradient steps is locked
            to 1.
    """
    self.actuator_ids = actuator_ids
    self.sensor_ids = sensor_ids
    self.agent_objective = agent_objective
    self.objective_values = []
    self.rewards = []
    self.brain_params = {
        "replay_size": replay_size,
        "fc_dims": fc_dims,
        "activation": activation,
        "gamma": gamma,
        "polyak": polyak,
        "lr": lr,
        "batch_size": batch_size,
        "update_after": update_after,
        "update_every": update_every,
    }
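
To make the polyak parameter concrete, here is a small illustration of the target-network update rule quoted above. This is not the library's internal code, just the formula \(\theta_{\text{targ}} \leftarrow \rho\,\theta_{\text{targ}} + (1-\rho)\,\theta\) written out in PyTorch.

```python
# Illustration only (not the library's internal implementation): what the
# `polyak` parameter (rho) means for the target-network update.
import torch


def polyak_update(
    target: torch.nn.Module, main: torch.nn.Module, rho: float
) -> None:
    """In-place update: theta_targ <- rho * theta_targ + (1 - rho) * theta."""
    with torch.no_grad():
        for p_targ, p in zip(target.parameters(), main.parameters()):
            p_targ.mul_(rho)             # keep rho of the old target weights
            p_targ.add_((1.0 - rho) * p)  # blend in (1 - rho) of the main weights
```

With the default polyak=0.995, each update moves the target networks only 0.5 % of the way towards the main networks, which keeps the learning targets slowly moving and stable.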