rcognita.actors.ActorSQL

class rcognita.actors.ActorSQL(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)

Stacked Q-learning (SQL) actor. Optimizes the following actor objective (a minimal numerical sketch follows the notation list below): \(J^a \left( y_k| \{u\}_k^{N_a+1} \right) = \sum_{i=0}^{N_a} \gamma^i Q(y_{i|k}, u_{i|k})\)

Notation:

  • \(y\): observation

  • \(u\): action

  • \(N_a\): prediction horizon

  • \(\gamma\): discount factor

  • \(r\): running objective function

  • \(Q\): action-objective function (or its estimate)

  • \(\{\bullet\}_k^N\): sequence from index \(k\) to index \(k+N-1\)

  • \(\bullet_{i|k}\): element in a sequence with index \(k+i-1\)
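The following is a minimal numerical sketch of how this objective reduces to a discounted sum of Q-values over the predicted sequence. The function q_estimate is a hypothetical stand-in for the critic's estimate of \(Q\); it is not part of the rcognita API.

```python
import numpy as np

# Hypothetical stand-in for the critic's Q-estimate; in rcognita this role is
# played by the critic object passed to the actor.
def q_estimate(observation, action):
    return float(observation @ observation + action @ action)

def sql_actor_objective(observation_sequence, action_sequence, discount_factor=1.0):
    """Discounted sum of Q-values: sum_{i=0}^{N_a} gamma^i Q(y_{i|k}, u_{i|k})."""
    return sum(
        discount_factor**i * q_estimate(y, u)
        for i, (y, u) in enumerate(zip(observation_sequence, action_sequence))
    )

# Horizon N_a = 2 gives N_a + 1 = 3 predicted observation-action pairs (i = 0, 1, 2).
observations = [np.array([1.0, 0.0]), np.array([0.5, 0.1]), np.array([0.2, 0.05])]
actions = [np.array([0.3]), np.array([0.1]), np.array([0.0])]
print(sql_actor_objective(observations, actions, discount_factor=0.9))
```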

__init__(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)

Initialize an actor.

Parameters
  • prediction_horizon (int) – Number of time steps to look into the future.

  • dim_input (int) – Dimension of the observation.

  • dim_output (int) – Dimension of the action.

  • action_bounds (list or ndarray, optional) – Bounds on the action.

  • action_init (list, optional) – Initial action.

  • predictor (Predictor, optional) – Predictor object for generating predictions.

  • optimizer (Optimizer, optional) – Optimizer object for optimizing the action.

  • critic (Critic, optional) – Critic object for evaluating actions.

  • running_objective (RunningObjective, optional) – Running objective object evaluating the running objective function \(r\).

  • model (Model, optional) – Model object to be used as reference by the Predictor and the Critic.

  • discount_factor (float, optional) – Discount factor \(\gamma\) to be used in conjunction with the critic.
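As a rough illustration, here is a minimal sketch of constructing the actor with only the documented keyword arguments. The predictor, optimizer, critic, running_objective, and model objects are left at their None defaults because their constructors are not shown on this page; in a real pipeline they would be passed in, and whether the actor is usable without them depends on the rest of the setup. The action_bounds layout (one [lower, upper] row per action component) is an assumption.

```python
import numpy as np
from rcognita.actors import ActorSQL

# Minimal construction using only the keyword arguments documented above.
# predictor, optimizer, critic, running_objective, and model are omitted here
# (left at their None defaults); a working pipeline would supply them.
actor = ActorSQL(
    dim_output=5,                 # matches the signature default
    dim_input=2,                  # matches the signature default
    prediction_horizon=4,         # N_a: number of look-ahead steps
    action_bounds=np.array([[-1.0, 1.0],     # assumed layout: one [lower, upper]
                            [-2.0, 2.0]]),   # row per action component
    discount_factor=0.9,
)
```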

Methods

__init__([dim_output, dim_input, …])

Initialize an actor.

accept_or_reject_weights(weights[, …])

Determines whether the given weights should be accepted or rejected based on the specified constraints.

cache_weights([weights])

Cache the current weights of the model of the actor.

create_observation_constraints(…)

Method to create observation (or state) related constraints using a predictor over the prediction horizon.

objective(action_sequence, observation)

Calculates the actor objective for the given action sequence and observation using the stacked Q-learning (SQL) algorithm.

optimize_weights([constraint_functions, time])

Method to optimize the current actor weights.

receive_observation(observation)

Update the current observation of the actor.

reset()

Reset the actor to its initial state.

restore_weights()

Restore the previously cached weights of the model of the actor.

set_action(action)

Set the current action of the actor.

update_action([observation])

Update the current action of the actor.

update_and_cache_weights([weights])

Update and cache the weights of the model of the actor.

update_weights([weights])

Update the weights of the model of the actor.
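A hedged sketch of how these methods might be combined in a per-step control loop is given below. The actor is assumed to be an ActorSQL instance wired up with its predictor, optimizer, and critic; get_observation is a hypothetical stand-in for the system or environment interface and is not part of rcognita. The exact signatures of optimize_weights and update_action should be checked against the class itself.

```python
import numpy as np

def get_observation():
    # Hypothetical stand-in for the real system/environment interface.
    return np.zeros(5)

# `actor` is assumed to be the ActorSQL instance constructed earlier.
for t in range(10):
    observation = get_observation()
    actor.receive_observation(observation)   # update the actor's current observation
    actor.optimize_weights(time=t)           # optimize the model weights for the SQL objective
    actor.update_action(observation)         # recompute the current action from the optimized model
    # If the optimized weights violate constraints, accept_or_reject_weights and
    # restore_weights can be used to fall back to the previously cached weights.
```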