rcognita.actors.ActorSQL
- class rcognita.actors.ActorSQL(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)
Stacked Q-learning (SQL) actor. Optimizes the following actor objective (a computational sketch follows the notation below): \(J^a \left( y_k| \{u\}_k^{N_a+1} \right) = \sum_{i=0}^{N_a} \gamma^i Q(y_{i|k}, u_{i|k})\)
Notation:
\(y\): observation
\(u\): action
\(N_a\): prediction horizon
\(\gamma\): discount factor
\(r\): running objective function
\(Q\): action-objective function (or its estimate)
\(\{\bullet\}_k^N\): sequence from index \(k\) to index \(k+N-1\)
\(\bullet_{i|k}\): element in a sequence with index \(k+i-1\)
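The objective above is a discounted sum of critic values along a trajectory rolled out by the predictor. The following is a minimal numpy sketch of that computation, not the library's implementation; `predict_state` and `q_estimate` are hypothetical stand-ins for the actor's predictor and critic.

```python
import numpy as np

def sql_actor_objective(observation, action_sequence, predict_state, q_estimate,
                        prediction_horizon, discount_factor=1.0):
    """Compute sum_{i=0}^{N_a} gamma^i * Q(y_{i|k}, u_{i|k}) along a predicted rollout.

    `predict_state(y, u)` and `q_estimate(y, u)` are hypothetical callables standing in
    for the actor's predictor and critic; they are not part of rcognita's public API.
    `action_sequence` is expected to contain N_a + 1 actions.
    """
    objective = 0.0
    y = np.asarray(observation, dtype=float)
    for i in range(prediction_horizon + 1):
        u = np.asarray(action_sequence[i], dtype=float)
        objective += discount_factor**i * q_estimate(y, u)  # gamma^i * Q(y_{i|k}, u_{i|k})
        y = predict_state(y, u)                             # roll the predictor one step forward
    return objective
```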
- __init__(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)
Initialize an actor (see the construction example after the parameter list).
- Parameters
prediction_horizon (int) – Number of time steps to look into the future.
dim_input (int) – Dimension of the observation.
dim_output (int) – Dimension of the action.
action_bounds (list or ndarray, optional) – Bounds on the action.
action_init (list, optional) – Initial action.
predictor (Predictor, optional) – Predictor object for generating predictions.
optimizer (Optimizer, optional) – Optimizer object for optimizing the action.
critic (Critic, optional) – Critic object for evaluating actions.
running_objective (RunningObjective, optional) – Running objective object for recording the running objective.
model (Model, optional) – Model object to be used as reference by the Predictor and the Critic.
discount_factor (float, optional) – Discount factor to be used in conjunction with the critic.
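A schematic construction might look as follows. The component objects (`predictor`, `optimizer`, `critic`, `running_objective`) are assumed to have been built beforehand with the corresponding rcognita classes, and the dimensions, bounds, and discount factor are illustrative values rather than defaults.

```python
from rcognita.actors import ActorSQL

# Schematic construction: `predictor`, `optimizer`, `critic`, and `running_objective`
# stand for previously built rcognita components; all numeric values are made up.
actor = ActorSQL(
    dim_output=2,           # dimension of the action
    dim_input=3,            # dimension of the observation
    prediction_horizon=4,   # N_a: number of look-ahead steps
    action_bounds=[[-1.0, 1.0], [-1.0, 1.0]],  # assumed format: one [lower, upper] pair per action component
    predictor=predictor,
    optimizer=optimizer,
    critic=critic,
    running_objective=running_objective,
    discount_factor=0.9,
)
```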
Methods
__init__([dim_output, dim_input, …]) – Initialize an actor.
accept_or_reject_weights(weights[, …]) – Determine whether the given weights should be accepted or rejected based on the specified constraints.
cache_weights([weights]) – Cache the current weights of the actor's model.
create_observation_constraints(…) – Create observation (or state) constraints using the predictor over the prediction horizon.
objective(action_sequence, observation) – Calculate the actor objective for the given action sequence and observation using the stacked Q-learning (SQL) algorithm.
optimize_weights([constraint_functions, time]) – Optimize the current actor weights.
receive_observation(observation) – Update the current observation of the actor.
reset() – Reset the actor to its initial state.
restore_weights() – Restore the previously cached weights of the actor's model.
set_action(action) – Set the current action of the actor.
update_action([observation]) – Update the current action of the actor.
update_and_cache_weights([weights]) – Update and cache the weights of the actor's model.
update_weights([weights]) – Update the weights of the actor's model.
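A hedged sketch of how these methods might be combined in a per-step control loop. The environment interface (`env.reset`, `env.step`), the `simulation_horizon` variable, and reading the chosen action back via `actor.action` are illustrative assumptions, not documented rcognita API.

```python
# Per-step loop: refresh the observation, re-optimize the SQL actor objective,
# update the action, and apply it to the controlled system.
observation = env.reset()
for step in range(simulation_horizon):
    actor.receive_observation(observation)   # update the actor's current observation
    actor.optimize_weights(time=step)        # minimize the SQL actor objective
    actor.update_action(observation)         # set the current action from the optimized weights
    observation = env.step(actor.action)     # `actor.action` assumed to hold the current action
```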