rcognita.actors.ActorRQL

class rcognita.actors.ActorRQL(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)

Rollout Q-learning (RQL) actor. Optimizes the following actor objective:

\(J^a \left( y_k| \{u\}_k^{N_a+1} \right) = \sum_{i=0}^{N_a-1} \gamma^i r(y_{i|k}, u_{i|k}) + \gamma^{N_a} Q(y_{N_a|k}, u_{N_a|k})\)

Notation:

  • \(y\): observation

  • \(u\): action

  • \(N_a\): prediction horizon

  • \(\gamma\): discount factor

  • \(r\): running objective function

  • \(Q\): action-objective function (or its estimate)

  • \(\{\bullet\}_k^N\): sequence from index \(k\) to index \(k+N-1\)

  • \(\bullet_{i|k}\): element in a sequence with index \(k+i-1\)
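The objective can be read as a discounted rollout of the running objective over the first \(N_a\) predicted steps, closed off by the critic's estimate \(Q\) at the end of the horizon. The snippet below is a minimal illustrative sketch of that computation, not the library's own objective method; the predict call on the predictor and the callable critic / running objective interfaces are assumptions made for illustration.

    def rql_objective(
        observation,
        action_sequence,       # N_a + 1 actions: u_{0|k}, ..., u_{N_a|k}
        predictor,
        critic,
        running_objective,
        prediction_horizon,    # N_a
        discount_factor=1.0,   # gamma
    ):
        total = 0.0
        y = observation
        for i in range(prediction_horizon):
            u = action_sequence[i]
            # gamma^i * r(y_{i|k}, u_{i|k})
            total += discount_factor**i * running_objective(y, u)
            y = predictor.predict(y, u)  # assumed one-step prediction interface
        # gamma^{N_a} * Q(y_{N_a|k}, u_{N_a|k})
        u_terminal = action_sequence[prediction_horizon]
        total += discount_factor**prediction_horizon * critic(y, u_terminal)
        return total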

__init__(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)

Initialize an actor.

Parameters
  • prediction_horizon (int) – Number of time steps to look into the future.

  • dim_input (int) – Dimension of the observation.

  • dim_output (int) – Dimension of the action.

  • action_bounds (list or ndarray, optional) – Bounds on the action.

  • action_init (list, optional) – Initial action.

  • predictor (Predictor, optional) – Predictor object for generating predictions.

  • optimizer (Optimizer, optional) – Optimizer object for optimizing the action.

  • critic (Critic, optional) – Critic object for evaluating actions.

  • running_objective (RunningObjective, optional) – Running objective object for computing the running objective \(r\).

  • model (Model, optional) – Model object to be used as reference by the Predictor and the Critic.

  • discount_factor (float, optional) – Discount factor to be used in conjunction with the critic.
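A hypothetical construction example is shown below. The keyword names follow the signature above; the predictor, optimizer, critic, and running objective instances are placeholders whose construction depends on the rest of the pipeline.

    from rcognita.actors import ActorRQL

    actor = ActorRQL(
        dim_input=5,                   # dimension of the observation
        dim_output=2,                  # dimension of the action
        prediction_horizon=4,          # N_a
        action_bounds=[[-1.0, 1.0], [-1.0, 1.0]],
        predictor=my_predictor,        # rcognita.predictors.Predictor instance
        optimizer=my_optimizer,        # rcognita.optimizers.Optimizer instance
        critic=my_critic,              # rcognita.critics.Critic instance
        running_objective=my_running_objective,
        discount_factor=0.9,
    )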

Methods

__init__([dim_output, dim_input, …])

Initialize an actor.

accept_or_reject_weights(weights[, …])

Determines whether the given weights should be accepted or rejected based on the specified constraints.

cache_weights([weights])

Cache the current weights of the model of the actor.

create_observation_constraints(…)

Method to create observation (or state) related constraints using a predictor over a prediction_horizon.

objective(action_sequence, observation)

Calculates the actor objective for the given action sequence and observation using Rollout Q-learning (RQL).

optimize_weights([constraint_functions, time])

Method to optimize the current actor weights.

receive_observation(observation)

Update the current observation of the actor.

reset()

Reset the actor to its initial state.

restore_weights()

Restore the previously cached weights of the model of the actor.

set_action(action)

Set the current action of the actor.

update_action([observation])

Update the current action of the actor.

update_and_cache_weights([weights])

Update and cache the weights of the model of the actor.

update_weights([weights])

Update the weights of the model of the actor.
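Taken together, these methods suggest the following per-step usage pattern. This is a hypothetical sketch assembled only from the one-line descriptions above; the environment object, its step interface, and the actor's action attribute are assumptions.

    observation = env.reset()                    # placeholder environment interface
    actor.reset()

    for t in range(num_steps):
        actor.receive_observation(observation)   # update the current observation
        actor.optimize_weights(time=t)           # optimize the actor weights
        actor.update_and_cache_weights()         # commit and cache the new weights
        actor.update_action(observation)         # recompute the current action
        action = actor.action                    # assumed attribute holding the action
        observation = env.step(action)           # placeholder environment call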