rcognita.actors.ActorRPO

class rcognita.actors.ActorRPO(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)

Running (objective) Plus Optimal (objective) actor. This actor minimizes the sum of the current running objective and the optimal objective (or an estimate thereof) at the next step. It may be suitable for value-iteration and policy-iteration agents. Specifically, it optimizes the following actor objective (a sketch follows the notation list below):

\(J^a \left( y_k \mid \{u\}_k \right) = r(y_{k}, u_{k}) + \gamma J^*(y_{k+1})\)

Notation:

  • \(y\): observation

  • \(u\): action

  • \(\gamma\): discount factor

  • \(r\): running objective function

  • \(J^*\): optimal objective function (or its estimate)
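
A minimal sketch of this objective in plain NumPy (not the rcognita implementation): predict, running_objective and critic_estimate below are hypothetical placeholders standing in for the predictor, the running objective \(r\) and the critic's estimate of \(J^*\).

    import numpy as np

    def predict(observation, action):
        # Hypothetical one-step predictor with toy linear dynamics.
        return 0.9 * observation + 0.1 * action

    def running_objective(observation, action):
        # Hypothetical running objective r(y_k, u_k): a quadratic cost.
        return float(observation @ observation + action @ action)

    def critic_estimate(observation):
        # Hypothetical estimate of the optimal objective J^*(y).
        return float(observation @ observation)

    def rpo_objective(observation, action, discount_factor=0.9):
        # J^a(y_k | {u}_k) = r(y_k, u_k) + gamma * J^*(y_{k+1})
        next_observation = predict(observation, action)
        return (running_objective(observation, action)
                + discount_factor * critic_estimate(next_observation))

    print(rpo_objective(np.array([1.0, -0.5]), np.array([0.2, 0.1])))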

__init__(dim_output: int = 5, dim_input: int = 2, prediction_horizon: int = 1, action_bounds: Optional[Union[list, numpy.ndarray]] = None, action_init: Optional[list] = None, predictor: Optional[rcognita.predictors.Predictor] = None, optimizer: Optional[rcognita.optimizers.Optimizer] = None, critic: Optional[rcognita.critics.Critic] = None, running_objective=None, model: Optional[rcognita.models.Model] = None, discount_factor=1)

Initialize an actor. A construction sketch follows the parameter list below.

Parameters
  • prediction_horizon (int) – Number of time steps to look into the future.

  • dim_input (int) – Dimension of the observation.

  • dim_output (int) – Dimension of the action.

  • action_bounds (list or ndarray, optional) – Bounds on the action.

  • action_init (list, optional) – Initial action.

  • predictor (Predictor, optional) – Predictor object used to predict future observations over the prediction horizon.

  • optimizer (Optimizer, optional) – Optimizer object for optimizing the action.

  • critic (Critic, optional) – Critic object for evaluating actions.

  • running_objective (RunningObjective, optional) – Object that computes the running objective \(r(y_k, u_k)\) at each step.

  • model (Model, optional) – Model object to be used as reference by the Predictor and the Critic.

  • discount_factor (float, optional) – Discount factor \(\gamma\) used in conjunction with the critic.
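
A hedged construction sketch, assuming the parameter semantics listed above. The action_bounds format (one [min, max] pair per action component) and the None placeholders for the component objects are illustrative assumptions, not prescriptions; in practice real Predictor, Optimizer, Critic and running-objective objects would be supplied.

    from rcognita.actors import ActorRPO

    # Placeholder components (assumption): real rcognita objects would go here.
    actor = ActorRPO(
        dim_output=2,                              # dimension of the action
        dim_input=3,                               # dimension of the observation
        prediction_horizon=1,
        action_bounds=[[-1.0, 1.0], [-1.0, 1.0]],  # assumed [min, max] per component
        action_init=[0.0, 0.0],
        predictor=None,
        optimizer=None,
        critic=None,
        running_objective=None,
        discount_factor=0.9,
    )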

Methods

__init__([dim_output, dim_input, …])

Initialize an actor.

accept_or_reject_weights(weights[, …])

Determine whether the given weights should be accepted or rejected based on the specified constraints.

cache_weights([weights])

Cache the current weights of the model of the actor.

create_observation_constraints(…)

Create observation (or state) constraints using the predictor over the prediction horizon.

objective(action_sequence, observation)

Calculate the actor objective for the given action sequence and observation using the Running Plus Optimal (RPO) objective.

optimize_weights([constraint_functions, time])

Optimize the current actor weights.

receive_observation(observation)

Update the current observation of the actor.

reset()

Reset the actor to its initial state.

restore_weights()

Restore the previously cached weights of the model of the actor.

set_action(action)

Set the current action of the actor.

update_action([observation])

Update the current action of the actor.

update_and_cache_weights([weights])

Update and cache the weights of the model of the actor.

update_weights([weights])

Update the weights of the model of the actor.
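
The sketch below shows one hypothetical way these methods could be combined in a control loop. The call order and the action attribute are assumptions based on the one-line summaries above, not the library's prescribed pipeline.

    def actor_step(actor, observation):
        # Hypothetical per-step sequence; rcognita's own pipelines may differ.
        actor.receive_observation(observation)  # update the current observation
        actor.optimize_weights()                # optimize the actor model's weights
        actor.update_and_cache_weights()        # commit and cache the optimized weights
        actor.update_action(observation)        # recompute the current action
        return actor.action                     # assumed attribute holding the current action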