rcognita.actors.ActorProbabilisticEpisodic

class rcognita.actors.ActorProbabilisticEpisodic(dim_output: int = 5, dim_input: int = 2, action_bounds=None, action_init=None, model=None, **kwargs)
__init__(dim_output: int = 5, dim_input: int = 2, action_bounds=None, action_init=None, model=None, **kwargs)

Initialize an actor that samples actions from a probabilistic model. The actor also stores gradients for the model weights for each action taken.

Parameters
  • action_bounds (list or ndarray, optional) – Bounds on the action.

  • action_init (list, optional) – Initial action.

  • model (Model, optional) – Model object to be used as reference by the Predictor and the Critic.
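The sample-clip-store behavior described above can be illustrated with a minimal, self-contained sketch. This is plain Python, not the rcognita API: the `GaussianPolicy` model, the `ToyProbabilisticActor` class, and all attribute names here are hypothetical stand-ins for the real `Model` and actor objects.

```python
import random


class GaussianPolicy:
    """Toy 1-D Gaussian policy: action ~ N(mean, sigma^2). Hypothetical stand-in
    for an rcognita Model object."""

    def __init__(self, mean=0.0, sigma=1.0):
        self.mean = mean      # the single trainable weight
        self.sigma = sigma

    def sample(self):
        return random.gauss(self.mean, self.sigma)

    def log_prob_grad(self, action):
        # d/d(mean) of log N(action | mean, sigma^2) = (action - mean) / sigma^2
        return (action - self.mean) / self.sigma**2


class ToyProbabilisticActor:
    """Sample actions from a probabilistic model, clip them to the action
    bounds, and store the gradient of the model weights for each action."""

    def __init__(self, model, action_bounds=(-1.0, 1.0)):
        self.model = model
        self.action_bounds = action_bounds
        self.gradients = []   # one stored gradient per action taken

    def update_action(self, observation=None):
        raw = self.model.sample()
        lo, hi = self.action_bounds
        self.action = min(max(raw, lo), hi)  # clip to the action bounds
        self.gradients.append(self.model.log_prob_grad(raw))
        return self.action


random.seed(0)
actor = ToyProbabilisticActor(GaussianPolicy(), action_bounds=(-0.5, 0.5))
actions = [actor.update_action() for _ in range(3)]
```

Every returned action lies inside the bounds, and one gradient is stored per call, which is the bookkeeping the episodic actor relies on later for weight updates.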

Methods

__init__([dim_output, dim_input, …])

Initialize an actor that samples actions from a probabilistic model.

accept_or_reject_weights(weights[, …])

Determine whether the given weights should be accepted or rejected based on the specified constraints.

cache_weights([weights])

Cache the current weights of the model of the actor.

create_observation_constraints(…)

Create observation (or state) related constraints using a predictor over the prediction horizon.

get_action()

Get the current action.

optimize_weights()

Optimize the current actor weights.

receive_observation(observation)

Update the current observation of the actor.

reset()

Reset the actor’s stored gradients and call the base Actor class’s reset method.

restore_weights()

Restore the previously cached weights of the model of the actor.

set_action(action)

Set the current action of the actor.

store_gradient(gradient)

Store the gradient of the model’s weights.

update_action(observation)

Sample an action from the probabilistic model, clip it to the action bounds, and store its gradient.

update_and_cache_weights()

Update and cache the weights of the model of the actor.

update_weights([weights])

Update the weights of the model of the actor.

update_weights_by_gradient(gradient, …)

Update the model weights by subtracting the gradient multiplied by the learning rate and a constant factor.
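The update rule described above amounts to w ← w − c · lr · g for each weight w and gradient component g. A hedged sketch (the function signature and parameter names are illustrative, not the rcognita signature):

```python
def update_weights_by_gradient(weights, gradient, learning_rate, factor=1.0):
    """Gradient step: w <- w - factor * learning_rate * g.
    Names and signature are illustrative, not the rcognita API."""
    return [w - factor * learning_rate * g for w, g in zip(weights, gradient)]


# Each weight moves against its gradient by factor * learning_rate * g.
new_w = update_weights_by_gradient([1.0, -2.0], [0.5, 0.5], learning_rate=0.1)
```

With `learning_rate=0.1` and `factor=1.0`, each weight shifts by 0.1 × 0.5 = 0.05 against its gradient.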