rcognita.actors.ActorTabular

class rcognita.actors.ActorTabular(dim_world, predictor=None, optimizer=None, running_objective=None, model=None, action_space=None, critic=None, discount_factor=1, terminal_state=None)

Actor that minimizes the sum of the running objective and the optimal objective (or an estimate thereof) of the next step. May be suitable for value iteration and policy iteration agents. Specifically, it optimizes the following actor objective:

\(J^a \left( y_k \mid u_k \right) = r(y_{k}, u_{k}) + \gamma J^*(y_{k+1})\)

Notation:

  • \(y\): observation

  • \(u\): action

  • \(\gamma\): discount factor

  • \(r\): running objective function

  • \(J^*\): optimal objective function (or its estimate)

The action and state spaces are assumed to be discrete and finite. A sketch of how this objective can be evaluated is given below.
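A minimal sketch of computing this objective, assuming a predictor with a predict(observation, action) method, a running_objective callable, and a critic value table J_star indexed by state; these names are illustrative, not the exact rcognita API:

    def actor_objective(observation, action, predictor, running_objective,
                        J_star, discount_factor=1.0):
        # r(y_k, u_k): immediate cost of taking `action` in `observation`
        running_cost = running_objective(observation, action)
        # y_{k+1}: predicted next state under the chosen action (illustrative API)
        next_observation = predictor.predict(observation, action)
        # J^a = r(y_k, u_k) + gamma * J^*(y_{k+1})
        return running_cost + discount_factor * J_star[next_observation]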

__init__(dim_world, predictor=None, optimizer=None, running_objective=None, model=None, action_space=None, critic=None, discount_factor=1, terminal_state=None)

Initializes an ActorTabular object.

Parameters
  • dim_world (int) – The dimensions of the world (i.e. the dimensions of the state space).

  • predictor (object) – An object that predicts the next state given an action and the current state.

  • optimizer (object) – An object that optimizes the actor’s objective function.

  • running_objective (function) – A function that returns a scalar representing the running objective for a given state and action.

  • model (object) – An object that computes an action given an observation and some weights.

  • action_space (array) – An array of the possible actions.

  • critic (object) – An object that computes the optimal objective function (or its estimate).

  • discount_factor (float) – The discount factor for the optimal objective function.

  • terminal_state (object) – The terminal state of the world.
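A hypothetical construction sketch for a 5x5 grid world; the predictor, optimizer, model, and critic arguments are left as None placeholders standing in for user-supplied components, and the concrete values are illustrative only:

    import numpy as np
    from rcognita.actors import ActorTabular

    actor = ActorTabular(
        dim_world=25,                          # number of discrete states (5x5 grid)
        predictor=None,                        # placeholder: maps (state, action) -> next state
        optimizer=None,                        # placeholder: searches the finite action space
        running_objective=lambda y, u: 1.0,    # e.g. unit cost per step
        model=None,                            # placeholder: tabular observation -> action map
        action_space=np.array([0, 1, 2, 3]),   # e.g. up, down, left, right
        critic=None,                           # placeholder: supplies J^* (or its estimate)
        discount_factor=0.9,
        terminal_state=24,                     # e.g. index of the goal state
    )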

Methods

__init__(dim_world[, predictor, optimizer, …])

Initializes an ActorTabular object.

accept_or_reject_weights(weights[, …])

Determines whether the given weights should be accepted or rejected based on the specified constraints.

cache_weights([weights])

Cache the current weights of the model of the actor.

create_observation_constraints(…)

Creates observation (or state) constraints using a predictor over a prediction_horizon.

objective(action, observation)

Calculates the actor objective for a given action and observation.

optimize_weights([constraint_functions, time])

Optimizes the current actor weights.

receive_observation(observation)

Update the current observation of the actor.

reset()

Reset the actor to its initial state.

restore_weights()

Restore the previously cached weights of the model of the actor.

set_action(action)

Set the current action of the actor.

update()

Updates the action table using the optimizer.

update_action([observation])

Update the current action of the actor.

update_and_cache_weights([weights])

Update and cache the weights of the model of the actor.

update_weights([weights])

Update the weights of the model of the actor.
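A hedged sketch of the per-step interaction cycle implied by the methods above; env is a placeholder environment with reset()/step(action) hooks, and the current action is assumed to be stored on actor.action:

    observation = env.reset()  # placeholder environment hook
    actor.reset()              # restore the actor's initial state
    for _ in range(100):
        actor.receive_observation(observation)  # refresh the current observation
        actor.update()                          # re-optimize the action table
        actor.update_action(observation)        # set the action for this state
        # actor.action is an assumed attribute; env.step is a placeholder hook
        observation, done = env.step(actor.action)
        if done:
            break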