rcognita.critics.CriticTabularVI

class rcognita.critics.CriticTabularVI(dim_state_space, running_objective, predictor, model, actor_model, discount_factor=1, N_parallel_processes=5, terminal_state=None)

Critic for tabular agents.

__init__(dim_state_space, running_objective, predictor, model, actor_model, discount_factor=1, N_parallel_processes=5, terminal_state=None)

Initialize a CriticTabularVI object.

Parameters
  • dim_state_space (tuple of int) – The dimensions of the state space.

  • running_objective (callable) – The running objective function.

  • predictor (any) – The predictor object.

  • model (Model) – The model object.

  • actor_model (any) – The actor model object.

  • discount_factor (float, optional) – The discount factor applied to future values in the temporal-difference target. Defaults to 1.

  • N_parallel_processes (int, optional) – The number of parallel processes to use. Defaults to 5.

  • terminal_state (int or tuple of int, optional) – The terminal state, if applicable. Defaults to None.

Returns

None
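As a rough illustration of what discount_factor controls, the sketch below accumulates a sequence of running-objective values with a per-step discount. It is not rcognita code: the helper discounted_sum and the sample values are hypothetical, and the library may apply the discount differently internally.

```python
def discounted_sum(running_objectives, discount_factor=1.0):
    """Sum running-objective values, weighting step k by discount_factor**k.

    Hypothetical helper for illustration only; not part of rcognita.
    """
    total = 0.0
    weight = 1.0
    for value in running_objectives:
        total += weight * value
        weight *= discount_factor
    return total


# Hypothetical running-objective values collected over four steps.
costs = [1.0, 1.0, 1.0, 1.0]

undiscounted = discounted_sum(costs)     # discount_factor=1 weights every step equally
discounted = discounted_sum(costs, 0.5)  # later steps contribute progressively less
```

With the default discount_factor=1 every step counts equally, which is generally only well behaved when episodes end, for example at a terminal_state.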

Methods

__init__(dim_state_space, running_objective, …)

Initialize a CriticTabularVI object.

accept_or_reject_weights(weights[, …])

Accept or reject the given weights depending on whether they violate the given constraints.

cache_weights([weights])

Store a copy of the current model weights.

initialize_buffers()

Initialize the action and observation buffers with zeros.

objective(observation, action)

Calculate the value of a state given the action taken and the observation of the current state.

optimize_weights([time])

Compute optimized critic weights, possibly subject to constraints.

reset()

Reset the outcome and current critic loss variables, and re-initialize the buffers.

restore_weights()

Restore the model weights to the cached weights.

update()

Update the value function for all states.

update_and_cache_weights([weights])

Update the model’s weights and cache the new values.

update_buffers(observation, action)

Update the critic's buffers with the given observation and action.

update_outcome(observation, action)

Update the outcome variable based on the running objective and the current observation and action.

update_single_cell(observation)

Update the value function for a single state.

update_target(new_target)

update_weights([weights])

Update the weights of the critic model.
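The relationship between objective, update_single_cell and update listed above can be sketched on a toy problem. This is a minimal, self-contained illustration under stated assumptions, not the library's implementation: the corridor environment, the predict function (standing in for predictor), the policy table (standing in for actor_model), the choice to evaluate the actor's action rather than maximize over actions, and the zero value at the terminal state are all hypothetical.

```python
import itertools

import numpy as np

# Hypothetical toy problem: a 1-D corridor of 5 cells with the goal at cell 4.
dim_state_space = (5,)
terminal_state = (4,)
discount_factor = 1.0

# Stand-in for actor_model: one action per cell (always step right).
policy = np.ones(dim_state_space, dtype=int)


def running_objective(state, action):
    """Hypothetical running objective: a reward of -1 for every step taken."""
    return -1.0


def predict(state, action):
    """Stand-in for the predictor: deterministic move, clipped to the grid."""
    return tuple(
        int(np.clip(s + action, 0, n - 1)) for s, n in zip(state, dim_state_space)
    )


def objective(values, state, action):
    """Running objective plus the discounted value of the predicted next state."""
    return running_objective(state, action) + discount_factor * values[predict(state, action)]


def update_single_cell(values, state):
    """New value for one cell (assumption: the terminal cell has value 0)."""
    if state == terminal_state:
        return 0.0
    return objective(values, state, policy[state])


def update(values):
    """One sweep: recompute every cell from the previous table."""
    new_values = np.empty_like(values)
    for state in itertools.product(*(range(n) for n in dim_state_space)):
        new_values[state] = update_single_cell(values, state)
    return new_values


values = np.zeros(dim_state_space)
for _ in range(10):  # more sweeps than this toy problem needs to converge
    values = update(values)
```

Each sweep reads only the previous table, so the per-cell updates are independent of one another; this is the kind of structure that a setting such as N_parallel_processes could exploit, although how rcognita distributes the work is not stated here.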

Attributes

optimizer_engine

The engine used by the optimizer.