rcognita.critics.CriticTabularVI

class rcognita.critics.CriticTabularVI(dim_state_space, running_objective, predictor, model, actor_model, discount_factor=1, N_parallel_processes=5, terminal_state=None)

Critic for tabular agents.

__init__(dim_state_space, running_objective, predictor, model, actor_model, discount_factor=1, N_parallel_processes=5, terminal_state=None)

Initialize a CriticTabularVI object.

Parameters

dim_state_space (tuple of int) – The dimensions of the state space.
running_objective (callable) – The running objective function.
predictor (any) – The predictor object.
model (Model) – The model object.
actor_model (any) – The actor model object.
discount_factor (float, optional) – The discount factor for the temporal difference. Defaults to 1.
N_parallel_processes (int, optional) – The number of parallel processes to use. Defaults to 5.
terminal_state (int or tuple of int, optional) – The terminal state, if applicable. Defaults to None.

Returns

None
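The shapes of the simpler constructor arguments can be illustrated with a minimal standalone sketch. It does not import rcognita, and every concrete value in it (the 3-by-4 grid, the terminal cell, the reward values, the `value_table` dictionary) is a hypothetical choice made for illustration. The `(observation, action)` signature of `running_objective` is assumed from the `objective` and `update_outcome` entries in the method table.

```python
import itertools

# All values below are illustrative assumptions, not library defaults.
dim_state_space = (3, 4)   # one entry per state dimension: a 3-by-4 grid
terminal_state = (2, 3)    # a tuple index into that grid
discount_factor = 0.9      # the documented default is 1


def running_objective(observation, action):
    """Hypothetical running objective: 0 at the terminal cell, -1 elsewhere."""
    return 0.0 if tuple(observation) == terminal_state else -1.0


# A tabular critic needs one value per state; here a dict keyed by state index.
all_states = list(itertools.product(*(range(n) for n in dim_state_space)))
value_table = {state: 0.0 for state in all_states}

print(len(value_table))  # 12 cells, one per grid state
```

The `predictor`, `model`, and `actor_model` arguments are left out because this page does not describe their interfaces.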

Methods

`__init__`(dim_state_space, running_objective, …)	Initialize a CriticTabularVI object.
`accept_or_reject_weights`(weights[, …])	Accept or reject the given weights depending on whether they violate the given constraints.
`cache_weights`([weights])	Store a copy of the current model weights.
`initialize_buffers`()	Initialize the action and observation buffers with zeros.
`objective`(observation, action)	Calculate the value of a state given the action taken and the observation of the current state.
`optimize_weights`([time])	Compute optimized critic weights, possibly subject to constraints.
`reset`()	Reset the outcome and current critic loss variables, and re-initialize the buffers.
`restore_weights`()	Restore the model weights to the cached weights.
`update`()	Update the value function for all states.
`update_and_cache_weights`([weights])	Update the model’s weights and cache the new values.
`update_buffers`(observation, action)	Update the critic buffers with the given observation and action.
`update_outcome`(observation, action)	Update the outcome variable based on the running objective and the current observation and action.
`update_single_cell`(observation)	Update the value function for a single state.
`update_target`(new_target)
`update_weights`([weights])	Update the weights of the critic model.
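
The `update` and `update_single_cell` entries describe a sweep over all states built from per-state updates. Assuming the "VI" in the class name stands for value iteration, that pattern can be sketched as follows. This is a standalone illustration, not the rcognita implementation: `ACTIONS`, `predict`, `backup_single_cell`, and `sweep` are hypothetical names, the deterministic grid dynamics are invented, and treating the running objective as a reward to maximize is an assumption.

```python
import itertools

# Hypothetical action set: unit moves on a 2-D grid.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def predict(state, action, dims):
    """Hypothetical deterministic predictor: move, then clip to the grid."""
    return tuple(min(max(s + a, 0), n - 1) for s, a, n in zip(state, action, dims))


def running_objective(observation, action):
    """Hypothetical running objective: a reward of -1 for every step."""
    return -1.0


def backup_single_cell(values, state, dims, discount_factor, terminal_state):
    """One Bellman backup for a single state (terminal value pinned to 0)."""
    if state == terminal_state:
        return 0.0
    return max(
        running_objective(state, action)
        + discount_factor * values[predict(state, action, dims)]
        for action in ACTIONS
    )


def sweep(values, dims, discount_factor, terminal_state):
    """Synchronous sweep: every backup reads the previous table."""
    return {
        state: backup_single_cell(values, state, dims, discount_factor, terminal_state)
        for state in values
    }


dims = (2, 3)
terminal = (1, 2)
values = {s: 0.0 for s in itertools.product(*(range(n) for n in dims))}
for _ in range(10):  # more sweeps than the longest path, so the table converges
    values = sweep(values, dims, 1.0, terminal)

print(values[(0, 0)])  # -3.0: three steps from (0, 0) to the terminal cell
```

Because each backup depends only on the previous table, the per-state updates are independent of one another, which is the kind of workload a setting such as `N_parallel_processes` could distribute. Whether the library parallelizes in exactly this way is not stated on this page.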

Attributes

optimizer_engine

The engine used by the optimizer.