rcognita.critics.Critic

class rcognita.critics.Critic(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)

Critic base class.

A critic is an object that estimates or provides the value of a given action or state in a reinforcement learning problem.

The critic learns this estimate from past experience, typically by minimizing a loss function over the weights of its model.
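To make "learning a value estimate by optimizing a loss" concrete, here is a minimal sketch that does not use rcognita. It assumes a hypothetical scalar system `x_{k+1} = 0.5 * x_k`, a running cost `x**2`, and a one-parameter critic model `V(x) = w * x**2`. The function name `fit_critic_weight` is illustrative only. The weight is chosen to minimize the squared temporal-difference residual `w * x_k**2 - discount_factor * w * x_{k+1}**2 - cost_k` over the recorded data.

```python
def fit_critic_weight(observations, costs, discount_factor):
    """Least-squares fit of w in V(x) = w * x**2 from temporal-difference residuals."""
    numerator = 0.0
    denominator = 0.0
    for k in range(len(observations) - 1):
        # Regressor of the TD equation: phi(x_k) - gamma * phi(x_{k+1}), with phi(x) = x**2
        regressor = observations[k] ** 2 - discount_factor * observations[k + 1] ** 2
        numerator += regressor * costs[k]
        denominator += regressor ** 2
    # Closed-form minimizer of sum_k (w * regressor_k - cost_k)**2
    return numerator / denominator


# Hypothetical data: scalar system x_{k+1} = 0.5 * x_k with running cost x**2
observations = [1.0]
for _ in range(10):
    observations.append(0.5 * observations[-1])
costs = [x ** 2 for x in observations]

weight = fit_critic_weight(observations, costs, discount_factor=0.9)
print(weight)
```

For this toy system the fitted weight matches the analytic value `1 / (1 - discount_factor * 0.5**2)`. A real critic would typically use a richer model and an iterative optimizer in place of the closed-form fit.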

__init__(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)

Initialize a critic object.

Parameters

system_dim_input (int) – Dimension of the input data
system_dim_output (int) – Dimension of the output data
data_buffer_size (int) – Maximum size of the data buffer
optimizer (Optional[Optimizer]) – Optimizer to use for training the critic
model (Optional[Model]) – Model to use for the critic
running_objective (Optional[Objective]) – Objective function to use for the critic
discount_factor (float) – Discount factor to use in the value calculation
observation_target (Optional[np.ndarray]) – Target observation for the critic
sampling_time (float) – Sampling time for the critic
critic_regularization_param (float) – Regularization parameter for the critic
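As an illustration of how `discount_factor` and `sampling_time` can enter a value calculation, the sketch below accumulates a discounted sum of running costs, with each cost scaled by the sampling time. This is an assumption about one common convention, not a statement of how the library computes its outcome. The helper `discounted_outcome` is hypothetical.

```python
def discounted_outcome(running_costs, discount_factor=1.0, sampling_time=0.01):
    """Accumulate sum_k discount_factor**k * cost_k * sampling_time (assumed convention)."""
    outcome = 0.0
    for step, cost in enumerate(running_costs):
        outcome += (discount_factor ** step) * cost * sampling_time
    return outcome


costs = [1.0, 1.0, 1.0]
# With discount_factor = 1.0 every step counts equally
undiscounted = discounted_outcome(costs, discount_factor=1.0, sampling_time=1.0)
# With discount_factor = 0.5 later steps contribute progressively less
discounted = discounted_outcome(costs, discount_factor=0.5, sampling_time=1.0)
print(undiscounted, discounted)
```

With the default `discount_factor` of 1.0 no discounting takes place, so the sum reduces to a plain time integral approximation of the running cost.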

Methods

`__init__`(system_dim_input, …[, optimizer, …])	Initialize a critic object.
`accept_or_reject_weights`(weights[, …])	Determine whether to accept or reject the given weights based on whether they violate the given constraints.
`cache_weights`([weights])	Store a copy of the current model weights.
`initialize_buffers`()	Initialize the action and observation buffers with zeros.
`objective`()
`optimize_weights`([time])	Compute optimized critic weights, possibly subject to constraints.
`reset`()	Reset the outcome and current critic loss variables, and re-initialize the buffers.
`restore_weights`()	Restore the model weights to the cached weights.
`update_and_cache_weights`([weights])	Update the model’s weights and cache the new values.
`update_buffers`(observation, action)	Update the buffers of the critic with the given observation and action.
`update_outcome`(observation, action)	Update the outcome variable based on the running objective and the current observation and action.
`update_target`(new_target)
`update_weights`([weights])	Update the weights of the critic model.
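
The weight-related methods above suggest a cache, check, and restore pattern. The sketch below shows one way such a pattern might fit together. It is a hypothetical stand-in, not rcognita code: the class `WeightStore`, the `constraint_functions` argument, the `"accepted"`/`"rejected"` return strings, and the convention that a constraint `g` is satisfied when `g(weights) <= 0` are all assumptions made for illustration. Only the method names are borrowed from the table.

```python
class WeightStore:
    """Hypothetical stand-in for a critic's weight handling; not rcognita code."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.cached_weights = list(weights)

    def cache_weights(self, weights=None):
        # Keep a copy so that later in-place changes cannot alter the cache
        source = self.weights if weights is None else weights
        self.cached_weights = list(source)

    def restore_weights(self):
        self.weights = list(self.cached_weights)

    def update_weights(self, weights):
        self.weights = list(weights)

    def update_and_cache_weights(self, weights):
        self.update_weights(weights)
        self.cache_weights(weights)

    def accept_or_reject_weights(self, weights, constraint_functions=()):
        # Assumed convention: a constraint g is satisfied when g(weights) <= 0
        violated = any(g(weights) > 0 for g in constraint_functions)
        return "rejected" if violated else "accepted"


def norm_bound(weights):
    """Example constraint: squared norm of the weights must not exceed 10."""
    return sum(w * w for w in weights) - 10.0


store = WeightStore([1.0, 2.0])
results = []
for candidate in ([1.5, 2.5], [3.0, 4.0]):
    verdict = store.accept_or_reject_weights(candidate, [norm_bound])
    if verdict == "accepted":
        store.update_and_cache_weights(candidate)
    else:
        store.restore_weights()  # fall back to the last accepted weights
    results.append(verdict)
print(results, store.weights)
```

Keeping a cached copy means a rejected candidate never leaves the model in a half-updated state: the last accepted weights are always available to restore.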

Attributes

optimizer_engine

Returns the engine used by the optimizer.