rcognita.critics.Critic
- class rcognita.critics.Critic(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)
Critic base class.
A critic is an object that estimates or provides the value of a given action or state in a reinforcement learning problem.
The critic learns these value estimates from past experience, typically by optimizing a loss function.
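To make "the value of a state or action" concrete: when a running objective is evaluated at each step, the value is commonly defined as the discounted sum of those running-objective values along a trajectory, and a critic is trained to approximate that quantity. The plain-Python sketch below computes the sum for a recorded trajectory. It is an illustration only: the helper name `discounted_value` is hypothetical and not part of rcognita, and the running objective is assumed to be sampled once per step.

```python
from typing import Sequence


def discounted_value(running_costs: Sequence[float], discount_factor: float = 1.0) -> float:
    """Return sum_k discount_factor**k * running_costs[k].

    Hypothetical helper: a Monte Carlo estimate of the value of the first
    state/action of a recorded trajectory, given per-step running-objective values.
    """
    value = 0.0
    # Accumulate backwards so the tail of the trajectory is discounted once per step.
    for cost in reversed(running_costs):
        value = cost + discount_factor * value
    return value


# Three steps of unit cost with discount 0.5: 1 + 0.5 + 0.25
print(discounted_value([1.0, 1.0, 1.0], discount_factor=0.5))  # prints 1.75
```

With `discount_factor = 1.0` (the default in the signature above) the value reduces to the plain sum of running-objective values.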
- __init__(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)
Initialize a critic object.
- Parameters
system_dim_input (int) – Dimension of the input data
system_dim_output (int) – Dimension of the output data
data_buffer_size (int) – Maximum size of the data buffer
optimizer (Optional[Optimizer]) – Optimizer to use for training the critic
model (Optional[Model]) – Model to use for the critic
running_objective (Optional[Objective]) – Objective function to use for the critic
discount_factor (float) – Discount factor to use in the value calculation
observation_target (Optional[np.ndarray]) – Target observation for the critic
sampling_time (float) – Sampling time for the critic
critic_regularization_param (float) – Regularization parameter for the critic
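One common way to turn these parameters into a critic loss is a sum of squared temporal-difference (TD) errors over consecutive buffered samples, plus a penalty that keeps new weights close to the previously cached ones. This page does not state how rcognita combines `discount_factor`, `sampling_time` and `critic_regularization_param`, so the sketch below is an assumption throughout: it uses a linear critic V(x) = w · x, scales the running objective by `sampling_time`, and uses `critic_regularization_param` as the weight of the penalty. The names `linear_value` and `td_loss` are hypothetical.

```python
from typing import List, Sequence


def linear_value(weights: Sequence[float], features: Sequence[float]) -> float:
    # Hypothetical linear critic: V(x) = w . x
    return sum(w * f for w, f in zip(weights, features))


def td_loss(
    weights: Sequence[float],
    cached_weights: Sequence[float],
    observations: List[Sequence[float]],
    running_costs: Sequence[float],
    discount_factor: float = 1.0,
    sampling_time: float = 0.01,
    critic_regularization_param: float = 0.0,
) -> float:
    """Squared TD errors over consecutive samples plus a weight-change penalty.

    observations[k] is the feature vector at step k and running_costs[k] is the
    running objective at step k (both assumptions for illustration).
    """
    loss = 0.0
    for k in range(len(observations) - 1):
        # TD error: V(x_k) - (r_k * dt + gamma * V(x_{k+1}))
        td_error = linear_value(weights, observations[k]) - (
            running_costs[k] * sampling_time
            + discount_factor * linear_value(weights, observations[k + 1])
        )
        loss += td_error ** 2
    # Penalize the squared Euclidean distance from the cached weights.
    loss += critic_regularization_param * sum(
        (w - c) ** 2 for w, c in zip(weights, cached_weights)
    )
    return loss


# The TD error is zero here, so only the regularization term (0.5 * 1.0**2) remains.
print(td_loss([1.0], [0.0], [[1.0], [0.0]], [10.0, 0.0],
              discount_factor=1.0, sampling_time=0.1,
              critic_regularization_param=0.5))  # prints 0.5
```

Setting `critic_regularization_param = 0.0` (the default) leaves a pure TD objective; a positive value trades fit for smaller weight updates between optimization rounds.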
Methods
__init__(system_dim_input, …[, optimizer, …]) – Initialize a critic object.
accept_or_reject_weights(weights[, …]) – Determine whether to accept or reject the given weights based on whether they violate the given constraints.
cache_weights([weights]) – Store a copy of the current model weights.
initialize_buffers() – Initialize the action and observation buffers with zeros.
objective()
optimize_weights([time]) – Compute optimized critic weights, possibly subject to constraints.
reset() – Reset the outcome and current critic loss variables, and re-initialize the buffers.
restore_weights() – Restore the model weights to the cached weights.
update_and_cache_weights([weights]) – Update the model’s weights and cache the new values.
update_buffers(observation, action) – Update the buffers of the critic with the given observation and action.
update_outcome(observation, action) – Update the outcome variable based on the running objective and the current observation and action.
update_target(new_target)
update_weights([weights]) – Update the weights of the critic model.
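The buffer and weight-caching methods listed above suggest a lifecycle in which observations and actions are kept in fixed-size, zero-initialized buffers, and candidate weights are either accepted and cached or discarded in favour of the cached copy. The toy class below sketches that lifecycle in plain Python. It borrows the method names for readability but is not rcognita's implementation. The following details are assumptions: actions have `dim_input` entries and observations have `dim_output` entries; the newest buffer entry goes at the end; `accept_or_reject_weights` returns the strings `"accepted"`/`"rejected"`; and the constraint is a made-up bound `max_abs_weight` on each weight.

```python
from collections import deque
from typing import Deque, List, Optional, Sequence


class ToyCritic:
    """Hypothetical sketch of a critic's buffer and weight-caching lifecycle.

    Not rcognita code: dimensions, buffer ordering and the constraint are assumptions.
    """

    def __init__(self, dim_input: int, dim_output: int, data_buffer_size: int,
                 weights: Sequence[float], max_abs_weight: float = 10.0) -> None:
        self.dim_input = dim_input
        self.dim_output = dim_output
        self.data_buffer_size = data_buffer_size
        self.weights: List[float] = list(weights)
        self.cached_weights: List[float] = list(weights)
        self.max_abs_weight = max_abs_weight  # hypothetical constraint bound
        self.initialize_buffers()

    def initialize_buffers(self) -> None:
        # Pre-fill both buffers with zero vectors; a full deque drops its oldest entry on append.
        n = self.data_buffer_size
        self.observation_buffer: Deque[List[float]] = deque(
            ([0.0] * self.dim_output for _ in range(n)), maxlen=n)
        self.action_buffer: Deque[List[float]] = deque(
            ([0.0] * self.dim_input for _ in range(n)), maxlen=n)

    def update_buffers(self, observation: Sequence[float], action: Sequence[float]) -> None:
        # Assumed ordering: newest entry at the end, oldest falls off the front.
        self.observation_buffer.append(list(observation))
        self.action_buffer.append(list(action))

    def cache_weights(self, weights: Optional[Sequence[float]] = None) -> None:
        # Store a copy so later in-place edits to self.weights do not alter the cache.
        self.cached_weights = list(self.weights if weights is None else weights)

    def restore_weights(self) -> None:
        self.weights = list(self.cached_weights)

    def update_weights(self, weights: Optional[Sequence[float]] = None) -> None:
        if weights is not None:
            self.weights = list(weights)

    def update_and_cache_weights(self, weights: Optional[Sequence[float]] = None) -> None:
        self.update_weights(weights)
        self.cache_weights()

    def accept_or_reject_weights(self, weights: Sequence[float]) -> str:
        # Hypothetical constraint: every weight must lie in [-max_abs_weight, max_abs_weight].
        ok = all(abs(w) <= self.max_abs_weight for w in weights)
        return "accepted" if ok else "rejected"


# Usage: one buffer update, then a candidate that violates the constraint is
# rejected and the cached weights are restored.
critic = ToyCritic(dim_input=1, dim_output=2, data_buffer_size=3, weights=[0.5, -0.5])
critic.update_buffers(observation=[1.0, 2.0], action=[0.1])
print(list(critic.observation_buffer))  # [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0]]
candidate = [20.0, 0.0]
if critic.accept_or_reject_weights(candidate) == "accepted":
    critic.update_and_cache_weights(candidate)
else:
    critic.restore_weights()
print(critic.weights)  # [0.5, -0.5]
```

Keeping acceptance separate from updating lets the caller decide what to do with a rejected candidate, which matches the split between `accept_or_reject_weights`, `update_and_cache_weights` and `restore_weights` in the method list.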
Attributes
optimizer_engine – Returns the engine used by the optimizer.