Pseudocode

AlphaZero

AlphaZero pseudocode 

Backpropagation

backpropagation pseudocode 

Conjugate Gradient Method

conjugate gradient pseudocode 
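
As a rough illustration of the conjugate gradient method, here is a minimal pure-Python sketch for solving `A x = b`. It assumes `A` is a symmetric positive-definite matrix given as a list of rows; the function name `conjugate_gradient`, its parameters, and the 2x2 example system are illustrative assumptions, not taken from the pseudocode figure.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A (list of rows)."""
    n = len(b)
    max_iter = 10 * n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction is the residual
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))  # step length
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        # New direction: residual plus a multiple of the previous direction,
        # which keeps successive directions A-conjugate.
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Hypothetical example system; the exact solution is x = [1/11, 7/11].
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In exact arithmetic the loop terminates after at most `n` iterations; the `max_iter` cap only guards against round-off.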

Constraint-Controlled Reinforcement Learning

Constraint-Controlled Reinforcement Learning pseudocode 

Deep Q-learning

Deep Q-learning with experience replay and target network pseudocode 

Deep Deterministic Policy Gradients (DDPG)

deep deterministic policy gradients pseudocode 

Deterministic policy gradients

deterministic policy gradients pseudocode 

Double Deep Q-learning

double q-learning pseudocode 

Dual Gradient Descent

dual gradient descent pseudocode 
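
To make the two alternating steps of dual gradient descent concrete, the sketch below applies them to a toy problem chosen purely for illustration (an assumption, not part of the original figure): minimise `x^2 + y^2` subject to `x + y = 1`. The Lagrangian is `L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1)`, whose minimiser over `x, y` has the closed form `x = y = -lam / 2`.

```python
def dual_gradient_descent(steps=200, lr=0.5):
    """Toy problem (assumed): min x^2 + y^2  subject to  x + y = 1."""
    lam = 0.0
    for _ in range(steps):
        # Step 1: minimise the Lagrangian over the primal variables.
        # dL/dx = 2x + lam = 0, and likewise for y.
        x = y = -lam / 2.0
        # Step 2: gradient ascent on the dual function; its gradient
        # with respect to lam is the constraint violation x + y - 1.
        lam += lr * (x + y - 1.0)
    x = y = -lam / 2.0  # primal solution for the final multiplier
    return x, y, lam


x_opt, y_opt, lam_opt = dual_gradient_descent()
```

For this problem the multiplier converges to `lam = -1`, giving `x = y = 0.5`. In reinforcement learning settings the inner minimisation is usually only approximated by a few gradient steps rather than solved in closed form.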

Every-visit Monte Carlo prediction

every-visit MC pseudocode 

First-visit Monte Carlo prediction

first-visit MC pseudocode 
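
The difference between first-visit and every-visit Monte Carlo prediction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: each episode is a list of `(state, reward)` pairs, where `reward` is the reward received after leaving `state`; the function name `mc_prediction` and the `first_visit` flag are hypothetical.

```python
from collections import defaultdict


def mc_prediction(episodes, gamma=1.0, first_visit=True):
    """Estimate V(s) as the average return observed after visiting s."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        G = 0.0
        # Walk the episode backwards so the return G accumulates incrementally.
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit: skip this step if the state occurred earlier
            # in the same episode. Every-visit: count all occurrences.
            if first_visit and any(s == state for s, _ in episode[:t]):
                continue
            returns_sum[state] += G
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical episode in which state "A" is visited twice.
episode = [("A", 1.0), ("B", 0.0), ("A", 2.0)]
v_first = mc_prediction([episode], first_visit=True)
v_every = mc_prediction([episode], first_visit=False)
```

With this single episode and `gamma = 1`, the returns following the two visits to `"A"` are 3 and 2, so the first-visit estimate for `"A"` is 3.0 and the every-visit estimate is 2.5; both estimate 2.0 for `"B"`.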

Implicit Q-Learning

IQL pseudocode 

\(n\)-step TD

n-step TD pseudocode 

One-step actor-critic

one-step actor-critic pseudocode 

Policy iteration

policy iteration pseudocode 
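
A minimal sketch of policy iteration on a tabular MDP follows. The MDP encoding (`P[state][action]` is a list of `(probability, next_state, reward)` triples, with an empty dict marking a terminal state), the three-state chain `CHAIN_MDP`, and all function names are assumptions made for this example.

```python
# Hypothetical 3-state chain: state 2 is terminal, reaching it pays reward 1.
CHAIN_MDP = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_iteration(P, gamma=0.9, theta=1e-8):
    policy = {s: next(iter(P[s])) for s in P if P[s]}  # arbitrary initial policy
    V = {s: 0.0 for s in P}
    while True:
        # Policy evaluation: iterate the Bellman expectation backup.
        while True:
            delta = 0.0
            for s in policy:
                v = q_value(P, V, s, policy[s], gamma)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < theta:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in policy:
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy


values, greedy_policy = policy_iteration(CHAIN_MDP)
```

The strict-improvement check (with a small tolerance) avoids cycling between equally good actions.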

PPO

PPO pseudocode 

Q-learning

Q-learning pseudocode 
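
As a rough illustration of the Q-learning algorithm named in this heading, here is a minimal tabular sketch. The corridor environment (`step`, `GOAL`, `ACTIONS`), the helper `epsilon_greedy`, and the hyperparameter values are all assumptions made for the example.

```python
import random

ACTIONS = ("left", "right")
GOAL = 3  # hypothetical corridor: states 0..3, the episode ends at state 3


def step(state, action):
    """Toy deterministic corridor: reward 1.0 on reaching GOAL, else 0.0."""
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else greedy (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action,
            # regardless of which action the behaviour policy takes next.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q


Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
```

In this deterministic corridor the greedy policy derived from `Q` moves right in every non-terminal state.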

REINFORCE

REINFORCE pseudocode 

SARSA

SARSA pseudocode 
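
For comparison with off-policy methods, here is a minimal tabular SARSA sketch. The corridor environment (`step`, `GOAL`, `ACTIONS`), the helper `epsilon_greedy`, and the hyperparameter values are assumptions made for the example. The distinguishing feature is that the update bootstraps from the action the agent actually takes next.

```python
import random

ACTIONS = ("left", "right")
GOAL = 3  # hypothetical corridor: states 0..3, the episode ends at state 3


def step(state, action):
    """Toy deterministic corridor: reward 1.0 on reaching GOAL, else 0.0."""
    next_state = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def epsilon_greedy(Q, state, epsilon, rng):
    """Random action with probability epsilon, else greedy (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(Q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(Q, next_state, epsilon, rng)
            # On-policy target: uses Q of the action that will actually be taken.
            next_q = 0.0 if done else Q[(next_state, next_action)]
            Q[(state, action)] += alpha * (reward + gamma * next_q - Q[(state, action)])
            state, action = next_state, next_action
    return Q


Q_sarsa = sarsa()
```

Because the target includes exploratory actions, the learned values reflect the epsilon-greedy policy rather than the optimal one.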

Semi-gradient SARSA (episodic)

semi-gradient sarsa pseudocode 

Semi-gradient TD(0)

semi-gradient TD(0) pseudocode 

Soft Actor-Critic (SAC)

soft actor-critic pseudocode 

Successor Features with Generalized Policy Improvement

successor features with GPI pseudocode 

Twin Delayed Deep Deterministic Policy Gradients (TD3)

td3 pseudocode 

TD(0)

TD(0) pseudocode 
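
A minimal sketch of tabular TD(0) prediction is given below. It evaluates a uniformly random policy on a five-state random walk, which is an assumed example environment: states 1 to 5 are non-terminal, states 0 and 6 are terminal, and only the transition into state 6 pays reward 1. The names `random_walk_step` and `td0_prediction` are hypothetical.

```python
import random


def random_walk_step(state, rng):
    """Move left or right with equal probability; states 0 and 6 are terminal."""
    next_state = state + rng.choice((-1, 1))
    reward = 1.0 if next_state == 6 else 0.0
    return reward, next_state, next_state in (0, 6)


def td0_prediction(step, start_state, states, episodes=1000,
                   alpha=0.05, gamma=1.0, seed=0):
    rng = random.Random(seed)
    V = {s: 0.0 for s in states}
    for _ in range(episodes):
        state, done = start_state, False
        while not done:
            reward, next_state, done = step(state, rng)
            # TD(0) target: one-step reward plus discounted value of the next
            # state (terminal states have value 0 by definition).
            target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * (target - V[state])
            state = next_state
    return V


V_walk = td0_prediction(random_walk_step, start_state=3, states=range(1, 6))
```

For this random walk the true values are `s / 6` for `s = 1..5`, so the estimates should increase from left to right, up to the noise introduced by the constant step size.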

TD(\(\lambda\))

TD-lambda pseudocode 

Trust Region Policy Optimization (TRPO)

TRPO pseudocode 

Value iteration

value iteration pseudocode
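
A minimal sketch of value iteration on a tabular MDP follows. The MDP encoding (`P[state][action]` is a list of `(probability, next_state, reward)` triples, with an empty dict marking a terminal state), the three-state chain `CHAIN_MDP`, and the function names are assumptions made for this example.

```python
# Hypothetical 3-state chain: state 2 is terminal, reaching it pays reward 1.
CHAIN_MDP = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}


def backup(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state: value stays 0
                continue
            # Bellman optimality backup: take the best action's expected return.
            best = max(backup(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Read off a greedy policy from the converged values.
    policy = {s: max(P[s], key=lambda a: backup(P, V, s, a, gamma))
              for s in P if P[s]}
    return V, policy


opt_values, opt_policy = value_iteration(CHAIN_MDP)
```

With `gamma = 0.9` the chain yields values 0.9 and 1.0 for states 0 and 1, and the greedy policy moves right in both.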