Implementations

Paper Implementations

Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization

(Paper | Code)

SAINT: Attention-Based Policies for Discrete Combinatorial Action Spaces

(Paper | Code)

BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

(Paper | Code)

Note Implementations

Q-Learning

(Note | Code)

Deep Q-Learning

(Note | Code)

REINFORCE

(Note | Code)

Advantage Actor-Critic (A2C)

(Note | Code)

Deep Deterministic Policy Gradient (DDPG)

(Note | Code)

Proximal Policy Optimization (PPO)

(Note | Code)

Twin Delayed Deep Deterministic Policy Gradient (TD3)

(Note | Code)

Soft Actor-Critic (SAC)

(Note | Code)

Transformer

(Note | Code)

Decoder-only Transformer

(Note coming soon | Code)

Encoder-only Transformer

(Note coming soon | Code)