Implementations
Paper Implementations
- Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
(Paper | Code)
- SAINT: Attention-Based Policies for Discrete Combinatorial Action Spaces
(Paper | Code)
- BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces
(Paper | Code)
Note Implementations
- Q-Learning
(Note | Code)
- Deep Q-Learning
(Note | Code)
- REINFORCE
(Note | Code)
- Advantage Actor-Critic (A2C)
(Note | Code)
- Deep Deterministic Policy Gradients (DDPG)
(Note | Code)
- Proximal Policy Optimization (PPO)
(Note | Code)
- Twin Delayed Deep Deterministic Policy Gradients (TD3)
(Note | Code)
- Soft Actor-Critic (SAC)
(Note | Code)
- Transformer
(Note | Code)
- Decoder-only Transformer
(Note coming soon | Code)
- Encoder-only Transformer
(Note coming soon | Code)