-
(WIP) Building GRPO From Scratch
Group Relative Policy Optimization (GRPO) is a widely used reinforcement learning algorithm that was popularized by DeepSeek. In this post, we start entirely from first principles and progressively add complexity until we get to the full GRPO objective.
-
Small Proofs and Derivations
Last updated: Dec 2, 2024
-
Three Discrete Sampling Methods
This post describes and implements three methods which can be used to sample from any discrete probability distribution.
-
Linear Sandwich
Creating a
Linearsandwich by stacking a bunch ofLinearlayers on top of each other just results in another linear transformation. Below, we show that this is indeed the case.