Eugen Hotaj Blog

Feb 27, 2026
GRPO from First Principles
Group Relative Policy Optimization (GRPO) is a widely used policy gradient algorithm that was popularized by DeepSeek [1]. In this post, we start entirely from first principles and progressively add complexity until we get to the full GRPO objective.
Mar 10, 2023
Small Proofs and Derivations
Last updated: March 1, 2026
Jan 15, 2023
Three Discrete Sampling Methods
This post describes and implements three methods that can be used to sample from any discrete probability distribution.
Oct 11, 2022
Linear Sandwich
Stacking a bunch of Linear layers on top of each other to create a Linear sandwich, with no nonlinearity in between, just results in another linear transformation. Below, we show that this is indeed the case.
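
As a quick numerical illustration of the Linear sandwich claim, here is a minimal sketch in plain Python. It assumes each Linear layer is an affine map y = Wx + b with no nonlinearity between layers. The helper names (matvec, matmul, linear, collapse) and the layer sizes are hypothetical, chosen only for this example, and are not taken from the post itself.

```python
import random

def matvec(W, x):
    # Matrix-vector product, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # Matrix-matrix product A @ B, both given as lists of rows.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def linear(W, b, x):
    # One Linear layer: y = W x + b.
    return [y + bi for y, bi in zip(matvec(W, x), b)]

def collapse(W2, b2, W1, b1):
    # W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
    W = matmul(W2, W1)
    b = [y + bi for y, bi in zip(matvec(W2, b1), b2)]
    return W, b

rng = random.Random(0)

def rand_vector(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

def rand_matrix(rows, cols):
    return [rand_vector(cols) for _ in range(rows)]

# Two stacked layers: 3 -> 4 -> 2 (sizes are arbitrary).
W1, b1 = rand_matrix(4, 3), rand_vector(4)
W2, b2 = rand_matrix(2, 4), rand_vector(2)
x = rand_vector(3)

# Output of the two-layer sandwich.
stacked = linear(W2, b2, linear(W1, b1, x))

# Output of the single equivalent layer.
W, b = collapse(W2, b2, W1, b1)
single = linear(W, b, x)

# Largest elementwise gap; expected to be at floating-point noise level.
max_diff = max(abs(s - t) for s, t in zip(stacked, single))
print(max_diff)
```

The same collapse can be applied repeatedly, so a stack of any depth reduces to a single weight matrix and bias under the stated assumption.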