All Posts
- The Sand Mandala November 21, 2023
A vignette about purpose.
- Monotonic Attention November 9, 2023
Write-up explaining my implementation of monotonic attention using a probabilistic graphical model.
- Retentive Networks and RWKV September 16, 2023
A short, hand-wavy explainer for the mathematical intuition behind faster attention mechanisms.
- Miscellaneous Azure Notes September 14, 2023
Miscellaneous notes about various Azure-related things.
- Miscellaneous AWS Notes September 13, 2023
Miscellaneous notes about various AWS-related things.
- Streaming Convolutions August 24, 2023
Working out the math for streaming convolutions.
- Diffusion versus Flow Matching July 19, 2023
An accessible introduction to diffusion and flow matching models. This post aims to be both complete and easy to follow, serving as a reference for implementing diffusion models yourself.
- Fast Attention Implementations June 29, 2023
A reference collection of fast attention implementations.
- Starting a Startup June 27, 2023
I left FAIR to start a startup a few weeks ago, and figured I should describe what we're actually doing.
- RWKV Language Model Math June 16, 2023
In-depth explanation of the math behind the RWKV model, with PyTorch implementations, plus a discussion of numerical stability.