Mini Transformer Lab
From-scratch transformer implementation for architectural exploration
Why I built this
High-level APIs hide architectural trade-offs. I wanted a controlled environment where I could test transformer design choices directly and understand what breaks training and what scales.
What I learned
Modular design enabled rapid iteration and clean comparisons between configurations. For building understanding, architectural simplicity mattered more than clever optimizations. What didn't work: the initial model was over-parameterized, which slowed every experiment.
TL;DR
Built a minimal transformer from scratch to study architectural trade-offs through controlled ablations, enabling systematic experiments on depth, width, attention-head count, and training dynamics.
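Running ablations over depth, width, and head count amounts to sweeping a small configuration grid. A minimal sketch of how such a grid might be enumerated (the names `ModelConfig` and `ablation_grid` are illustrative, not the project's actual API):

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class ModelConfig:
    """One point in the ablation grid (fields are illustrative)."""
    n_layers: int
    d_model: int
    n_heads: int

def ablation_grid(layers, widths, heads):
    """Enumerate every depth/width/head combination, skipping
    widths that are not divisible by the head count."""
    return [ModelConfig(l, d, h)
            for l, d, h in product(layers, widths, heads)
            if d % h == 0]

grid = ablation_grid(layers=[2, 4], widths=[128, 256], heads=[2, 4])
```

Freezing the dataclass makes each config hashable, so results can be keyed directly by the configuration that produced them.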
Why this matters
Production frameworks abstract away the decisions that determine whether a model trains at all. Understanding what breaks training and what scales requires experiments you can control end to end.
Problem
Off-the-shelf implementations make it hard to isolate the effect of a single design choice. I wanted a controllable environment where each transformer component could be swapped or ablated independently.
Constraints
Limited compute: every experiment must run on consumer hardware. Reproducibility: all ablations must be deterministic and directly comparable. Educational clarity: the code must be readable and well documented.
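The reproducibility constraint comes down to seeding every random number generator an experiment touches. A stdlib-only sketch (in the actual project the framework RNGs, e.g. torch and numpy, would be seeded the same way; `seed_everything` is a hypothetical helper name):

```python
import os
import random

def seed_everything(seed: int) -> None:
    """Seed the RNGs so repeated runs draw identical values.
    Only Python's stdlib RNG is seeded here; a real training run
    would also seed the ML framework's generators."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
assert a == b  # identical seeds give identical draws
```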
System design
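At the core of every block in a from-scratch transformer is scaled dot-product attention. A dependency-free single-head sketch of the mechanism (plain lists rather than tensors; this is an illustration of the computation, not the project's actual code):

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Single-head scaled dot-product attention.
    Q, K, V: lists of d-dimensional row vectors, one per token."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)           # attention weights sum to 1
        # output = weight-averaged value vectors
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out
```

Because the weights form a convex combination, each output row is a weighted average of the value vectors, which is what makes attention maps directly inspectable in ablations.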
Evaluation
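For language-model ablations the standard comparison metric is held-out perplexity, the exponential of the mean per-token negative log-likelihood. A minimal sketch, assuming per-token NLL values have already been collected from a run:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean per-token negative log-likelihood).
    Only comparable across configs when tokenization is held fixed."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Sanity check: a uniform model over a 4-token vocabulary has
# per-token NLL of log(4), hence perplexity 4.
ppl = perplexity([math.log(4.0)] * 3)
```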
Results
Clearer understanding of what breaks training, what scales, and how architecture choices change attention patterns.
Trade-offs & Lessons
Understanding architectural trade-offs at this level now informs my system design choices. Modular design enabled rapid iteration and clear comparisons, while the initial over-parameterization slowed early experiments. For building understanding, simplicity in architecture matters more than complex optimizations.
What I'd Improve Next
Add more systematic ablation studies across attention mechanisms. Implement gradient analysis tools for training dynamics. Explore alternative positional encoding strategies.