LegoLLM

ONGOING
2025

A complete LLM development framework built from first principles. Every component — tokenization, embeddings, attention, training, generation, alignment — is implemented from scratch as modular "Lego pieces" that can be combined, swapped, and extended. The goal is to deeply understand the entire LLM stack, not just use it.

Core pipeline: Raw Text → Tokenization → Embeddings → Attention → Transformer → Training → Generation → Alignment
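The "Lego piece" modularity hinges on components agreeing on contracts rather than concrete classes. A minimal sketch of that idea using structural typing — the `Tokenizer` protocol and `CharTokenizer` here are illustrative, not the repo's actual interfaces:

```python
from typing import Protocol


class Tokenizer(Protocol):
    """Structural contract: any class with these methods plugs into the pipeline."""
    def encode(self, text: str) -> list[int]: ...
    def decode(self, ids: list[int]) -> str: ...


class CharTokenizer:
    """Toy stand-in: one id per character seen in the corpus."""
    def __init__(self, corpus: str) -> None:
        self.vocab = sorted(set(corpus))
        self.stoi = {c: i for i, c in enumerate(self.vocab)}

    def encode(self, text: str) -> list[int]:
        return [self.stoi[c] for c in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.vocab[i] for i in ids)


def roundtrip(tok: Tokenizer, text: str) -> str:
    # Accepts anything satisfying the protocol — swap in a BPE tokenizer unchanged.
    return tok.decode(tok.encode(text))
```

Because the contract is structural, a BPE implementation can replace `CharTokenizer` without either class inheriting from a shared base.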

Roadmap

Phase 1 — Core Foundation

Tokenization (BPE from scratch), embeddings, multi-head causal attention, GPT-2 architecture, DataLoader, Trainer, generation strategies
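The core of a from-scratch BPE tokenizer is the merge loop: repeatedly find the most frequent adjacent byte pair and replace it with a new token id. A minimal sketch of one merge step — function names are illustrative, not the repo's `NaiveBPE` API:

```python
from collections import Counter


def most_frequent_pair(ids: list[int]) -> tuple[int, int]:
    """Return the adjacent pair that occurs most often in the sequence."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]


def merge(ids: list[int], pair: tuple[int, int], new_id: int) -> list[int]:
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


# One training step: byte ids 0–255 are the base vocab, 256 is the first merge.
ids = list(b"aaabdaaabac")
pair = most_frequent_pair(ids)   # ('a', 'a') is the most frequent pair here
ids = merge(ids, pair, 256)
```

Training repeats this until the target vocabulary size is reached; encoding replays the learned merges in order.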

Phase 2 — Pretrained Weights

Small-scale pretraining, loading HuggingFace GPT-2 weights (safetensors), Conv1D→Linear mapping, fused QKV splitting
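HF GPT-2 checkpoints store attention projections as a `Conv1D` with weight shape `(in_features, out_features)` — the transpose of `nn.Linear` — and fuse Q, K, V into one `c_attn` matrix. Both conversions can be sketched in NumPy (variable names are illustrative):

```python
import numpy as np

n_embd = 8

# HF GPT-2's c_attn Conv1D weight: (n_embd, 3 * n_embd).
conv1d_w = np.arange(n_embd * 3 * n_embd, dtype=np.float32).reshape(n_embd, 3 * n_embd)

# nn.Linear expects (out_features, in_features): transpose once on load.
linear_w = conv1d_w.T                           # (3 * n_embd, n_embd)

# Un-fuse QKV: split the output dimension into three equal projections.
q_w, k_w, v_w = np.split(linear_w, 3, axis=0)   # each (n_embd, n_embd)

# Sanity check: both layouts compute the same projection.
x = np.random.default_rng(0).normal(size=(2, n_embd)).astype(np.float32)
assert np.allclose(x @ conv1d_w, x @ linear_w.T)
```

The same transpose applies to `c_fc` and `c_proj` in the MLP; only layer norms and embeddings load without remapping.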

Phase 3 — Instruction Fine-tuning

SFT, Alpaca-style chat formatting, dynamic padding, loss masking, LoRA
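Loss masking in SFT usually means setting the label of every prompt token to PyTorch's `CrossEntropyLoss` ignore index (−100), so only response tokens contribute to the loss. A dependency-free sketch — the helper name is hypothetical, not the repo's API:

```python
IGNORE_INDEX = -100  # torch.nn.CrossEntropyLoss skips targets with this value


def mask_prompt(input_ids: list[int], prompt_len: int) -> list[int]:
    """Build SFT labels: copy the inputs, but mask prompt positions out of the loss."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels
```

Dynamic padding composes with this directly: pad positions also get `IGNORE_INDEX`, so batches of uneven length train correctly.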

Phase 4 — Modern Architectures

LLaMA 3 (RoPE, RMSNorm, SwiGLU, GQA), KV Cache, Qwen 3
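Of the LLaMA 3 components, RMSNorm is the most compact to show: unlike LayerNorm, it skips mean subtraction and the bias term, rescaling by the root-mean-square alone. A NumPy sketch, not the repo's implementation:

```python
import numpy as np


def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """LLaMA-style RMSNorm: normalize by RMS over the last axis, then scale."""
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight
```

Dropping the mean/bias terms saves computation and, empirically, trains as well as LayerNorm at scale — one reason modern decoder stacks prefer it.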

Phase 5 — Alignment

DPO, PPO/RLHF, Mixture of Experts
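The DPO objective reduces to a logistic loss on the reward margin between chosen and rejected completions, measured relative to a frozen reference model. A scalar NumPy sketch over sequence log-probabilities (β and argument names are illustrative):

```python
import numpy as np


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))


def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss: -log sigmoid(beta * reward margin) over one preference pair.

    Each argument is a sequence log-probability; `pi_*` come from the policy
    being trained, `ref_*` from the frozen reference model.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return float(-np.log(sigmoid(beta * margin)))
```

The loss falls as the policy raises the chosen completion's likelihood relative to the reference — no reward model or PPO rollout loop needed, which is why DPO is often the first alignment stage implemented.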

Architecture Highlights

Repository: GitHub
Stack: PyTorch, NumPy, tiktoken, safetensors, Rich, pytest, MkDocs, Ruff, GitHub Actions
Models: GPT-2 (all 4 sizes) · LLaMA 3 (planned) · Qwen 3 (planned)
Tests: 218 unit tests + integration tests

Module Structure

legollm/
├── architectures/       # GPT-2, LLaMA 3, Qwen 3
├── components/
│   ├── attention/       # Multi-head, Grouped-query
│   ├── blocks/          # Transformer block
│   ├── embeddings/
│   ├── feedforward/
│   └── normalization/
├── core/
│   ├── interfaces.py    # Protocol contracts
│   └── tokenization/    # NaiveBPE, RegexBPE
├── data/                # Memmap DataLoader
├── training/            # Trainer (cosine LR, AdamW)
├── generation/          # Greedy, top-k, top-p
├── optimization/        # KV Cache
├── peft/                # LoRA (upcoming)
└── finetuning/          # SFT, chat formatting
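The strategies in `generation/` all end in a filtering step over the next-token distribution. A NumPy sketch of top-p (nucleus) filtering — the function name is illustrative, not the repo's API:

```python
import numpy as np


def top_p_filter(probs: np.ndarray, p: float = 0.9) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability reaches p,
    zero out the rest, and renormalize."""
    order = np.argsort(probs)[::-1]          # token ids, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1     # include the token that crosses p
    keep = order[:cutoff]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()
```

Greedy decoding is the `p → 0` limit (one surviving token); top-k is the same pattern with a fixed-size cutoff instead of a probability-mass one.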