
LoRA Insights: PEFT Recipes

2024

Reproducing and extending Sebastian Raschka's LoRA experiments across multiple models to find practical fine-tuning recipes. I ran hundreds of experiments on an H100 GPU comparing LoRA and QLoRA configurations, evaluating with EleutherAI's lm-evaluation-harness on tasks such as TruthfulQA, arithmetic, and MMLU.

Models Tested

Key Takeaways

Evaluation

Models were evaluated using EleutherAI's lm-evaluation-harness on 6 tasks: truthfulqa_mc1, truthfulqa_mc2, arithmetic_2ds, arithmetic_4ds, blimp_causative, and mmlu_global_facts. Base model scores were compared against each fine-tuned variant to measure improvement or regression.
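The comparison step above boils down to per-task score deltas. A minimal sketch of that bookkeeping (the scores below are made-up placeholders, not results from these runs; real values come from the harness's output):

```python
# Hypothetical scores for illustration only -- real numbers come from
# lm-evaluation-harness output, not from this sketch.
base = {"truthfulqa_mc1": 0.25, "arithmetic_2ds": 0.41, "mmlu_global_facts": 0.30}
tuned = {"truthfulqa_mc1": 0.28, "arithmetic_2ds": 0.39, "mmlu_global_facts": 0.33}

def score_deltas(base_scores, tuned_scores):
    """Per-task change vs. the base model; positive = improvement, negative = regression."""
    return {task: round(tuned_scores[task] - base_scores[task], 4) for task in base_scores}

print(score_deltas(base, tuned))
```

Running the same delta computation for every fine-tuned variant against its base model gives a compact improvement/regression table per task.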

Experiment Setup
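The core comparison is standard LoRA on a bfloat16 base model versus QLoRA on an nf4-quantized one. A minimal sketch of how such configurations look with PEFT and bitsandbytes (the rank, alpha, dropout, and target modules here are illustrative assumptions, not the exact hyperparameters swept in these experiments):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# Illustrative LoRA adapter config -- r, alpha, and target modules are
# assumptions for this sketch, not the values used in the actual runs.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# QLoRA variant: load the base model quantized to nf4 via bitsandbytes,
# then train the same adapter on top of the frozen quantized weights.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```

Swapping `bnb_config` in or out of the model-loading call is the only difference between the two setups; the adapter config stays identical, which is what makes the memory and quality comparison clean.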

Memory Requirements

Config           Model Footprint   Training Memory
LoRA (bfloat16)  5.98 GB           52.86 GiB
QLoRA (nf4)      2.05 GB           44.20 GiB
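The footprint column is roughly what you'd predict from bytes per parameter. A back-of-the-envelope sketch, assuming Llama 3.2 3B's roughly 3.21 billion parameters (an assumption for illustration):

```python
def footprint_gib(n_params: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GiB: parameter count times storage width."""
    return n_params * bytes_per_param / 2**30

# Assumed parameter count for Llama 3.2 3B (~3.21e9); illustrative only.
N = 3.21e9

bf16 = footprint_gib(N, 2.0)  # bfloat16: 2 bytes per weight -> ~5.98 GiB
nf4 = footprint_gib(N, 0.5)   # nf4: ~0.5 bytes per quantized weight -> ~1.49 GiB
                              # (a lower bound: some modules stay unquantized,
                              # which pushes the measured figure toward ~2 GB)

print(f"bf16: {bf16:.2f} GiB, nf4 lower bound: {nf4:.2f} GiB")
```

Note that the training-memory column is several times the footprint: optimizer state, activations, and gradients for the adapter dominate, which is why quantizing the base weights saves less during training than the footprint gap suggests.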

Per-Model Observations

Repository
GitHub
Platform
H100 GPU (Ori Cloud)
Stack
PyTorch, TRL, PEFT, bitsandbytes, lm-evaluation-harness
Dataset
Alpaca Cleaned

Benchmark Results

Base Model Comparisons

Llama 3.2 3B — LoRA vs QLoRA

Llama 3.2 3B Results

Qwen 2.5 3B

Qwen 2.5 3B Results

Llama 3.2 1B

Llama 3.2 1B Results