---
language:
- en
license: mit
tags:
- text-generation
- transformer
- conversational
datasets:
- HuggingFaceFW/fineweb-edu
- cais/mmlu
- gsm8k
- HuggingFaceTB/smoltalk
model-index:
- name: nanochat
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: cais/mmlu
    metrics:
    - type: accuracy
      value: 31.51
  - task:
      type: text-generation
    dataset:
      name: GSM8K
      type: gsm8k
    metrics:
    - type: accuracy
      value: 4.55
  - task:
      type: text-generation
    dataset:
      name: HumanEval
      type: openai_humaneval
    metrics:
    - type: pass@1
      value: 8.54
---

# nanochat

**nanochat** is a 561M parameter transformer language model trained for conversational AI tasks. This model demonstrates that capable chat models can be trained efficiently on modest hardware budgets (~$100 on 8x H100 GPUs).

Read about the process at https://samdobson.uk/posts/training-a-chatgpt-clone-for-cheap/

Chat with the model at https://huggingface.co/spaces/sdobson/nanochat

## Model Description

- **Developed by:** Andrej Karpathy
- **Trained by:** Sam Dobson
- **Model type:** Transformer-based causal language model
- **Language(s):** English
- **License:** MIT
- **Parameters:** 560,988,160 (~561M)

### Architecture

- **Layers:** 20
- **Hidden size:** 1280 channels
- **Attention heads:** 10
- **Head dimension:** 128
- **Vocabulary size:** 65,536 tokens

## Training Details

### Training Data

nanochat was trained in multiple stages:

1. **Pretraining:** 100B-token subset of FineWeb-EDU (11.2B tokens processed)
2. **Midtraining:** SmolTalk conversations, MMLU multiple-choice questions, GSM8K math problems
3. **Supervised Fine-tuning (SFT):** Conversational adaptation data

### Training Procedure

#### Tokenization

- Custom Rust-based tokenizer
- Vocabulary: 65,536 tokens
- Compression ratio: 4.8 characters per token

#### Training Infrastructure

- **Hardware:** 8x H100 GPUs (Lambda GPU Cloud)
- **Training time:** ~3 hours for the pretraining stage
- **Estimated compute:** ~4e19 FLOPs
- **Total cost:** ~$100

#### Training Stages

The model was trained in three stages:

1. **Pretraining** on web text (FineWeb-EDU)
2. **Midtraining** on domain-specific datasets (reasoning, conversation, maths)
3. **Supervised fine-tuning** for chat optimisation

## Performance

### Benchmark Results

| Benchmark | Score | Description |
|-----------|-------|-------------|
| **MMLU** | 23.99% | Multitask language understanding |
| **GSM8K** | 4.47% | Grade school math problems |
| **HumanEval** | 6.71% | Python code generation |
| **ARC-Easy** | 24.79% | Science questions (easy) |
| **ARC-Challenge** | 24.32% | Science questions (hard) |
| **ChatCORE** | 1.73% | Conversational reasoning |

## Intended Use

### Direct Use

nanochat is designed for:

- Conversational AI applications
- Research on efficient language model training
- Educational purposes for understanding LLM training pipelines
- Low-resource deployment scenarios

### Downstream Use

The model can be fine-tuned for specific conversational tasks or used as a base model for further domain adaptation.
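As a hypothetical starting point for such adaptation, the sketch below loads the released checkpoint directly with PyTorch and counts its parameters. The file name `model_000650.pt` comes from the inference guide further down; the assumption that it unpickles into a (possibly nested) dictionary of weight tensors is mine and may not match the actual checkpoint layout.

```python
# Minimal sketch: inspect the released nanochat checkpoint before adapting it further.
# Assumes model_000650.pt (see the inference guide below) has been downloaded locally
# and unpickles into a dict of weight tensors, possibly nested under a "model" key.
# This layout is an assumption, not a documented format.
import torch

ckpt = torch.load("model_000650.pt", map_location="cpu")

# Unwrap a nested {"model": state_dict, ...} layout if present.
state = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

tensors = {name: t for name, t in state.items() if torch.is_tensor(t)}
total = sum(t.numel() for t in tensors.values())
print(f"{len(tensors)} weight tensors, {total:,} parameters")  # ~560,988,160 if the layout matches
```

If the parameter count matches the ~561M figure above, the weights can then be mapped onto the reference implementation in the nanochat repository for further fine-tuning.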
### Out-of-Scope Use

- Production-grade conversational AI (the model is relatively small and has limited capabilities)
- Tasks requiring specialised knowledge or high accuracy
- Critical applications where errors could cause harm

## Limitations and Bias

- **Small scale:** At 561M parameters, this model has significantly fewer capabilities than larger models (1B+ parameters)
- **Limited training:** Trained on only 11.2B tokens, which is modest by modern standards
- **Performance:** Benchmark scores indicate limited reasoning and mathematical capabilities
- **Bias:** Inherits biases from training data (FineWeb-EDU, SmolTalk, etc.)
- **Language:** English-only

## Inference Guide

Simon Willison created a script that allows the model to run on CPU on macOS:

```
# Fetch the model files, then run Simon Willison's CPU generation script
cd /tmp
git clone https://huggingface.co/sdobson/nanochat
uv run https://gist.githubusercontent.com/simonw/912623bf00d6c13cc0211508969a100a/raw/80f79c6a6f1e1b5d4485368ef3ddafa5ce853131/generate_cpu.py \
  --model-dir /tmp/nanochat \
  --prompt "Tell me about dogs."
```

Otherwise you can:

1. Download all files from this repository
2. Put `tokenizer.pkl` and `token_bytes.pt` in `~/.cache/nanochat/tokenizer`
3. Put `model_000650.pt` and `meta_000650.json` in `~/.cache/nanochat/chatsft_checkpoints/d20`
4. Clone https://github.com/karpathy/nanochat
5. Run `uv sync` followed by `uv run python -m scripts.chat_web`

## Citation

**Repository:** [github.com/karpathy/nanochat](https://github.com/karpathy/nanochat)

```bibtex
@software{nanochat2025,
  author = {Karpathy, Andrej},
  title = {nanochat: A 561M parameter conversational language model},
  year = {2025},
  url = {https://github.com/karpathy/nanochat}
}
```

## Model Card Author

Sam Dobson