> [!info] Course code
> - [picollm/README.md](https://github.com/Montekkundan/llm/blob/main/picollm/README.md)
> - [picollm/accelerated/README.md](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/README.md)
> - [picollm/accelerated/speedrun.sh](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/speedrun.sh)
> - [picollm/accelerated/pretrain/train.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/train.py)
> - [picollm/accelerated/chat/sft.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/sft.py)
## What This Concept Is
You finish a concept note, you understand the idea, and then you open the repo and ask, "okay, but which file actually does that?" This note exists to answer that exact question for `picollm`.
It is the bridge from lecture understanding to code navigation.
## Foundation Terms You Need First
A **code map** links ideas to files and runtime surfaces. A **small surface** is the notebook or toy runtime where you first learn the idea. A **serious surface** is the accelerated implementation used in the real stack. An **operator surface** is the set of scripts and reports used to run, inspect, or ship the system.
So the habit this note builds is simple: when a lecture introduces a concept, you should know where to go in the repo to see that concept become code.
```mermaid
flowchart TD
A["Concept notes and notebooks"] --> B["course_tools and small scripts"]
B --> C["picollm/accelerated core runtime"]
C --> D["Base train, base eval, chat SFT, chat eval"]
D --> E["Serving surfaces: CLI, web UI, API-compatible routes"]
D --> F["Operator surfaces: report, manifests, HF upload, restore, export"]
E --> G["End-user apps and demos"]
```
## The big split
Think of the repo in two layers:
1. a concept-first runtime for learning
2. a serious accelerated runtime for the final from-scratch chatbot
That split matches the course structure far better than flattening everything into a single folder-by-folder walkthrough.
The important change in the current course version is this:
- `course_tools/` and the notebooks are still the cleanest place to learn the idea
- `picollm/accelerated/` is now the single serious from-scratch path
## 1. Concept-first runtime
- [course_tools/runtime.py](https://github.com/Montekkundan/llm/blob/main/course_tools/runtime.py)
The smallest concept runtime in the repo.
- [scripts/base_training_flow/run.py](https://github.com/Montekkundan/llm/blob/main/scripts/base_training_flow/run.py)
The local base-model flow used in the smaller practical arc.
- [scripts/base_evaluation_flow/run.py](https://github.com/Montekkundan/llm/blob/main/scripts/base_evaluation_flow/run.py)
The corresponding local evaluation path.
This layer connects most directly to:
- [[Training Loop]]
- [[Causal Language Modeling]]
- [[Chat Format and SFT]]
## 2. Tokenizer and conversation representation
- [picollm/accelerated/tokenizer.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tokenizer.py)
[[Glossary#Special tokens|Special tokens]], GPT-4-style split regex, byte-level [[Glossary#BPE|BPE]] helpers, and conversation rendering.
- [picollm/accelerated/pretrain/train_tokenizer.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/train_tokenizer.py)
Train the serious [[Glossary#Tokenizer|tokenizer]].
- [picollm/accelerated/pretrain/tokenizer_eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/tokenizer_eval.py)
Evaluate tokenization behavior before the expensive run.
- [picollm/accelerated/dataset.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/dataset.py)
Download and manage the shard set used by the accelerated stack.
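Before reading `tokenizer.py`, it helps to see one byte-level BPE merge step in isolation. This is a minimal, hypothetical sketch (the real tokenizer adds the GPT-4-style split regex, special tokens, and a full ranked merge table):

```python
from collections import Counter

def most_common_pair(ids):
    """Count adjacent id pairs and return the most frequent one."""
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get)

def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# start from raw UTF-8 bytes, then fold the most frequent pair into a new id
text = "aaabdaaabac"
ids = list(text.encode("utf-8"))
pair = most_common_pair(ids)   # (97, 97), i.e. the byte pair "aa"
ids = merge(ids, pair, 256)    # the first merged token gets id 256
```

Training a real tokenizer just repeats this merge loop until the target vocabulary size is reached, remembering the order of merges.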
This layer connects most directly to:
- [[LLM/concepts/Tokenization]]
- [[Chat Format and SFT]]
- [[Data Curation and Dataset Quality]]
## 3. Model architecture and kernels
- [picollm/accelerated/gpt.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/gpt.py)
The actual serious model: embeddings, RoPE, decoder blocks, optimizer setup, and scaling counts.
- [picollm/accelerated/flash_attention.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/flash_attention.py)
Selects between FlashAttention 3 and SDPA attention kernels.
- [picollm/accelerated/fp8.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/fp8.py)
FP8 conversion helpers for supported hardware.
- [picollm/accelerated/optim.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/optim.py)
Optimizer definitions used by the accelerated stack.
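As a quick intuition check for the RoPE piece of `gpt.py`, here is a dependency-free sketch of rotating a single feature vector by its position. The real model applies the same idea to query/key head tensors; this toy version only illustrates the pairwise rotation:

```python
import math

def rope(x, pos, theta=10000.0):
    """Rotate an even-length feature vector by position-dependent angles,
    pairing dimensions (0,1), (2,3), ... as in rotary position embeddings."""
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        freq = theta ** (-i / d)        # earlier pairs rotate faster
        angle = pos * freq
        c, s = math.cos(angle), math.sin(angle)
        out[i]     = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Two properties worth checking mentally: position 0 is the identity (no rotation), and rotation never changes the vector's norm, so RoPE injects position without distorting feature magnitudes.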
This layer connects most directly to:
- [[Embedding Layer]]
- [[Positional Encoding]]
- [[Scaled Dot-Product Attention]]
- [[Decoder Block]]
- [[Quantization]]
## 4. Base pretraining
- [picollm/accelerated/pretrain/train.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/train.py)
Main accelerated base-training entrypoint.
- [picollm/accelerated/dataloader.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/dataloader.py)
Distributed token batch construction.
- [picollm/accelerated/common.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/common.py)
Device detection, dtype selection, and runtime initialization.
- [picollm/accelerated/checkpoint_manager.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/checkpoint_manager.py)
Save and restore checkpoints safely.
- [picollm/accelerated/speedrun.sh](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/speedrun.sh)
One-command pipeline that chains the serious stages together.
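The core job of the dataloader can be sketched in a few lines: slice a flat token stream into shifted `(inputs, targets)` pairs, and give each rank a disjoint set of batches. This is an illustrative sketch of the idea, not the actual `dataloader.py` logic:

```python
def token_batches(tokens, batch_size, seq_len, rank=0, world_size=1):
    """Yield (inputs, targets) pairs from a flat token stream.
    Targets are inputs shifted by one token; each rank strides over
    a disjoint subset of batches, a simple distributed sharding scheme."""
    stride = batch_size * seq_len
    n_batches = (len(tokens) - 1) // stride  # -1 leaves room for the shift
    for b in range(rank, n_batches, world_size):
        chunk = tokens[b * stride : b * stride + stride + 1]
        inputs  = [chunk[i * seq_len     : i * seq_len + seq_len]
                   for i in range(batch_size)]
        targets = [chunk[i * seq_len + 1 : i * seq_len + seq_len + 1]
                   for i in range(batch_size)]
        yield inputs, targets
```

With `world_size=2`, rank 0 takes batches 0, 2, 4, ... and rank 1 takes 1, 3, 5, ..., so no token window is trained on twice per pass.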
This layer connects most directly to:
- [[Training Loop]]
- [[Training Configuration and Hyperparameters]]
- [[Compute, Time, and Cost of LLMs]]
- [[Distributed Training and Multi-GPU]]
## 5. Chat SFT and task mixture
- [picollm/accelerated/chat/sft.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/sft.py)
Supervised fine-tuning on top of the base [[Glossary#Checkpoint|checkpoint]].
- [picollm/accelerated/tasks/common.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/common.py)
Mixture logic for task sampling.
- [picollm/accelerated/tasks/smoltalk.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/smoltalk.py)
Dialogue-oriented data.
- [picollm/accelerated/tasks/mmlu.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/mmlu.py)
Multiple-choice knowledge tasks.
- [picollm/accelerated/tasks/gsm8k.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/gsm8k.py)
Math and reasoning tasks.
- [picollm/accelerated/tasks/spellingbee.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/spellingbee.py)
Simple spelling and counting supervision.
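The mixture logic in `tasks/common.py` boils down to weighted sampling over task sources. A hypothetical sketch (the weights below are made up for illustration and are not the repo's actual mixture):

```python
import random

def sample_task(mixture, rng):
    """Pick a task name according to mixture weights (dict: name -> weight)."""
    names = list(mixture)
    weights = [mixture[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# illustrative weights only
mixture = {"smoltalk": 0.6, "mmlu": 0.2, "gsm8k": 0.15, "spellingbee": 0.05}
rng = random.Random(0)  # seeded so the mixture is reproducible across runs
draws = [sample_task(mixture, rng) for _ in range(1000)]
```

Seeding the RNG matters more than it looks: it makes the SFT data order reproducible, so two runs from the same base checkpoint see the same example stream.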
This layer connects most directly to:
- [[Chat Format and SFT]]
- [[SFT Flow]]
- [[Evaluation and Model Quality]]
## 6. Evaluation and reporting
- [picollm/accelerated/pretrain/eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/eval.py)
Base-model evaluation and sampling.
- [picollm/accelerated/chat/eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/eval.py)
Post-[[Glossary#SFT|SFT]] chat evaluation.
- [picollm/accelerated/core_eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/core_eval.py)
Shared core-eval logic.
- [picollm/accelerated/loss_eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/loss_eval.py)
[[Glossary#Bits per byte (BPB)|BPB]]-oriented [[Glossary#Loss|loss]] evaluation.
- [picollm/accelerated/report.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/report.py)
Reset and generate run summaries.
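The reason `loss_eval.py` reports bits per byte rather than raw token loss is that BPB normalizes away the tokenizer: a bigger vocabulary yields fewer but harder tokens, while the bytes of text stay fixed, so BPB is comparable across tokenizers. A minimal conversion sketch:

```python
import math

def bits_per_byte(token_nll_nats, text):
    """Convert a summed token-level negative log-likelihood (in nats)
    into bits per byte of the underlying UTF-8 text."""
    n_bytes = len(text.encode("utf-8"))
    return token_nll_nats / (math.log(2) * n_bytes)
```

Sanity check: a total loss of exactly `8 * ln(2)` nats over an 8-byte string is one bit per byte.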
This layer connects most directly to:
- [[Evaluation and Model Quality]]
- [[Experiment Tracking and Run Analysis]]
- [[Formal Evaluation and Benchmarking]]
## 7. Serving and inference runtime
- [picollm/accelerated/engine.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/engine.py)
[[Glossary#Prefill|Prefill]], decoding, and runtime generation logic.
- [picollm/accelerated/chat/cli.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/cli.py)
Terminal interaction surface.
- [picollm/accelerated/chat/web.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/web.py)
Local browser interaction surface.
- [picollm/accelerated/ui.html](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/ui.html)
Minimal built-in web UI.
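The prefill/decode split in `engine.py` can be illustrated with a toy step function standing in for the model. The cache here is just a list; the real engine keeps per-layer KV tensors, but the control flow is the same shape:

```python
def generate(step_fn, prompt, n_new):
    """Toy prefill-then-decode loop. `step_fn(token, cache)` returns
    (next_token, cache). Prefill feeds the whole (non-empty) prompt to
    build the cache once; decode then feeds one token at a time."""
    cache = []
    # prefill: run the prompt through the model, filling the cache
    for tok in prompt:
        nxt, cache = step_fn(tok, cache)
    out = []
    # decode: one token in, one token out, reusing the cache
    for _ in range(n_new):
        out.append(nxt)
        nxt, cache = step_fn(nxt, cache)
    return out

def toy_step(tok, cache):
    """Stand-in for one model step: record the token, emit tok+1 mod 10."""
    return (tok + 1) % 10, cache + [tok]

out = generate(toy_step, [1, 2, 3], 4)  # -> [4, 5, 6, 7]
```

The point of the split is cost: prefill processes the prompt in one pass, and every decode step after that only pays for a single new token because past keys and values live in the cache.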
This layer connects most directly to:
- [[Inference Runtime and KV Cache]]
- [[Real Chatbot Workflow]]
- [[OpenTUI Terminal Chat App]]
- [[Vercel AI SDK Chat App]]
## 8. Operator and release tooling
- [picollm/accelerated/speedrun_doctor.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/speedrun_doctor.py)
Pre-run validation for hardware, disk, and artifact assumptions.
- [scripts/verify_identity_asset.py](https://github.com/Montekkundan/llm/blob/main/scripts/verify_identity_asset.py)
Verifies the canonical identity file or hosted mirror against the manifest.
- [scripts/write_picollm_run_manifest.py](https://github.com/Montekkundan/llm/blob/main/scripts/write_picollm_run_manifest.py)
Writes machine-readable run metadata into `PICOLLM_BASE_DIR`.
- [scripts/upload_picollm_model_to_hf.py](https://github.com/Montekkundan/llm/blob/main/scripts/upload_picollm_model_to_hf.py)
Publishes the inference-focused artifact bundle.
- [scripts/upload_picollm_archive_to_hf.py](https://github.com/Montekkundan/llm/blob/main/scripts/upload_picollm_archive_to_hf.py)
Publishes the fuller archive bundle for preservation and resume.
- [scripts/restore_picollm_from_hf.py](https://github.com/Montekkundan/llm/blob/main/scripts/restore_picollm_from_hf.py)
Restores a published model repo into a local artifact directory.
- [scripts/export_picollm_to_transformers.py](https://github.com/Montekkundan/llm/blob/main/scripts/export_picollm_to_transformers.py)
Exports native checkpoints into a Transformers `trust_remote_code` bundle.
- [scripts/export_picollm_to_gguf.py](https://github.com/Montekkundan/llm/blob/main/scripts/export_picollm_to_gguf.py)
Exports native checkpoints into picoLLM-architecture GGUF.
## How to read the repo in course order
If you want to understand `picollm` without getting lost, use this order:
1. `course_tools/runtime.py`
2. `picollm/accelerated/tokenizer.py`
3. `picollm/accelerated/gpt.py`
4. `picollm/accelerated/pretrain/train.py`
5. `picollm/accelerated/chat/sft.py`
6. `picollm/accelerated/chat/eval.py`
7. `picollm/accelerated/engine.py`
8. `picollm/accelerated/speedrun.sh`
That order follows the same logic as the course:
- represent text
- build the model
- train the [[Glossary#Base model|base model]]
- specialize the model into a chatbot
- serve the chatbot
- automate the whole workflow
## Relation to external references
Two external projects are especially helpful when you want to compare our companion code against well-known references:
- [`rasbt/LLMs-from-scratch`](https://github.com/rasbt/LLMs-from-scratch)
Strong for intuition, smaller experiments, and understanding the learning mechanics.[^4]
- [`nanochat`](https://github.com/karpathy/nanochat)
Strong for serious cloud training workflow, speedrun orchestration, and end-to-end experimentation.[^5]
`picollm` now sits between those two references:
- it keeps the concept-level clarity the course needs
- its accelerated stack adopts the same broad `tokenizer -> base train -> SFT -> eval -> chat` pipeline that makes `nanochat` so useful as a serious reference
For the current serious path, you should also know that `picollm/accelerated/speedrun.sh` is not just a thin shell wrapper. It now does:[^2][^5]
- dataset download
- tokenizer train and tokenizer eval
- base pretraining
- base evaluation
- identity-file verification
- chat SFT
- chat evaluation
- report generation
- run-manifest generation
- optional Hugging Face model and archive upload
- direct handoff into CLI or web mode
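The fail-fast chaining behavior behind that stage list can be sketched in Python. Stage names and commands below are placeholders, not the real speedrun stages:

```python
import subprocess
import sys

def run_stages(stages):
    """Run each (name, argv) stage in order; abort on the first failure
    so a broken early stage never wastes compute on later ones."""
    completed = []
    for name, argv in stages:
        print(f"=== {name} ===")
        if subprocess.run(argv).returncode != 0:
            raise SystemExit(f"stage failed: {name}")
        completed.append(name)
    return completed

# placeholder stages that just print; real stages would invoke the
# dataset, training, eval, and report entrypoints
stages = [
    ("dataset download", [sys.executable, "-c", "print('dataset ok')"]),
    ("base pretraining", [sys.executable, "-c", "print('pretrain ok')"]),
    ("report", [sys.executable, "-c", "print('report ok')"]),
]
```

Stopping at the first failed stage is the whole point of a speedrun wrapper: an identity-file check that fails cheaply before SFT is far better than discovering the problem after hours of training.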
<div style="display:flex; gap:1rem; margin:1.5rem 0; flex-wrap:wrap;">
<div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
<div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Previous</div>
<div><a class="internal-link" data-href="Real Chatbot Workflow" href="Real%20Chatbot%20Workflow">Real Chatbot Workflow</a></div>
</div>
<div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
<div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Next</div>
<div><a class="internal-link" data-href="Nanochat Architecture" href="Nanochat%20Architecture">Nanochat Architecture</a></div>
</div>
</div>
## References
[^1]: Montekkundan, [llm repository](https://github.com/Montekkundan/llm)
[^2]: Hugging Face TB, [The Smol Training Guide](https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook)
[^4]: Sebastian Raschka, [rasbt/LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch)
[^5]: Andrej Karpathy, [nanochat](https://github.com/karpathy/nanochat)