> [!info] Course code
> - [picollm/README.md](https://github.com/Montekkundan/llm/blob/main/picollm/README.md)
> - [picollm/accelerated/README.md](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/README.md)
> - [picollm/accelerated/speedrun.sh](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/speedrun.sh)
> - [picollm/accelerated/pretrain/train.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/train.py)
> - [picollm/accelerated/chat/sft.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/sft.py)

## What This Concept Is

You finish a concept note, you understand the idea, and then you open the repo and ask, "okay, but which file actually does that?" This note answers that exact question for `picollm`. It is the bridge from lecture understanding to code navigation.

## Foundation Terms You Need First

A **code map** links ideas to files and runtime surfaces. A **small surface** is the notebook or toy runtime where you first learn an idea. A **serious surface** is the accelerated implementation used in the real stack. An **operator surface** is the set of scripts and reports used to run, inspect, or ship the system.

The habit this note builds is simple: when a lecture introduces a concept, you should know where to go in the repo to see that concept become code.

```mermaid
flowchart TD
    A["Concept notes and notebooks"] --> B["course_tools and small scripts"]
    B --> C["picollm/accelerated core runtime"]
    C --> D["Base train, base eval, chat SFT, chat eval"]
    D --> E["Serving surfaces: CLI, web UI, API-compatible routes"]
    D --> F["Operator surfaces: report, manifests, HF upload, restore, export"]
    E --> G["End-user apps and demos"]
```

## The big split

Think of the repo as two layers:

1. a concept-first runtime for learning
2. a serious accelerated runtime for the final from-scratch chatbot

That split matches the course structure much better than trying to flatten everything into one folder story. The important change in the current course version is this:

- `course_tools/` and the notebooks are still the cleanest place to learn the idea
- `picollm/accelerated/` is now the single serious from-scratch path

## 1. Concept-first runtime

- [course_tools/runtime.py](https://github.com/Montekkundan/llm/blob/main/course_tools/runtime.py)
  The smallest concept runtime in the repo.
- [scripts/base_training_flow/run.py](https://github.com/Montekkundan/llm/blob/main/scripts/base_training_flow/run.py)
  The local base-model flow used in the smaller practical arc.
- [scripts/base_evaluation_flow/run.py](https://github.com/Montekkundan/llm/blob/main/scripts/base_evaluation_flow/run.py)
  The corresponding local evaluation path.

This layer connects most directly to:

- [[Training Loop]]
- [[Causal Language Modeling]]
- [[Chat Format and SFT]]

## 2. Tokenizer and conversation representation

- [picollm/accelerated/tokenizer.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tokenizer.py)
  [[Glossary#Special tokens|Special tokens]], GPT-4-style split regex, byte-level [[Glossary#BPE|BPE]] helpers, and conversation rendering.
- [picollm/accelerated/pretrain/train_tokenizer.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/train_tokenizer.py)
  Train the serious [[Glossary#Tokenizer|tokenizer]].
- [picollm/accelerated/pretrain/tokenizer_eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/tokenizer_eval.py)
  Evaluate tokenization behavior before the expensive run.
- [picollm/accelerated/dataset.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/dataset.py)
  Download and manage the shard set used by the accelerated stack.
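Before opening `tokenizer.py`, it helps to hold the core mechanic of byte-level BPE in your head. The sketch below is a deliberately tiny illustration, not the repo's implementation: start from raw UTF-8 bytes and repeatedly merge the most frequent adjacent pair into a new token id, which is the same basic loop a serious tokenizer trainer runs at scale. All names here are hypothetical.

```python
from collections import Counter

def most_common_pair(ids):
    """Count adjacent id pairs and return the most frequent one."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Start from raw bytes (ids 0-255) and learn a few merges.
text = "low low lower lowest"
ids = list(text.encode("utf-8"))
merges = {}
for step in range(5):
    pair = most_common_pair(ids)
    new_id = 256 + step          # new token ids grow past the byte range
    merges[pair] = new_id
    ids = merge(ids, pair, new_id)

print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens after merges")
```

Byte-level means the base vocabulary is exactly the 256 byte values, so any string tokenizes without an out-of-vocabulary case; the split regex mentioned above just pre-chunks text so merges never cross word or category boundaries.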
This layer connects most directly to:

- [[LLM/concepts/Tokenization]]
- [[Chat Format and SFT]]
- [[Data Curation and Dataset Quality]]

## 3. Model architecture and kernels

- [picollm/accelerated/gpt.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/gpt.py)
  The actual serious model: embeddings, RoPE, decoder blocks, optimizer setup, and scaling counts.
- [picollm/accelerated/flash_attention.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/flash_attention.py)
  FlashAttention 3 or SDPA selection.
- [picollm/accelerated/fp8.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/fp8.py)
  FP8 conversion helpers for supported hardware.
- [picollm/accelerated/optim.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/optim.py)
  Optimizer definitions used by the accelerated stack.

This layer connects most directly to:

- [[Embedding Layer]]
- [[Positional Encoding]]
- [[Scaled Dot-Product Attention]]
- [[Decoder Block]]
- [[Quantization]]

## 4. Base pretraining

- [picollm/accelerated/pretrain/train.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/train.py)
  Main accelerated base-training entrypoint.
- [picollm/accelerated/dataloader.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/dataloader.py)
  Distributed token batch construction.
- [picollm/accelerated/common.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/common.py)
  Device detection, dtype selection, and runtime initialization.
- [picollm/accelerated/checkpoint_manager.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/checkpoint_manager.py)
  Save and restore checkpoints safely.
- [picollm/accelerated/speedrun.sh](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/speedrun.sh)
  One-command pipeline that chains the serious stages together.
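To make "distributed token batch construction" concrete, here is a minimal framework-free sketch of the idea, not the code in `dataloader.py`: slice a flat token stream into `(inputs, targets)` rows shifted by one position (the next-token objective), and stripe batches across ranks so each GPU reads a disjoint slice. The function name and striding scheme are assumptions for illustration.

```python
def token_batches(tokens, batch_size, seq_len, rank=0, world_size=1):
    """Yield (inputs, targets) batches from a flat token stream.

    Each row holds seq_len tokens; targets are the same row shifted one
    position right, so position t predicts token t+1. Ranks start at
    disjoint offsets and step world_size batches forward, so no two
    ranks ever see the same batch.
    """
    tokens_per_batch = batch_size * seq_len
    pos = rank * tokens_per_batch
    step = world_size * tokens_per_batch
    # Need one extra token per batch to build the shifted targets.
    while pos + tokens_per_batch + 1 <= len(tokens):
        chunk = tokens[pos : pos + tokens_per_batch + 1]
        inputs = [chunk[i * seq_len : (i + 1) * seq_len]
                  for i in range(batch_size)]
        targets = [chunk[i * seq_len + 1 : (i + 1) * seq_len + 1]
                   for i in range(batch_size)]
        yield inputs, targets
        pos += step

# Example: two ranks walking the same stream without overlap.
stream = list(range(40))
rank0 = list(token_batches(stream, batch_size=1, seq_len=4, rank=0, world_size=2))
rank1 = list(token_batches(stream, batch_size=1, seq_len=4, rank=1, world_size=2))
```

A real dataloader layers sharded files, pinned GPU tensors, and prefetching on top, but the shift-by-one and rank-striding logic is the part worth internalizing before reading `train.py`.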
This layer connects most directly to:

- [[Training Loop]]
- [[Training Configuration and Hyperparameters]]
- [[Compute, Time, and Cost of LLMs]]
- [[Distributed Training and Multi-GPU]]

## 5. Chat SFT and task mixture

- [picollm/accelerated/chat/sft.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/sft.py)
  Supervised fine-tuning on top of the base [[Glossary#Checkpoint|checkpoint]].
- [picollm/accelerated/tasks/common.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/common.py)
  Mixture logic for task sampling.
- [picollm/accelerated/tasks/smoltalk.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/smoltalk.py)
  Dialogue-oriented data.
- [picollm/accelerated/tasks/mmlu.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/mmlu.py)
  Multiple-choice knowledge tasks.
- [picollm/accelerated/tasks/gsm8k.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/gsm8k.py)
  Math and reasoning tasks.
- [picollm/accelerated/tasks/spellingbee.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/spellingbee.py)
  Simple spelling and counting supervision.

This layer connects most directly to:

- [[Chat Format and SFT]]
- [[SFT Flow]]
- [[Evaluation and Model Quality]]

## 6. Evaluation and reporting

- [picollm/accelerated/pretrain/eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/pretrain/eval.py)
  Base-model evaluation and sampling.
- [picollm/accelerated/chat/eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/eval.py)
  Post-[[Glossary#SFT|SFT]] chat evaluation.
- [picollm/accelerated/core_eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/core_eval.py)
  Shared core-eval logic.
- [picollm/accelerated/loss_eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/loss_eval.py)
  [[Glossary#Bits per byte (BPB)|BPB]]-oriented [[Glossary#Loss|loss]] evaluation.
- [picollm/accelerated/report.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/report.py)
  Reset and generate run summaries.

This layer connects most directly to:

- [[Evaluation and Model Quality]]
- [[Experiment Tracking and Run Analysis]]
- [[Formal Evaluation and Benchmarking]]

## 7. Serving and inference runtime

- [picollm/accelerated/engine.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/engine.py)
  [[Glossary#Prefill|Prefill]], decoding, and runtime generation logic.
- [picollm/accelerated/chat/cli.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/cli.py)
  Terminal interaction surface.
- [picollm/accelerated/chat/web.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/web.py)
  Local browser interaction surface.
- [picollm/accelerated/ui.html](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/ui.html)
  Minimal built-in web UI.

This layer connects most directly to:

- [[Inference Runtime and KV Cache]]
- [[Real Chatbot Workflow]]
- [[OpenTUI Terminal Chat App]]
- [[Vercel AI SDK Chat App]]

## 8. Operator and release tooling

- [picollm/accelerated/speedrun_doctor.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/speedrun_doctor.py)
  Pre-run validation for hardware, disk, and artifact assumptions.
- [scripts/verify_identity_asset.py](https://github.com/Montekkundan/llm/blob/main/scripts/verify_identity_asset.py)
  Verifies the canonical identity file or hosted mirror against the manifest.
- [scripts/write_picollm_run_manifest.py](https://github.com/Montekkundan/llm/blob/main/scripts/write_picollm_run_manifest.py)
  Writes machine-readable run metadata into `PICOLLM_BASE_DIR`.
- [scripts/upload_picollm_model_to_hf.py](https://github.com/Montekkundan/llm/blob/main/scripts/upload_picollm_model_to_hf.py)
  Publishes the inference-focused artifact bundle.
- [scripts/upload_picollm_archive_to_hf.py](https://github.com/Montekkundan/llm/blob/main/scripts/upload_picollm_archive_to_hf.py)
  Publishes the fuller archive bundle for preservation and resume.
- [scripts/restore_picollm_from_hf.py](https://github.com/Montekkundan/llm/blob/main/scripts/restore_picollm_from_hf.py)
  Restores a published model repo into a local artifact directory.
- [scripts/export_picollm_to_transformers.py](https://github.com/Montekkundan/llm/blob/main/scripts/export_picollm_to_transformers.py)
  Exports native checkpoints into a Transformers `trust_remote_code` bundle.
- [scripts/export_picollm_to_gguf.py](https://github.com/Montekkundan/llm/blob/main/scripts/export_picollm_to_gguf.py)
  Exports native checkpoints into picoLLM-architecture GGUF.

## How to read the repo in course order

If you want to understand `picollm` without getting lost, use this order:

1. `course_tools/runtime.py`
2. `picollm/accelerated/tokenizer.py`
3. `picollm/accelerated/gpt.py`
4. `picollm/accelerated/pretrain/train.py`
5. `picollm/accelerated/chat/sft.py`
6. `picollm/accelerated/chat/eval.py`
7. `picollm/accelerated/engine.py`
8. `picollm/accelerated/speedrun.sh`

That order follows the same logic as the course:

- represent text
- build the model
- train the [[Glossary#Base model|base model]]
- specialize the model into a chatbot
- serve the chatbot
- automate the whole workflow

## Relation to external references

Two external projects are especially helpful when you want to compare our companion code against well-known references:

- [`rasbt/LLMs-from-scratch`](https://github.com/rasbt/LLMs-from-scratch)
  Strong for intuition, smaller experiments, and understanding the learning mechanics.[^4]
- [`nanochat`](https://github.com/karpathy/nanochat)
  Strong for serious cloud training workflow, speedrun orchestration, and end-to-end experimentation.[^5]

`picollm` now sits between those two references:

- it keeps the course clarity that you need
- but the accelerated stack adopts the same broad `tokenizer -> base train -> SFT -> eval -> chat` logic that makes `nanochat` so useful as a serious reference

For the current serious path, you should also know that `picollm/accelerated/speedrun.sh` is not just a thin shell wrapper.
It now does:[^2][^5]

- dataset download
- tokenizer train and tokenizer eval
- base pretraining
- base evaluation
- identity-file verification
- chat SFT
- chat evaluation
- report generation
- run-manifest generation
- optional Hugging Face model and archive upload
- direct handoff into CLI or web mode

<div style="display:flex; gap:1rem; margin:1.5rem 0; flex-wrap:wrap;">
  <div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
    <div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Previous</div>
    <div><a class="internal-link" data-href="Real Chatbot Workflow" href="Real%20Chatbot%20Workflow">Real Chatbot Workflow</a></div>
  </div>
  <div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
    <div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Next</div>
    <div><a class="internal-link" data-href="Nanochat Architecture" href="Nanochat%20Architecture">Nanochat Architecture</a></div>
  </div>
</div>

## References

[^1]: Montekkundan, [llm repository](https://github.com/Montekkundan/llm)
[^2]: Hugging Face TB, [The Smol Training Playbook](https://huggingface.co/spaces/HuggingFaceTB/smol-training-playbook)
[^4]: Sebastian Raschka, [rasbt/LLMs-from-scratch](https://github.com/rasbt/LLMs-from-scratch)
[^5]: Andrej Karpathy, [nanochat](https://github.com/karpathy/nanochat)