> [!info] Course code
> Use the companion repository for runnable notebooks, figures, and implementation references for this lecture:
> - [notebooks/sft_flow/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/sft_flow/lecture_walkthrough.ipynb)
> - [notebooks/chat_format_and_sft/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/chat_format_and_sft/lecture_walkthrough.ipynb)
> - [picollm/accelerated/chat/sft.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/sft.py)
> - [picollm/accelerated/chat/eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/eval.py)
> - [picollm/accelerated/tasks/common.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/common.py)

## What This Concept Is

Suppose you already have a base checkpoint and want the shortest path from that checkpoint to assistant-style behavior. This note is that compact path: the small, readable version of the larger chat post-training story.

## Foundation Terms You Need First

A **base checkpoint** is the model before chat specialization. **Chat formatting** turns structured conversations into one causal token stream. A **supervision target** is the subset of tokens the model is asked to predict during SFT. **Chat eval** is the measurement step that checks how the resulting model behaves.

So the rhythm of this note is simple: start from the base model, reshape the examples, train on the assistant side, then compare the behavior after that change.

```mermaid
flowchart TD
    A["Base checkpoint"] --> B["Chat formatting and task mixture"]
    B --> C["Supervised fine-tuning"]
    C --> D["Chat evaluation and side-by-side prompts"]
    D --> E["Serve the chat model"]
```

## Course Position

Read this note with one hierarchy in mind:

- `picollm` is the primary implementation path
- the notebook is the main walkthrough surface for the [[Glossary#SFT|SFT]] workflow
- `rasbt/LLMs-from-scratch` is the clean concept-first comparison
- `nanochat` is the systems-first comparison for a fuller end-to-end training stack

That hierarchy matters because people often mistake "assistant behavior" for a different model class. It is still the same causal language model, adapted through new data formatting and supervised updates.

## Workflow

It is easy to understand the idea of SFT and still not see the workflow clearly. This note turns the concept into a sequence:

1. start from a base [[Glossary#Checkpoint|checkpoint]]
2. define one canonical chat schema
3. fine-tune on that schema
4. compare base versus adapted behavior on the same prompts

That sequence is exactly what the companion notebook demonstrates, and it is also the shape of `picollm/accelerated/chat/sft.py`.

> [!example] Notebook follow-up
> - [notebooks/sft_flow/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/sft_flow/lecture_walkthrough.ipynb)
> - [notebooks/chat_format_and_sft/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/chat_format_and_sft/lecture_walkthrough.ipynb)
> Use these notebooks to walk the workflow, then compare it back to the format note directly.
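To make the "supervision target" idea concrete before the live demo, here is a minimal sketch of assistant-only loss masking. Everything in it is illustrative: `render_conversation`, `toy_encode`, the `<|user|>` / `<|end|>` markers, and the `-100` ignore value are assumptions made for the sketch, not the actual vocabulary or API of `picollm/accelerated/tokenizer.py`.

```python
# Minimal sketch of chat formatting with assistant-only supervision.
# Role markers, the byte-level tokenizer, and IGNORE_INDEX are all
# illustrative assumptions, not picollm's real tokenizer vocabulary.

IGNORE_INDEX = -100  # conventional "do not compute loss here" label


def toy_encode(text):
    """Byte-level stand-in tokenizer so the sketch runs with no dependencies."""
    return list(text.encode("utf-8"))


def render_conversation(messages, encode):
    """Flatten role-tagged messages into one token stream plus labels.

    Every token stays in the input, but only assistant body tokens become
    supervision targets; user and system tokens are masked out of the loss.
    (Shifting labels for next-token prediction is left to the training loop.)
    """
    input_ids, labels = [], []
    for msg in messages:
        header = encode(f"<|{msg['role']}|>")            # role marker tokens
        body = encode(msg["content"]) + encode("<|end|>")
        input_ids += header + body
        if msg["role"] == "assistant":
            labels += [IGNORE_INDEX] * len(header) + body
        else:
            labels += [IGNORE_INDEX] * (len(header) + len(body))
    return input_ids, labels


conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
ids, labels = render_conversation(conversation, toy_encode)
assert len(ids) == len(labels)
supervised = sum(label != IGNORE_INDEX for label in labels)
print(f"{supervised} supervised positions out of {len(ids)} tokens")
```

Once the labels exist, the SFT loss is ordinary next-token cross-entropy computed only at the unmasked positions, which is why the whole workflow remains a causal language modeling run.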
## What to show live

Use the notebook to make four points visible:

- a base checkpoint can be syntactically functional but behaviorally unhelpful
- SFT changes answer style and instruction following
- comparison only makes sense on a fixed prompt set (the sketch below makes this concrete)
- the accelerated `picollm` path does not use only one chat dataset; it uses a task mixture covering dialogue, reasoning, identity, and formatting behavior

If both models still look weak in a tiny run, that is not a bug in the lesson. The notebook is showing the flow, not pretending a toy SFT run becomes a world-class assistant.
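The fixed-prompt point deserves one concrete shape. Below is a minimal sketch of the side-by-side check, assuming hypothetical `base_generate` and `sft_generate` callables; the real post-SFT task checks live in `picollm/accelerated/chat/eval.py` and are richer than this.

```python
# Minimal sketch of a fixed-prompt, side-by-side comparison. The
# `base_generate` / `sft_generate` callables are hypothetical stand-ins
# for whatever sampling entry points the two checkpoints expose; the
# real post-SFT checks live in picollm/accelerated/chat/eval.py.

FIXED_PROMPTS = [
    "Explain what a checkpoint is in one sentence.",
    "List three things supervised fine-tuning changes about a model.",
]


def compare(base_generate, sft_generate, prompts=FIXED_PROMPTS):
    """Run both checkpoints on the same prompts, so any difference in the
    transcripts reflects training, not prompt choice."""
    for prompt in prompts:
        print(f"PROMPT: {prompt}")
        print(f"  base: {base_generate(prompt)}")
        print(f"  sft : {sft_generate(prompt)}")


# Stub generators keep the sketch runnable; swap in real sampling calls.
compare(lambda p: "<raw continuation>", lambda p: "<assistant-style answer>")
```

Holding the prompt set fixed is the whole point: it turns "the model feels more helpful" into an observable diff between two transcripts.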
## Relationship to the rest of the course

Teach this after:

- [[Training Loop]]
- [[Chat Format and SFT]]

Teach it before:

- [[Real Chatbot Workflow]]
- [[Nanochat Architecture]]

Use that order to keep the transition clean:

- first understand the objective
- then see the workflow
- then move to more realistic post-training surfaces

## Theory To Product

Connect this note to four earlier ideas:

- [[Causal Language Modeling]] explains why SFT still uses next-token prediction
- [[Chat Format and SFT]] explains how role-structured conversations become training text
- [[Evaluation and Model Quality]] explains why base-vs-SFT comparison needs controlled prompts and [[Glossary#Benchmark|benchmark]] checks
- [[Real Chatbot Workflow]] shows how this SFT stage sits inside the full accelerated `picollm` pipeline

If those links are clear here, post-training stops looking like a separate mystery phase. It is the same model architecture moving into a different data regime.

## Key takeaway

SFT is where a general next-token model first starts behaving like an assistant. The main lesson is that the workflow is inspectable, comparable, and grounded in a stable data format. In the current repo, the most important files for that claim are:

- `picollm/accelerated/tokenizer.py` for rendering and masking
- `picollm/accelerated/tasks/common.py` for mixture logic
- `picollm/accelerated/chat/sft.py` for the real fine-tuning loop
- `picollm/accelerated/chat/eval.py` for post-SFT task checks

> [!example] Notebook walkthroughs in this lecture
> Use these companion notebook links as you read or review this lecture:
>
> - [notebooks/sft_flow/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/sft_flow/lecture_walkthrough.ipynb)
> - [notebooks/chat_format_and_sft/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/chat_format_and_sft/lecture_walkthrough.ipynb)

<div style="display:flex; gap:1rem; margin:1.5rem 0; flex-wrap:wrap;">
  <div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
    <div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Previous</div>
    <div><a class="internal-link" data-href="Chat Format and SFT" href="Chat%20Format%20and%20SFT">Chat Format and SFT</a></div>
  </div>
  <div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
    <div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Next</div>
    <div><a class="internal-link" data-href="Failure Modes and Debugging" href="Failure%20Modes%20and%20Debugging">Failure Modes and Debugging</a></div>
  </div>
</div>

## Further reading

- Long Ouyang et al., "Training language models to follow instructions with human feedback," 2022. https://arxiv.org/abs/2203.02155
- Hugging Face, "Chat templates," 2025. https://huggingface.co/docs/transformers/chat_templating
- Hugging Face, "SFT Trainer," 2025. https://huggingface.co/docs/trl/en/sft_trainer
- Hugo Touvron et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models," 2023. https://arxiv.org/abs/2307.09288