> [!info] Course code
> Use the companion repository for runnable notebooks, figures, and implementation references for this lecture:
> - [notebooks/sft_flow/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/sft_flow/lecture_walkthrough.ipynb)
> - [notebooks/chat_format_and_sft/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/chat_format_and_sft/lecture_walkthrough.ipynb)
> - [picollm/accelerated/chat/sft.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/sft.py)
> - [picollm/accelerated/chat/eval.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/eval.py)
> - [picollm/accelerated/tasks/common.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/tasks/common.py)
## What This Concept Is
Suppose you already have a base checkpoint and want to see the shortest path from that checkpoint to assistant-style behavior. This note is that compact path.
It is essentially the small, readable version of the larger chat post-training story.
## Foundation Terms You Need First
A **base checkpoint** is the model before chat specialization. **Chat formatting** turns structured conversations into one causal token stream. A **supervision target** is the subset of tokens the model is asked to predict during SFT. **Chat eval** is the measurement step that checks how the resulting model behaves.
So the rhythm of this note is simple: start from the base model, reshape the examples, train on the assistant side, then compare the behavior after that change.
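The "chat formatting" and "supervision target" ideas can be sketched in a few lines. This is an illustrative mock-up, not picollm's actual tokenizer: the `<|role|>` and `<|end|>` markers and the character-level mask are assumptions standing in for real special tokens and a token-level loss mask.

```python
# Minimal sketch of chat formatting plus a supervision mask.
# Special tokens and role tags here are illustrative, not the exact
# ones used by picollm/accelerated/tokenizer.py.

def render_conversation(messages):
    """Flatten role-tagged messages into one causal stream of chunks,
    marking which chunks are supervised (assistant replies only)."""
    chunks = []
    mask = []  # 1 = compute loss on this span, 0 = context only
    for msg in messages:
        chunks.append(f"<|{msg['role']}|>{msg['content']}<|end|>")
        mask.append(1 if msg["role"] == "assistant" else 0)
    return chunks, mask

convo = [
    {"role": "user", "content": "What is SFT?"},
    {"role": "assistant", "content": "Supervised fine-tuning on chat data."},
]
chunks, mask = render_conversation(convo)
print(mask)  # only the assistant span carries loss
```

The key observation: the whole conversation becomes one stream, but only the assistant spans become supervision targets.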
```mermaid
flowchart TD
A["Base checkpoint"] --> B["Chat formatting and task mixture"]
B --> C["Supervised fine-tuning"]
C --> D["Chat evaluation and side-by-side prompts"]
D --> E["Serve the chat model"]
```
## Course Position
Read this note with one hierarchy in mind:
- `picollm` is the primary implementation path
- the notebook is the main walkthrough surface for the [[Glossary#SFT|SFT]] workflow
- `rasbt/LLMs-from-scratch` is the clean concept-first comparison
- `nanochat` is the systems-first comparison for a fuller end-to-end training stack
That hierarchy matters because people often mistake "assistant behavior" for a different model class. It is still the same causal language model, adapted through new data formatting and supervised updates.
## Workflow
It is easy to understand the idea of SFT but still not see the workflow clearly. This note turns the concept into a sequence:
1. start from a base [[Glossary#Checkpoint|checkpoint]]
2. define one canonical chat schema
3. fine-tune on that schema
4. compare base versus adapted behavior on the same prompts
That sequence is exactly what the companion notebook demonstrates, and it is also the shape of `picollm/accelerated/chat/sft.py`.
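Step 3 of that sequence reduces to one idea: the SFT loss is ordinary next-token cross-entropy, averaged only over masked-in (assistant) positions. A minimal sketch, with toy per-position log-probabilities standing in for real model logits:

```python
import math

# Sketch of the masked SFT objective: standard next-token negative
# log-likelihood, averaged only over positions where the supervision
# mask is 1 (assistant tokens). The tiny "model outputs" here are toy
# values, not real logits.

def masked_nll(logprobs, targets, mask):
    """logprobs: per-position dict token -> log-prob.
    mask: 1 for assistant positions, 0 for context positions."""
    total, count = 0.0, 0
    for lp, target, m in zip(logprobs, targets, mask):
        if m:
            total += -lp[target]
            count += 1
    return total / max(count, 1)

# Two positions: a user token (masked out) and an assistant token (trained).
logprobs = [
    {"hi": math.log(0.9), "yo": math.log(0.1)},
    {"yes": math.log(0.5), "no": math.log(0.5)},
]
loss = masked_nll(logprobs, ["hi", "yes"], mask=[0, 1])
print(round(loss, 4))  # 0.6931 — only the assistant position contributes
```

Because the user position is masked out, changing its log-probability would not move the loss at all; that is the whole difference between SFT and plain pretraining on the same stream.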
> [!example] Notebook follow-up
> - [notebooks/sft_flow/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/sft_flow/lecture_walkthrough.ipynb)
> - [notebooks/chat_format_and_sft/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/chat_format_and_sft/lecture_walkthrough.ipynb)
> Use these notebooks to walk through the workflow, then compare it directly back to the format note.
## What to show live
Use the notebook to make four points visible:
- a base checkpoint can be syntactically fluent but behaviorally unhelpful
- SFT changes answer style and instruction following
- comparison only makes sense on a fixed prompt set
- the accelerated `picollm` path does not use only one chat dataset; it uses a task mixture for dialogue, reasoning, identity, and formatting behavior
If both models still look weak in a tiny run, that is not a bug in the lesson. The notebook is showing the flow, not pretending a toy SFT run becomes a world-class assistant.
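The fixed-prompt-set point can be made concrete with a tiny comparison harness. This is a sketch, not the repo's eval code: the two generate functions are stand-ins for a real base and SFT checkpoint.

```python
# Sketch of the base-vs-SFT comparison: both models see the *same*
# fixed prompt set, so behavioral differences come from training,
# not from prompt choice. The generate functions are stand-ins.

FIXED_PROMPTS = [
    "Explain what a tokenizer does.",
    "Write a haiku about gradient descent.",
]

def compare(base_generate, sft_generate, prompts=FIXED_PROMPTS):
    """Return one row per prompt with both models' outputs side by side."""
    return [
        {"prompt": p, "base": base_generate(p), "sft": sft_generate(p)}
        for p in prompts
    ]

# Stand-in behaviors: base models tend to continue text, SFT models answer.
base = lambda p: p + " and furthermore..."
sft = lambda p: "Sure! Here is an answer to: " + p

rows = compare(base, sft)
print(len(rows), sorted(rows[0].keys()))
```

Swapping in different prompts per model would make any "improvement" unmeasurable, which is why the notebook holds the prompt set fixed.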
## Relationship to the rest of the course
Teach this after:
- [[Training Loop]]
- [[Chat Format and SFT]]
Teach it before:
- [[Real Chatbot Workflow]]
- [[Nanochat Architecture]]
Use that order to keep the transition clean:
- first understand the objective
- then see the workflow
- then move to more realistic post-training surfaces
## Theory To Product
Connect this note to four earlier ideas:
- [[Causal Language Modeling]] explains why SFT still uses next-token prediction
- [[Chat Format and SFT]] explains how role-structured conversations become training text
- [[Evaluation and Model Quality]] explains why base-vs-SFT comparison needs controlled prompts and [[Glossary#Benchmark|benchmark]] checks
- [[Real Chatbot Workflow]] shows how this SFT stage sits inside the full accelerated `picollm` pipeline
If those links are clear here, post-training stops looking like a separate mystery phase. It is the same model architecture moving into a different data regime.
## Key takeaway
SFT is where a general next-token model first starts behaving like an assistant. The main lesson is that the workflow is inspectable, comparable, and grounded in a stable data format.
In the current repo, the most important files for that claim are:
- `picollm/accelerated/tokenizer.py` for rendering and masking
- `picollm/accelerated/tasks/common.py` for mixture logic
- `picollm/accelerated/chat/sft.py` for the real fine-tuning loop
- `picollm/accelerated/chat/eval.py` for post-SFT task checks
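The mixture logic in `tasks/common.py` amounts to weighted sampling over task pools. A minimal sketch of that idea; the task names and weights below are illustrative assumptions, not picollm's actual configuration:

```python
import random

# Sketch of task-mixture sampling: instead of one chat dataset, draw
# each training example from one of several task pools with fixed
# weights. Task names and weights are illustrative, not the repo's
# real mixture.

MIXTURE = {"dialogue": 0.5, "reasoning": 0.3, "identity": 0.1, "formatting": 0.1}

def sample_task(rng):
    tasks, weights = zip(*MIXTURE.items())
    return rng.choices(tasks, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded for reproducibility
draws = [sample_task(rng) for _ in range(1000)]
print(draws.count("dialogue"))  # roughly half of the draws
```

The practical point: identity and formatting behaviors survive only because they keep appearing in the stream at a fixed rate, not because any single dataset teaches them once.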
> [!example] Notebook walkthroughs in this lecture
> Use these companion notebook links as you read or review this lecture:
>
> - [notebooks/sft_flow/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/sft_flow/lecture_walkthrough.ipynb)
> - [notebooks/chat_format_and_sft/lecture_walkthrough.ipynb](https://github.com/Montekkundan/llm/blob/main/notebooks/chat_format_and_sft/lecture_walkthrough.ipynb)
<div style="display:flex; gap:1rem; margin:1.5rem 0; flex-wrap:wrap;">
<div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
<div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Previous</div>
<div><a class="internal-link" data-href="Chat Format and SFT" href="Chat%20Format%20and%20SFT">Chat Format and SFT</a></div>
</div>
<div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
<div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Next</div>
<div><a class="internal-link" data-href="Failure Modes and Debugging" href="Failure%20Modes%20and%20Debugging">Failure Modes and Debugging</a></div>
</div>
</div>
## Further reading
- Long Ouyang et al., "Training language models to follow instructions with human feedback," 2022. https://arxiv.org/abs/2203.02155
- Hugging Face, "Chat templates," 2025. https://huggingface.co/docs/transformers/chat_templating
- Hugging Face, "SFT Trainer," 2025. https://huggingface.co/docs/trl/en/sft_trainer
- Hugo Touvron et al., "Llama 2: Open Foundation and Fine-Tuned Chat Models," 2023. https://arxiv.org/abs/2307.09288