> [!info] Course code
> Use the companion repository for runnable notebooks, figures, and implementation references for this lecture:
> - [apps/vercel_ai_sdk_chat/README.md](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/README.md)
> - [apps/vercel_ai_sdk_chat/app/api/chat/route.ts](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/app/api/chat/route.ts)
> - [picollm/accelerated/chat/web.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/web.py)
> - [picollm/accelerated/README.md](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/README.md)
## What this concept is
Imagine you already have a backend that can serve chat completions. The next step is giving people a browser interface that feels like a real product, not just a raw API. This note explains that browser-to-backend connection.
So the focus here is not the model internals anymore. It is the web path that lets a user type into a chat box and see streamed responses coming back.
## Foundation terms you need first
The **browser UI** is the visible chat interface. The **API route** is the server endpoint the UI sends requests to. **[[Glossary#Streaming|Streaming]]** means the reply can arrive piece by piece while generation is still happening. The **backend contract** is the request and response shape expected by both the AI SDK and the model server.
So when you read this note, keep one line in mind: browser UI on one side, model backend on the other, and a stable contract joining them.
```mermaid
flowchart TD
A["Browser chat UI"] --> B["Next.js API route"]
B --> C["AI SDK streamText"]
C --> D["OpenAI-compatible picoLLM backend"]
D --> E["picoLLM checkpoint and engine"]
```
## Course framing
For this course, the hierarchy is:
- `picollm` is the primary model and backend path
- the Vercel AI SDK app is one product client over that backend
- `rasbt/LLMs-from-scratch` is the concept-first reference for earlier lectures
- `nanochat` is the systems-first external comparison, not the implementation you are meant to build here
It is easy to over-focus on the frontend app and forget that the trained picoLLM backend is the core object.
## The core deployment split
The most important production idea in this lecture is separation of concerns.
- the web app can live on Vercel
- the model usually should not, because inference needs GPU-class compute that app hosting does not provide
- the model backend should expose an API
- the frontend should consume that API through a stable client abstraction
That is why this lecture uses the Vercel AI SDK with an OpenAI-compatible provider.
## Why OpenAI-compatible serving matters
If your backend behaves like the OpenAI chat completions API, then many clients can speak to it without knowing your internal model code.
That means:
- you can swap the backend later
- you can keep the frontend code simple
- you can move from local demo to cloud deployment without rewriting the UI
In this course, `picollm` is the backend that exposes that interface.
For this lecture, that means the accelerated backend in `picollm/accelerated/chat/web.py`. The same stack produces the [[Glossary#Checkpoint|checkpoint]] and serves it through the OpenAI-compatible interface the web app expects.
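To make the contract concrete, here is what that portability layer looks like as a request shape. This is a sketch of the common OpenAI chat completions fields; which optional parameters `picollm` honors is something to verify in `web.py`:
```ts
// Minimal shape of an OpenAI-compatible chat completions request.
// Field names follow the common OpenAI contract; which optional
// parameters picollm honors is an assumption to check in web.py.
interface ChatCompletionRequest {
  model: string; // e.g. "picollm-chat"
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream?: boolean; // true for token-by-token streaming
  temperature?: number; // optional sampling control
}

// Because every compatible backend accepts this shape, swapping
// backends means changing a base URL, not rewriting the client.
const example: ChatCompletionRequest = {
  model: "picollm-chat",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
};
console.log(JSON.stringify(example));
```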
## How this lecture connects back to the earlier theory notes
Use this note to close the loop:
- [[LLM/concepts/Tokenization]] explains why the backend accepts tokenized chat prompts at all.
- [[Decoder Block]] and [[Causal Language Modeling]] explain why the backend still generates one token at a time.
- [[Inference Runtime and KV Cache]] explains why streaming responses are feasible in an interactive app.
- [[FastAPI Chat App]] explains the API surface that the Vercel app is calling.
- [[Real Chatbot Workflow]] explains where the checkpoint came from.
That is the lecture map you need before you start implementing product surfaces.
## The three pieces to show in class
### 1. `picollm` backend
Start from:
- [picollm/accelerated/chat/web.py](https://github.com/Montekkundan/llm/blob/main/picollm/accelerated/chat/web.py)
You should see that this backend exposes:
- `/v1/models`
- `/v1/chat/completions`
That is the contract the frontend will call.
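Before wiring up the frontend, you can sanity-check that contract by calling both endpoints directly. A minimal sketch (TypeScript, Node 18+ for global `fetch`), assuming the backend is running on the local address used later in this note:
```ts
// Smoke-test the OpenAI-compatible contract exposed by web.py.
// Assumes the backend from "Local run flow" below is up on 127.0.0.1:8008.
const BASE = "http://127.0.0.1:8008/v1";

async function smokeTest(): Promise<void> {
  // 1. /v1/models should list the served model id.
  const models = await fetch(`${BASE}/models`).then((r) => r.json());
  console.log("models:", JSON.stringify(models));

  // 2. A non-streaming completion keeps the check simple;
  //    the UI path will set stream: true instead.
  const completion = await fetch(`${BASE}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "picollm-chat",
      messages: [{ role: "user", content: "Say hi in one sentence." }],
      stream: false,
    }),
  }).then((r) => r.json());

  // Standard OpenAI response shape; verify web.py returns the same.
  console.log("reply:", completion.choices?.[0]?.message?.content);
}

smokeTest().catch(console.error);
```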
### 2. AI SDK route
Then open:
- [apps/vercel_ai_sdk_chat/app/api/chat/route.ts](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/app/api/chat/route.ts)
This is the production bridge:
- the browser sends messages to the Next.js route
- the route uses `streamText`
- the provider points at `picollm`
- the response streams back into the UI
This is where theory, model serving, and product UI meet.
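As a mental model for what `route.ts` is doing, here is a minimal sketch assuming AI SDK v5 conventions. The repository file also handles the connection settings stored by the UI, so treat this as a shape, not the actual implementation:
```ts
// Minimal sketch of app/api/chat/route.ts -- not the repository file.
import { streamText, convertToModelMessages, type UIMessage } from "ai";
import { createOpenAICompatible } from "@ai-sdk/openai-compatible";

// Point the provider at the local picollm backend from this lecture.
const picollm = createOpenAICompatible({
  name: "picollm",
  baseURL: "http://127.0.0.1:8008/v1",
});

export async function POST(req: Request) {
  // The browser posts its UI message history to this route.
  const { messages }: { messages: UIMessage[] } = await req.json();

  // streamText forwards the conversation to the backend and starts
  // consuming the model's streamed tokens immediately.
  const result = streamText({
    model: picollm("picollm-chat"),
    messages: convertToModelMessages(messages),
  });

  // Stream the tokens back to the browser in the UI message format.
  return result.toUIMessageStreamResponse();
}
```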
### 3. Chat UI
Then open:
- [apps/vercel_ai_sdk_chat/app/page.tsx](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/app/page.tsx)
The main supporting UI pieces are:
- [apps/vercel_ai_sdk_chat/app/layout.tsx](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/app/layout.tsx)
- [apps/vercel_ai_sdk_chat/components/settings.tsx](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/components/settings.tsx)
- [apps/vercel_ai_sdk_chat/components/ai-elements/conversation.tsx](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/components/ai-elements/conversation.tsx)
- [apps/vercel_ai_sdk_chat/components/ai-elements/message.tsx](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/components/ai-elements/message.tsx)
- [apps/vercel_ai_sdk_chat/components/ai-elements/prompt-input.tsx](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/components/ai-elements/prompt-input.tsx)
- [apps/vercel_ai_sdk_chat/components/ai-elements/suggestion.tsx](https://github.com/Montekkundan/llm/blob/main/apps/vercel_ai_sdk_chat/components/ai-elements/suggestion.tsx)
This shows that the frontend is not the model. It is a client that:
- renders messages
- sends prompts
- streams tokens
- stores model connection settings
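On the client side, most of that work is done by the `useChat` hook. A minimal sketch of the pattern, again assuming AI SDK v5 conventions rather than the exact markup of `page.tsx` and its ai-elements components:
```tsx
"use client";

// Minimal sketch of the chat client; the real page.tsx composes the
// ai-elements components listed above, but the data flow is the same.
import { useState } from "react";
import { useChat } from "@ai-sdk/react";

export default function Chat() {
  const [input, setInput] = useState("");
  // useChat posts to /api/chat by default and streams updates
  // into `messages` as tokens arrive from the backend.
  const { messages, sendMessage, status } = useChat();

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <b>{m.role}:</b>{" "}
          {m.parts.map((part, i) =>
            part.type === "text" ? <span key={i}>{part.text}</span> : null
          )}
        </div>
      ))}
      <form
        onSubmit={(e) => {
          e.preventDefault();
          sendMessage({ text: input }); // triggers POST /api/chat
          setInput("");
        }}
      >
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          disabled={status !== "ready"}
        />
      </form>
    </div>
  );
}
```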
## How it fits the course
Use this sequence:
1. remind yourself that a model is only useful if something can call it
2. show `picollm` serving a model locally
3. show the AI SDK route that calls the backend
4. show the UI that consumes the streamed response
5. explain that Vercel hosts the app, not the GPU model itself
That is the production mental model to keep.
## Local run flow
Run the model backend first. If you want to serve your own from-scratch chatbot:
```bash
uv run python -m picollm.accelerated.chat.web \
  --source sft
```
Then run the frontend app from `apps/vercel_ai_sdk_chat/`:
```bash
npm install
cp .env.example .env.local
npm run dev
```
Then show:
- backend on `http://127.0.0.1:8008`
- frontend on `http://127.0.0.1:3000`
In `.env.local`, set:
```bash
PICOLLM_MODEL=picollm-chat
```
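One way that setting can flow into the route code is via `process.env`. In this sketch, `PICOLLM_MODEL` comes from above, while `PICOLLM_BASE_URL` is a hypothetical name used only for illustration; check `.env.example` for the variables the app actually reads:
```ts
// Read connection settings from the environment instead of hardcoding.
// PICOLLM_MODEL is set above; PICOLLM_BASE_URL is a hypothetical name
// used here for illustration -- see .env.example for the real variables.
const modelId = process.env.PICOLLM_MODEL ?? "picollm-chat";
const baseURL = process.env.PICOLLM_BASE_URL ?? "http://127.0.0.1:8008/v1";
```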
## What to learn here
By the end of this lecture, you should be able to explain:
- why the web app and the model backend should usually be separated
- why OpenAI-compatible APIs are useful as a portability layer
- how `streamText` fits between frontend UI and backend model
- why production deployment is often a composition of services, not a single monolith
## Relationship to the rest of the course
This note should be taught after:
- [[Real Chatbot Workflow]]
- [[Deployment]]
- [[FastAPI Chat App]]
It is the final product-facing bridge from:
- theory
- to model
- to serving
- to a deployable web interface
In the final course project, this is the last mile:
- your own model checkpoint
- your own chatbot behavior
- your own backend
- your own browser app
<div style="display:flex; gap:1rem; margin:1.5rem 0; flex-wrap:wrap;">
<div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
<div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Previous</div>
<div><a class="internal-link" data-href="Nanochat Architecture" href="Nanochat%20Architecture">Nanochat Architecture</a></div>
</div>
<div style="flex:1; min-width:220px; border:1px solid var(--background-modifier-border); border-radius:12px; padding:1rem; background:var(--background-secondary);">
<div style="font-size:0.85em; color:var(--text-muted); margin-bottom:0.35rem;">Next</div>
<div><a class="internal-link" data-href="Scaling Laws and Compute-Optimal Training" href="Scaling%20Laws%20and%20Compute-Optimal%20Training">Scaling Laws and Compute-Optimal Training</a></div>
</div>
</div>
## Further reading
- Vercel, "AI SDK UI transport," 2025. https://ai-sdk.dev/docs/ai-sdk-ui/transport
- Vercel, "OpenAI-compatible providers," 2025. https://ai-sdk.dev/providers/openai-compatible-providers
- Vercel, "AI Elements," 2025. https://elements.ai-sdk.dev/
- shadcn/ui, "Documentation," 2025. https://ui.shadcn.com/docs