Streaming UX: Building Real-Time AI Interfaces That Feel Alive

TLDR: Time to first token matters more than total generation time. Get something on screen in under 800ms and the wait feels fast. Covered here: SSE, Angular signals, the blinking cursor pattern, streaming markdown buffering, and a shipping checklist. One rule I won't compromise on: never throw away partial content on error.

A spinner for 8 seconds is a dead UI. The same 8 seconds with tokens streaming in, words appearing as the model generates them, feels fast. This isn't just a perception trick: users can start reading, and stop a bad answer, long before generation finishes.

I've shipped three AI chat interfaces and the streaming UX work is consistently what separates the ones that feel polished from the ones that feel clunky. Here's everything I've learned about doing it right.

Why streaming changes the experience

LLMs generate tokens sequentially. Without streaming, the user waits for the entire response before seeing anything. With streaming:

Time to first token (TTFT) drops to ~300–800ms
Users can start reading while the model is still writing
Users can interrupt early if the response is going the wrong direction
The interface feels alive, not like a broken loading state

Users perceive streamed responses as markedly faster, even when total generation time is identical.

The number that matters is TTFT, not total generation time. Get something on screen in under 800ms and the experience feels responsive regardless of how long the full response takes.
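Since TTFT is the number to optimize, it's worth measuring it in the client, where the user actually waits. A minimal sketch: createTtftTracker is a hypothetical helper (not a library API) that records the time between starting the request and receiving the first chunk. The clock is a parameter so it can be tested; it defaults to performance.now().

```typescript
// Hypothetical helper: measures the time from start() to the first onChunk() call.
// The clock is injectable for testing; performance.now() is the default.
function createTtftTracker(now: () => number = () => performance.now()) {
  let startedAt: number | undefined;
  let ttftMs: number | undefined;

  return {
    // Call right before sending the request
    start(): void {
      startedAt = now();
      ttftMs = undefined;
    },
    // Call at the top of the chunk callback. Returns the TTFT on the first
    // chunk after start(), and undefined on every later chunk.
    onChunk(): number | undefined {
      if (startedAt === undefined || ttftMs !== undefined) return undefined;
      ttftMs = now() - startedAt;
      return ttftMs;
    },
  };
}
```

Call start() just before the fetch, call onChunk() at the top of your chunk callback, and send the first value it returns to whatever analytics you already use.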

The Server-Sent Events pattern

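The standard transport for LLM streaming is SSE (Server-Sent Events): a unidirectional HTTP stream from server to client in the text/event-stream format. It's simpler than WebSockets for this use case because you only need one direction. Because the chat request is a POST with a JSON body, the client below reads the stream with fetch rather than the browser's EventSource API, which only issues GET requests.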

Backend (Node.js / Express)

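import Anthropic from "@anthropic-ai/sdk";
import express from "express";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const app = express();
app.use(express.json());

app.post("/api/chat", async (req, res) => {
  const { messages } = req.body;

  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  res.flushHeaders(); // open the stream now instead of waiting for the first write

  // messages.stream() returns a MessageStream directly; no await needed
  const stream = client.messages.stream({
    model: "claude-sonnet-4-6",
    max_tokens: 2048,
    messages,
  });

  // Client went away (closed tab, pressed Stop): stop generating unread tokens
  res.on("close", () => {
    if (!res.writableEnded) stream.abort();
  });

  try {
    for await (const chunk of stream) {
      if (
        chunk.type === "content_block_delta" &&
        chunk.delta.type === "text_delta"
      ) {
        res.write(`data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`);
      }
    }
    res.write("data: [DONE]\n\n");
  } catch (error) {
    // Headers are already sent, so the status code can't change. Ending without
    // [DONE] lets the client tell an interrupted stream from a complete one.
    console.error("Stream failed:", error);
  } finally {
    res.end();
  }
});

app.listen(3000); // port is arbitrary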

Frontend (React)

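type Message = { role: "user" | "assistant"; content: string };

async function streamChat(
  messages: Message[],
  onChunk: (text: string) => void,
  signal?: AbortSignal, // optional: lets a Stop button abort the request
): Promise<void> {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
    signal,
  });
  if (!response.ok || !response.body) {
    // Attach status and Retry-After so callers can branch on them (e.g. 429)
    throw Object.assign(new Error(`HTTP ${response.status}`), {
      status: response.status,
      retryAfter: response.headers.get("retry-after"),
    });
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // holds a partial line until its "\n" arrives in a later chunk

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters that straddle chunks intact
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // the last piece may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice(6);
      if (data === "[DONE]") return;

      const { text } = JSON.parse(data);
      onChunk(text);
    }
  }

  // The server ends without [DONE] when generation fails mid-stream
  throw new Error("Stream ended before [DONE]");
}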
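The loop above only handles single-line data: fields, which is all this backend sends. The SSE format itself is a bit richer: an event is a group of lines terminated by a blank line, multiple data: lines are joined with newlines, an event: line names the event, and lines starting with a colon are comments (often used as keep-alives). If you consume a stream you don't control, an incremental parser along these lines is safer. It's a sketch: createSseParser is a hypothetical helper, it handles \n and \r\n line endings, and it ignores the id: and retry: fields.

```typescript
type SseEvent = { event: string; data: string };

// Hypothetical helper: feed() it decoded text chunks; it returns the complete
// events found so far and keeps any partial line or event for the next call.
function createSseParser() {
  let buffer = "";
  let eventType = "";
  let dataLines: string[] = [];

  return function feed(chunk: string): SseEvent[] {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // incomplete last line waits for the next chunk
    const events: SseEvent[] = [];

    for (const rawLine of lines) {
      const line = rawLine.endsWith("\r") ? rawLine.slice(0, -1) : rawLine;
      if (line === "") {
        // A blank line ends the event; events without data are dropped
        if (dataLines.length > 0) {
          events.push({ event: eventType || "message", data: dataLines.join("\n") });
        }
        eventType = "";
        dataLines = [];
        continue;
      }
      if (line.startsWith(":")) continue; // comment / keep-alive

      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one optional space

      if (field === "data") dataLines.push(value);
      else if (field === "event") eventType = value;
      // id and retry are ignored in this sketch
    }
    return events;
  };
}
```

In the read loop, pass decoder.decode(value, { stream: true }) to feed() and handle each returned event, for example treating data "[DONE]" as the end of the response and an event named "error" as a failure.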

Angular implementation with signals

Angular signals pair naturally with streaming — each token is a fine-grained reactive update, not a full re-render trigger.

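import { ChangeDetectionStrategy, Component, signal } from "@angular/core";

@Component({
  selector: "app-chat",
  changeDetection: ChangeDetectionStrategy.OnPush,
  // pre-wrap so newlines in the streamed text are shown
  styles: [".content { white-space: pre-wrap; }"],
  template: `
    <div class="message-stream">
      <span class="content">{{ streamingContent() }}</span>
      @if (isStreaming()) {
        <span class="cursor" aria-hidden="true"></span>
      }
    </div>
    <button (click)="stopStream()" [disabled]="!isStreaming()">Stop</button>
  `,
})
export class ChatComponent {
  streamingContent = signal("");
  isStreaming = signal(false);
  private abortController?: AbortController;

  async sendMessage(userMessage: string) {
    this.streamingContent.set("");
    this.isStreaming.set(true);
    this.abortController = new AbortController();

    try {
      const response = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ messages: [{ role: "user", content: userMessage }] }),
        signal: this.abortController.signal,
      });
      if (!response.ok || !response.body) {
        throw new Error(`HTTP ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = ""; // partial line carried over between chunks

      read: while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split("\n");
        buffer = lines.pop() ?? "";

        for (const line of lines) {
          if (!line.startsWith("data: ")) continue;
          const data = line.slice(6);
          if (data === "[DONE]") break read; // leave both loops, not just the for

          const { text } = JSON.parse(data);
          // Signal update: only views that read streamingContent are marked for checking
          this.streamingContent.update((prev) => prev + text);
        }
      }
    } catch (error) {
      // Keep whatever streamed so far; just mark how the response ended
      const stopped = (error as Error).name === "AbortError";
      this.streamingContent.update(
        (prev) => prev + (stopped ? "\n\n[Stopped]" : "\n\n[Error: response incomplete]"),
      );
    } finally {
      this.isStreaming.set(false);
    }
  }

  stopStream() {
    this.abortController?.abort();
  }
}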

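The key: each update() call notifies only the template bindings that read streamingContent. With OnPush (or in a zoneless app), Angular re-checks just this component's view instead of walking the whole tree. On a fast model that can be 30+ updates per second, and in my experience signals handle it without visible jank.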

The cursor pattern — one detail that changes everything

Users expect a blinking cursor during generation. It signals "still thinking" and makes the streaming feel intentional rather than broken. This one detail dramatically improves perceived quality.

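@keyframes blink {
  0%, 100% { opacity: 1; }
  50% { opacity: 0; }
}

/* Render the span empty: the 2px bar comes from the background */
.cursor {
  display: inline-block;
  width: 2px;
  height: 1.1em;
  background: currentColor;
  margin-left: 1px;
  animation: blink 1s step-start infinite;
  vertical-align: text-bottom;
}

/* Respect users who ask for less motion: keep the cursor, stop the blinking */
@media (prefers-reduced-motion: reduce) {
  .cursor { animation: none; }
}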

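Show it while isStreaming is true and hide it on completion. Because @if removes the span outright, a CSS transition won't run on removal; to fade the cursor out instead of having it vanish abruptly, keep the span rendered and toggle a class that transitions its opacity to 0.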

Streaming structured data

Plain text is easy. The challenge is when you need to stream structured output — partial JSON, markdown with headings, or a mix of text and tool calls.
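The backend above forwards only text_delta events. When the model can also call tools, the raw event stream tells you which content block each delta belongs to: in Anthropic's Messages streaming API, content_block_start announces a block (text or tool_use), content_block_delta carries either a text_delta or an input_json_delta with a partial_json fragment, and content_block_stop closes the block. The sketch below keys blocks by index, renders text as it arrives, and only parses tool input once its block stops. The event types are hand-written, simplified approximations of those shapes rather than imports from the SDK, and BlockAccumulator is a hypothetical name.

```typescript
// Simplified local approximations of the streaming event shapes (not the SDK's own types)
type StreamEvent =
  | {
      type: "content_block_start";
      index: number;
      content_block: { type: "text" } | { type: "tool_use"; id: string; name: string };
    }
  | {
      type: "content_block_delta";
      index: number;
      delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string };
    }
  | { type: "content_block_stop"; index: number }
  | { type: "message_start" | "message_delta" | "message_stop" | "ping" };

type Block =
  | { kind: "text"; text: string }
  | { kind: "tool_use"; id: string; name: string; json: string; input?: unknown };

// Hypothetical helper: routes deltas to the content block they belong to
class BlockAccumulator {
  readonly blocks = new Map<number, Block>();

  handle(event: StreamEvent): void {
    switch (event.type) {
      case "content_block_start": {
        const cb = event.content_block;
        this.blocks.set(
          event.index,
          cb.type === "text"
            ? { kind: "text", text: "" }
            : { kind: "tool_use", id: cb.id, name: cb.name, json: "" },
        );
        break;
      }
      case "content_block_delta": {
        const block = this.blocks.get(event.index);
        if (block?.kind === "text" && event.delta.type === "text_delta") {
          block.text += event.delta.text; // safe to render immediately
        } else if (block?.kind === "tool_use" && event.delta.type === "input_json_delta") {
          block.json += event.delta.partial_json; // incomplete JSON until the block stops
        }
        break;
      }
      case "content_block_stop": {
        const block = this.blocks.get(event.index);
        if (block?.kind === "tool_use") {
          block.input = block.json === "" ? {} : JSON.parse(block.json);
        }
        break;
      }
    }
  }
}
```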

Streaming Markdown

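Render markdown incrementally using a library that handles partial input gracefully. The naive approach re-parses the whole message on every token, and partial syntax flickers: a bold marker like **bo shows literal asterisks until the closing ** arrives, and a half-typed link or table row renders as plain text before snapping into place: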

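import { marked } from "marked";

// Naive: re-parses on every token, so partial syntax flickers.
// (Sanitize the resulting HTML before inserting it into the page.)
const html = marked(partialMarkdown);

// Better: buffer until a natural break point.
// pending holds only the tokens received since the last render, not the whole message.
function shouldFlush(pending: string): boolean {
  return pending.includes("\n\n") || pending.endsWith("```\n") || /[.!?]\s*$/.test(pending);
}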

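Buffer tokens until you hit a paragraph break, a code fence line, or the end of a sentence, then re-render. The delay is usually short enough that users don't notice it, and you avoid the flickering partial-parse artifacts. Remember to flush whatever is left when the stream ends, or a final sentence without punctuation never appears.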
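Here's one way to wire shouldFlush up, as a sketch: MarkdownStreamBuffer is a hypothetical class that holds the tokens received since the last render and hands the full accumulated text to a render callback only when shouldFlush says so, or when you flush explicitly at the end. The render callback stands in for whatever you use, for example marked plus a sanitizer.

```typescript
// Same heuristic as shouldFlush above, repeated so this sketch runs on its own
function shouldFlush(pending: string): boolean {
  return pending.includes("\n\n") || pending.endsWith("```\n") || /[.!?]\s*$/.test(pending);
}

// Hypothetical helper: accumulates tokens and calls render(fullText) only at
// natural break points, plus whenever flush() is called explicitly.
class MarkdownStreamBuffer {
  private rendered = ""; // text already passed to render()
  private pending = ""; // tokens received since the last render
  private readonly render: (fullText: string) => void;

  constructor(render: (fullText: string) => void) {
    this.render = render;
  }

  push(token: string): void {
    this.pending += token;
    if (shouldFlush(this.pending)) this.flush();
  }

  // Call when the stream completes, is stopped, or errors, so the tail is never lost
  flush(): void {
    if (this.pending === "") return;
    this.rendered += this.pending;
    this.pending = "";
    this.render(this.rendered);
  }
}
```

Keeping pending separate from rendered matters: if shouldFlush looked at the whole message, the first paragraph break would make every later token trigger a re-render.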

Streaming JSON

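The simplest approach for JSON is to buffer until the whole thing parses. Note that this renders nothing until the object is complete, and it re-parses the growing buffer on every chunk: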

let jsonBuffer = "";

await streamChat(messages, (text) => {
  jsonBuffer += text;
  try {
    const parsed = JSON.parse(jsonBuffer);
    updateUI(parsed); // your render function
  } catch {
    // Not valid JSON yet: keep accumulating
  }
});
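If you want the UI to fill in while the JSON streams, you need to parse incomplete prefixes. Libraries exist for this; below is a deliberately small, best-effort sketch (parsePartialJson is a hypothetical helper) that closes an unterminated string and any open objects or arrays, drops a trailing comma, and returns undefined when the prefix still can't be parsed, for example when it ends mid-key or right after a colon. The caller keeps the last successful result and renders that.

```typescript
// Best-effort parse of an incomplete JSON prefix. Returns undefined when the
// prefix can't be repaired into valid JSON (e.g. it ends mid-key or after a colon).
function parsePartialJson(prefix: string): unknown {
  const closers: string[] = []; // closing brackets for currently open objects/arrays
  let inString = false;
  let escaped = false;

  for (const ch of prefix) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }

  let repaired = prefix;
  if (inString) {
    // Drop a dangling backslash so the closing quote isn't escaped, then close the string
    repaired = (escaped ? repaired.slice(0, -1) : repaired) + '"';
  } else {
    repaired = repaired.replace(/,\s*$/, ""); // "[1, 2," becomes "[1, 2"
  }
  repaired += closers.reverse().join("");

  try {
    return JSON.parse(repaired);
  } catch {
    return undefined;
  }
}
```

Numbers are the one visible quirk: "12" parses as 12 before the final "3" arrives, so a value can change between renders. For most list-and-field UIs that's acceptable.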

Error handling and interruption

Always preserve partial content on error. The worst streaming UX I've seen throws away the entire half-generated response when something goes wrong, and the response area just goes blank. Never do this.

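// Assumes streamChat attaches status and retryAfter to the error it throws on a
// non-OK response. appendToMessage, showRetryAfter, and showError are your own UI helpers.
async function streamWithErrorHandling(
  messages: Message[],
  onChunk: (text: string) => void,
) {
  try {
    await streamChat(messages, onChunk);
  } catch (error) {
    const err = error as { name?: string; status?: number; retryAfter?: string | null };
    if (err.name === "AbortError") {
      // User cancelled (the fetch's AbortSignal was aborted): keep the partial response
      appendToMessage("\n\n*[Stopped]*");
    } else if (err.status === 429) {
      // Rate limited: tell the user when they can retry
      showRetryAfter(err.retryAfter);
    } else {
      // Don't clear the message: whatever streamed in stays on screen
      showError("Something went wrong. Partial response above.");
    }
  }
}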
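One way to make "never throw away partial content" hard to violate is to model each message as a small state machine where only text events touch content, and stop or error events change the status alone. The names below (StreamMessage, applyStreamEvent) are illustrative, not from any library; the reducer shape also fits React's useReducer.

```typescript
type StreamStatus = "streaming" | "done" | "stopped" | "error";

type StreamMessage = { content: string; status: StreamStatus; error?: string };

type StreamAction =
  | { type: "chunk"; text: string }
  | { type: "done" }
  | { type: "stopped" }
  | { type: "error"; message: string };

// Pure reducer: only "chunk" changes content, so stop and error can never erase it
function applyStreamEvent(msg: StreamMessage, action: StreamAction): StreamMessage {
  if (msg.status !== "streaming") return msg; // ignore late events after the stream ends
  switch (action.type) {
    case "chunk":
      return { ...msg, content: msg.content + action.text };
    case "done":
      return { ...msg, status: "done" };
    case "stopped":
      return { ...msg, status: "stopped" };
    case "error":
      return { ...msg, status: "error", error: action.message };
  }
}
```

Render content in every status, and use status to decide whether to show the cursor, a [Stopped] marker, or an error banner under the partial text.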

Performance: don't re-render on every token

The most common streaming performance mistake: updating state in a way that re-renders the full component tree on every token.

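// ❌ Re-renders this component and its children on every token;
//    expensive if it also renders the whole conversation
const [content, setContent] = useState("");
onChunk((text) => setContent((prev) => prev + text));

// ✅ Direct DOM mutation for the stream target: no React re-render per token.
//    Don't let React render children into this span, or a re-render will overwrite
//    the text; once the stream ends, commit the final text to state.
const streamRef = useRef<HTMLSpanElement>(null);
onChunk((text) => {
  if (streamRef.current) {
    streamRef.current.textContent += text;
  }
});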

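In Angular, signals as shown earlier get you a similar effect without touching the DOM yourself: updates are scoped to the views that read the signal, so with OnPush (or in a zoneless app) parent components aren't re-checked.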
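Whichever approach you pick, you can also cap UI work at one update per animation frame by batching tokens. A sketch with a hypothetical helper, createTokenBatcher: the scheduler is a parameter so the helper can be tested outside the browser; in the browser you would pass (cb) => requestAnimationFrame(cb).

```typescript
// Hypothetical helper: collects tokens and hands them to onFlush at most once per
// scheduled tick. In the browser, pass (cb) => requestAnimationFrame(cb) as schedule.
function createTokenBatcher(
  onFlush: (batch: string) => void,
  schedule: (cb: () => void) => void,
) {
  let pending = "";
  let scheduled = false;

  const flush = (): void => {
    scheduled = false;
    if (pending === "") return;
    const batch = pending;
    pending = "";
    onFlush(batch);
  };

  return {
    push(token: string): void {
      pending += token;
      if (!scheduled) {
        scheduled = true; // one pending tick covers every token until it fires
        schedule(flush);
      }
    },
    // Deliver anything still buffered right away, e.g. when the stream ends
    flushNow: flush,
  };
}
```

In React, call push from the chunk callback and, in onFlush, either append to the ref'd node or call setContent((prev) => prev + batch), so state updates happen at most once per frame however fast tokens arrive. Call flushNow() when the stream ends.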

UX checklist before shipping

I run through these before every streamed AI feature ships:

Cursor visible during streaming, hidden on completion
Stop button — users must be able to interrupt generation
Partial response preserved if stopped or errored
Auto-scroll follows the stream (but stops if user scrolled up)
Loading state for TTFT — show something in the first 300ms before tokens arrive
Mobile tested — SSE connections behave differently on flaky mobile networks
Reconnect logic — implement exponential backoff for dropped SSE connections
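For the reconnect item, a common scheme is capped exponential backoff with full jitter: the delay ceiling doubles with each attempt and the actual wait is random within it, so many clients don't retry in lockstep. A sketch with hypothetical names (backoffDelay, withRetry) and arbitrary defaults. One caveat specific to this setup: with a fetch-based POST stream, a reconnect starts a new generation unless your server supports resuming, so keep the partial text on screen and decide whether a retry should replace it.

```typescript
// Capped exponential backoff with full jitter:
// a random delay in [0, min(capMs, baseMs * 2^attempt))
function backoffDelay(attempt: number, baseMs = 500, capMs = 10_000, random = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries run() on failure, except when the user aborted. Rethrows after maxAttempts.
async function withRetry<T>(
  run: () => Promise<T>,
  maxAttempts = 4,
  delay: (attempt: number) => number = backoffDelay,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await run();
    } catch (error) {
      const aborted = (error as Error)?.name === "AbortError";
      if (aborted || attempt + 1 >= maxAttempts) throw error; // never retry a user Stop
      await sleep(delay(attempt));
    }
  }
}
```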

The stop button is non-negotiable. If a model starts going in the wrong direction, users need an escape. An AI interface without an interrupt mechanism is frustrating in a way that kills trust fast.
