AI engineering · 2025
LLM Systems That Don't Feel Stitched On
Principles I follow to integrate LLMs into products so they feel native, reliable, and worth keeping.
The difference between a "demo" LLM feature and a real product is reliability. Users should never feel like they are talking to a model that was bolted on at the last minute.
Define where AI is allowed to fail
I draw a clear line between flows that can tolerate creative failure (brainstorming, drafting) and flows that cannot (payments, permissions, critical data). For the latter, LLMs act as assistants to deterministic systems, not decision makers.
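As a sketch of that split, the shape I reach for looks roughly like this. The refund scenario, `llm_suggest_refund`, and `REFUND_LIMIT` are hypothetical stand-ins, not a real API: the model drafts a proposal, and deterministic code owns the final decision.

```python
# A minimal sketch of the "assistant, not decision maker" split.
# The refund scenario and all names here are hypothetical; the point
# is that the model proposes and deterministic code disposes.
from dataclasses import dataclass

REFUND_LIMIT = 50.00  # hard business rule, never delegated to the model


@dataclass
class RefundProposal:
    order_id: str
    amount: float
    rationale: str  # model output: shown to humans, never executed blindly


def llm_suggest_refund(ticket_text: str) -> RefundProposal:
    """Placeholder for a model call that drafts a refund proposal."""
    ...


def apply_refund(proposal: RefundProposal) -> bool:
    # Deterministic checks own the decision; the LLM only drafted the input.
    if proposal.amount <= 0 or proposal.amount > REFUND_LIMIT:
        return False  # outside the range the model is allowed to touch
    # ... issue the refund through the normal, audited payments path ...
    return True
```

The model never holds the pen on the payment itself; the worst it can do is draft a proposal that the deterministic gate rejects.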
Measure instead of guessing
I log model inputs, outputs, and tool calls (with redaction) to build feedback loops. This makes it possible to debug bad answers and iterate on prompts, retrieval strategies, and guardrails with real data instead of intuition.
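A minimal sketch of what that logging chokepoint can look like, assuming every model call passes through one wrapper. `call_model`, the logger name, and the redaction patterns are illustrative, not any particular provider's API:

```python
# A minimal sketch of logging model I/O with redaction, assuming all
# model calls go through a single wrapper. The regexes are illustrative,
# not a complete PII filter.
import json
import logging
import re

logger = logging.getLogger("llm_audit")

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def redact(text: str) -> str:
    """Strip obvious PII before anything reaches the logs."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def logged_call(prompt: str, call_model) -> str:
    """Wrap any model call so every input/output pair becomes a data point."""
    response = call_model(prompt)
    logger.info(json.dumps({
        "prompt": redact(prompt),
        "response": redact(response),
    }))
    return response
```

Once the logs exist, regressions in a prompt or retrieval change show up as measurable shifts in real traffic rather than anecdotes.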