Eight Heuristics for Generative and Agentic AI Products

Pavel Bukengolts

Generative and agentic systems break in ways that classic UX frameworks were not built to catch. A hallucinated citation can look identical to a sourced answer. An agent can take an irreversible action that the user never saw coming. A screen reader can lose its place every time the model finishes a sentence. Nielsen's heuristics, Microsoft's HAX guidance, and Google's PAIR guidebook still matter. They are just not enough on their own.

Figure: Timeline of the UX, accessibility, and human-AI frameworks, from 1994 to 2023, that informed UX Design Lab’s Eight Heuristics (2026): Jakob Nielsen’s 10 Usability Heuristics (1994), Jill Gerhardt-Powals’ Cognitive Engineering Principles (1996), Eric Horvitz’s Principles of Mixed-Initiative User Interfaces (1999), WCAG 1.0 (1999), DARPA Explainable AI (2016), Google’s People + AI Guidebook (2019), Microsoft’s Guidelines for Human-AI Interaction (HAX) (2019), OECD AI Principles (2019), and NIST AI RMF 1.0 (2023).

At UX Design Lab, we took what those frameworks established and extended it into something evaluable for the systems being built right now: eight heuristics, each with a definition, a failure mode, a test you can actually run, and a pass/fail threshold. The mechanics that make them different from prior guidance include:

  • Scaling preview friction to the blast radius of an action, so high-impact agent behavior gets a full abort path while low-impact actions run quietly, and users are not trained to click through without reading
  • Measuring the actual time and quality cost of redirecting an AI mid-task, rather than treating correction as equivalent to starting over
  • Testing dynamic accessibility under live streaming and async updates, not just against a static DOM scan that will miss every failure that matters in a generative interface
  • A governance companion layer that crosswalks compliance requirements back to specific heuristics, so legal risk doesn't get siloed away from the UX review
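
The first mechanic above, scaling preview friction to blast radius, can be sketched as a small policy function. This is a minimal illustration in Python, not the framework's own rubric: the `BlastRadius` tiers, the `Friction` levels, the `AgentAction` type, and the `required_friction` helper are all hypothetical names, and the rule that any irreversible action gets the full abort path is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class BlastRadius(Enum):
    """Hypothetical impact tiers; the framework may define these differently."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


class Friction(Enum):
    """Hypothetical friction levels, from least to most interruptive."""
    RUN_QUIETLY = "run without interruption and log for later review"
    PREVIEW = "show a preview with a one-step undo"
    CONFIRM_WITH_ABORT = "require explicit confirmation and offer a full abort path"


@dataclass(frozen=True)
class AgentAction:
    name: str
    reversible: bool
    blast_radius: BlastRadius


def required_friction(action: AgentAction) -> Friction:
    """Map an action to the least friction that still protects the user."""
    # Assumption: irreversibility alone is enough to demand the full abort path.
    if not action.reversible or action.blast_radius is BlastRadius.HIGH:
        return Friction.CONFIRM_WITH_ABORT
    if action.blast_radius is BlastRadius.MEDIUM:
        return Friction.PREVIEW
    # Low-impact, reversible actions stay quiet so confirmations remain meaningful.
    return Friction.RUN_QUIETLY


if __name__ == "__main__":
    for action in (
        AgentAction("rename draft", reversible=True, blast_radius=BlastRadius.LOW),
        AgentAction("reschedule meeting", reversible=True, blast_radius=BlastRadius.MEDIUM),
        AgentAction("send payment", reversible=False, blast_radius=BlastRadius.HIGH),
    ):
        print(f"{action.name}: {required_friction(action).value}")
```

Keeping the low tier silent is the point of the design: if every action asks for confirmation, users learn to click through, and the confirmation on the high-impact action stops carrying information.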

There are also two known soft spots we are actively working on: the tone evaluation in Heuristic 7 is intentionally qualitative, and the latency thresholds in Heuristic 8 are tiered to human perception but still interpretive. We named them in the document rather than hiding them. That is where the most useful pushback will come from.
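
To make "tiered to human perception" concrete, here is a rough sketch of what a latency tier check could look like. The actual thresholds in Heuristic 8 are not given in this post, so the sketch assumes Nielsen's classic response-time limits (0.1 s, 1 s, 10 s) as stand-ins; the function name `feedback_for_latency` and the feedback labels are hypothetical.

```python
# Assumption: Nielsen's response-time limits stand in for the framework's own tiers.
INSTANT_LIMIT_S = 0.1     # feels instantaneous
FLOW_LIMIT_S = 1.0        # the user's flow of thought stays uninterrupted
ATTENTION_LIMIT_S = 10.0  # roughly the limit for holding attention on the task


def feedback_for_latency(expected_seconds: float) -> str:
    """Return the kind of process feedback a wait of this length would call for."""
    if expected_seconds < 0:
        raise ValueError("expected_seconds must be non-negative")
    if expected_seconds <= INSTANT_LIMIT_S:
        return "no indicator needed"
    if expected_seconds <= FLOW_LIMIT_S:
        return "subtle busy indicator"
    if expected_seconds <= ATTENTION_LIMIT_S:
        return "visible progress or streamed partial output"
    # Beyond the attention limit, the user needs to see steps and be able to stop.
    return "step-by-step status with a cancel control"


if __name__ == "__main__":
    for seconds in (0.05, 0.6, 4.0, 45.0):
        print(f"{seconds:>5} s -> {feedback_for_latency(seconds)}")
```

The boundaries are where the interpretive part lives: a 1.1-second wait and a 9-second wait land in the same tier here, which is the kind of coarseness the draft invites pushback on.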

This is v0.25, a working draft. The PDF gives the short version. The full document is linked below, view-only, no form.

If you use it on a real product and find a weak spot, a missing failure mode, or a place where it breaks under pressure, reach out. We want to build a stronger version with people who are actually doing the work.


Read the full framework in the Google Doc: Eight Heuristics for Generative and Agentic AI Products
Download the short PDF version: PDF

Heuristic 1: Legible Capability and Confidence
Heuristic 2: Grounded Provenance
Heuristic 3: Bounded Autonomy
Heuristic 4: Intervention and Recovery
Heuristic 5: Accessibility Integrity Under Change
Heuristic 6: Progressive Disclosure of Inspectability
Heuristic 7: Transparent Safety Boundaries
Heuristic 8: Process Visibility and Timing
