---
title: "Design"
description: ""
---

# Design: AI Chat Agent for the SPA

Companion to [PRD.md](./PRD.md), [CONTEXT.md](./CONTEXT.md), and [ADR 0001](./docs/adr/0001-agent-is-a-ui-not-a-trust-boundary.md). The PRD covers *what* we are building and *why*. This document covers *how*. Domain terms (`[[Term]]`) are defined in CONTEXT.md.

---

## 1. Goals & non-goals

**Goals**

- Let users accomplish goals in the SPA by chatting with an AI agent that can navigate the site and propose backend API calls on their behalf.
- Preserve the existing security model: the user's access token never leaves the browser; the LLM is never in the authorization trust path.
- Make the agent maintainable: the catalog of API operations comes from Smithy (no schema drift), and procedural knowledge ([[Runbook|runbooks]]) is reviewed via PR.

**Non-goals (v1)**: see PRD §"Out of Scope" for the full list. Key omissions: cross-device transcript persistence, per-user rate limits and cost budgets, product/policy Q&A, macro-tools, model-evaluation suite.

---

## 2. System architecture

```mermaid
flowchart LR
    User([User])
    SPA[React SPA<br/>· Chat UI<br/>· Browser Turn Orchestrator<br/>· Proposal Executor<br/>· Tool Registry<br/>· Navigation Intent Handler]
    Router[Router Service<br/>route-based authn/authz]
    Chat[Chat Service Lambda<br/>· Agent Loop<br/>· System Prompt Builder<br/>· Tool Manifest Validator<br/>· Runbook Retriever]
    BackendAPI[Existing Backend APIs<br/>Cognito-authorized]
    Bedrock[Bedrock Converse<br/>Claude Haiku 4.5]
    KB[Bedrock Knowledge Base<br/>Runbooks]
    S3[(S3 - runbook source)]
    CW[CloudWatch<br/>logs · metrics · X-Ray]

    User <-->|chat panel| SPA
    SPA -->|access token| BackendAPI
    SPA -->|"/chat/turn (Cognito auth)<br/>{transcript, userMessage?, toolResults?}"| Router
    Router --> Chat
    Chat -->|ConverseStream| Bedrock
    Chat -->|Retrieve| KB
    Chat -.->|"structured logs<br/>shape only, no plaintext"| CW
    S3 -.->|ingestion job| KB

    classDef external fill:#eee,stroke:#999;
    class User,Bedrock,KB,S3,CW,BackendAPI,Router external;
```

**The flow of authority**: the user's access token lives only in the SPA and is attached only to direct calls to the existing Backend APIs. The chat service never sees it. The LLM never sees it. See [ADR 0001](./docs/adr/0001-agent-is-a-ui-not-a-trust-boundary.md).

---

## 3. Components

### 3.1 SPA components

| Component | Responsibility | Notes |
|---|---|---|
| Chat UI | Message list, streaming text renderer, input box, interleaved approval cards and navigation toasts. | Stateless visual layer. |
| Browser Turn Orchestrator | State machine driving the turn cycle: send turn → render reply → collect approvals → execute → send results → next turn. Holds the in-memory transcript. | Deep, tested as a reducer. |
| Proposal Executor | Validates [[Tool-Call Proposal]] args against the [[Tool Manifest]], dispatches via [[Tool Registry (Browser)]], applies response projection / 4KB truncation, normalizes results. | Deep, tested with mocked SDK. |
| Tool Registry (Browser) | Static `toolName → (args) => existingSdk.foo.bar(args)` map. | Shallow, data. The seam that lets us reuse the SDK rather than build a parallel HTTP client. |
| Approval Card UI | Per-proposal card. Renders tool name, args (IDs prominent), risk class, approve/decline. Destructive variants add a confirmation gate. | Shallow UI; design-reviewed (issue 08). |
| Navigation Intent Handler | Auto-executes [[Navigation Intent\|navigation intents]]: SPA router push + "Taking you to …" toast with undo. | Shallow. |

### 3.2 Chat Service components

| Component | Responsibility | Notes |
|---|---|---|
| Conversation Turn Handler | Lambda entry. Parses request, invokes Agent Loop, streams response via Lambda response streaming. | Shallow, covered transitively by Agent Loop tests. |
| Agent Loop | Calls Bedrock Converse with the composed system prompt + tool definitions + transcript. Parses the model response into `{ assistantText, proposals, navigationIntents }`. On tool-result feedback, continues the loop. | Deep, tested with mocked Bedrock. |
| System Prompt Builder | Composes system prompt: agent role + navigation/scope rules + always-on runbook title index + per-turn user identity context. | Deep, pure. Snapshot-tested. |
| Tool Manifest Validator | Verifies any tool-use block the model emits references an allowlist entry and the args conform. Rejects malformed proposals before they reach the browser. | Sanity, not authz. Deep, table-driven tests. |
| Runbook Retriever | Wraps Bedrock KB `Retrieve` for the `lookupRunbook` tool. | Shallow. |
| Bedrock Client Wrapper | Thin `ConverseStream` wrapper with error normalization. | Shallow. |

### 3.3 Build pipeline

| Component | Responsibility | Notes |
|---|---|---|
| Smithy → Manifest Generator | Build-time tool: consumes Smithy model + allowlist + description overrides, emits the [[Tool Manifest]] consumed by both the chat service and the SPA. Validates allowlist entries exist in Smithy and have descriptions + risk classes. | Deep, fixture-tested. |
| Runbook sync CI | On merge: uploads `/runbooks/*.md` to S3, triggers Bedrock KB ingestion. Validates frontmatter and `tools-referenced`. | Shallow. |

### 3.4 Module dependency map

```mermaid
flowchart TB
    subgraph SPA
        ChatUI[Chat UI]
        Orch[Browser Turn Orchestrator]
        ApprovalCard[Approval Card UI]
        NavHandler[Navigation Intent Handler]
        Executor[Proposal Executor]
        Registry[Tool Registry]
        SDK[Existing Frontend SDK]
    end
    subgraph ChatSvc[Chat Service]
        Handler[Conversation Turn Handler]
        Loop[Agent Loop]
        Prompt[System Prompt Builder]
        Validator[Tool Manifest Validator]
        Retriever[Runbook Retriever]
        BedrockClient[Bedrock Client Wrapper]
    end
    Manifest[(Tool Manifest<br/>build-time JSON)]

    ChatUI --> Orch
    Orch --> ApprovalCard
    Orch --> NavHandler
    Orch --> Executor
    Executor --> Registry
    Executor --> Manifest
    Registry --> SDK
    Handler --> Loop
    Loop --> Prompt
    Loop --> Validator
    Loop --> BedrockClient
    Loop --> Retriever
    Validator --> Manifest
    Prompt --> Manifest
```

---

## 4. Sequence flows

### 4.1 Text-only turn (no tools)

```mermaid
sequenceDiagram
    actor U as User
    participant S as SPA<br/>(Orchestrator)
    participant R as Router
    participant C as Chat Service
    participant B as Bedrock

    U->>S: types message
    S->>S: append to transcript
    S->>R: POST /chat/turn<br/>{transcript, userMessage}
    R->>R: authn/authz
    R->>C: forward + identity
    C->>C: build system prompt
    C->>B: ConverseStream(system, messages, tools)
    B-->>C: stream text tokens
    C-->>S: stream {assistantText}
    S-->>U: render streamed tokens
    Note over S: append assistant turn to transcript
```

### 4.2 Turn with a single tool call (the core HITL flow)

```mermaid
sequenceDiagram
    actor U as User
    participant S as SPA<br/>(Orchestrator)
    participant E as Proposal<br/>Executor
    participant SDK as Frontend SDK
    participant API as Backend API
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "what plan am I on?"
    S->>C: POST /chat/turn {transcript, userMessage}
    C->>B: ConverseStream(..., tools=[manifest])
    B-->>C: tool_use{tool=getCurrentUser, args={}}
    C->>C: validate against manifest
    C-->>S: {proposals=[{id, tool, args, riskClass}]}
    S->>S: render Approval Card
    U->>S: clicks Approve
    S->>E: execute(proposal)
    E->>E: validate args vs manifest
    E->>SDK: sdk.user.getCurrent()
    SDK->>API: GET /me (Bearer access_token)
    API-->>SDK: 200 {name, tier, ...}
    SDK-->>E: response
    E->>E: project/truncate per manifest
    E-->>S: {id, status:"ok", body}
    S->>C: POST /chat/turn {transcript, toolResults=[...]}
    C->>B: ConverseStream(..., tool_result fed back)
    B-->>C: stream text "You're on the Pro plan…"
    C-->>S: stream {assistantText}
    S-->>U: render reply
```

The "suspend / resume" between the proposal emission and the tool-result feedback is **just two HTTP calls**, not anything exotic. The chat service is stateless; the browser holds the transcript and replays it each turn.

### 4.3 Turn with a decline

```mermaid
sequenceDiagram
    actor U as User
    participant S as SPA<br/>(Orchestrator)
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "cancel my subscription"
    S->>C: POST /chat/turn
    C->>B: ConverseStream
    B-->>C: tool_use{tool=requestCancellation, args={...}}
    C-->>S: proposals=[{... riskClass:"destructive"}]
    S->>S: render Approval Card (destructive variant, gate)
    U->>S: clicks Decline
    S->>C: POST /chat/turn {toolResults=[{status:"declined"}]}
    C->>B: ConverseStream (sees declined tool_result)
    B-->>C: stream text "OK, I won't proceed. Want me to…?"
    C-->>S: stream assistantText
    S-->>U: render acknowledgement
```

No silent retry. The system rules prompt the model to acknowledge the decline and either suggest alternatives or stop, and never to re-emit the same proposal unprompted.

### 4.4 Turn with a runbook lookup + multi-step plan

```mermaid
sequenceDiagram
    actor U as User
    participant S as SPA
    participant C as Chat Service
    participant B as Bedrock
    participant KB as Bedrock KB

    U->>S: "cancel my subscription"
    S->>C: POST /chat/turn
    C->>B: ConverseStream
    B-->>C: tool_use{tool=lookupRunbook, query="cancel subscription"}
    C->>KB: Retrieve(query)
    KB-->>C: runbook chunks
    C->>B: ConverseStream (runbook fed back as tool_result)
    B-->>C: text + tool_use{tool=getSubscription, args={}}
    C-->>S: assistantText + proposal
    Note over U,S: ... user approves, browser executes,<br/>result fed back ...
    Note over B: agent chains next step from runbook
    B-->>C: tool_use{tool=requestCancellation, args=...}
    Note over U,S: ... approval + execute ...
    B-->>C: tool_use{tool=getCancellationStatus, args=...}
    Note over U,S: ... approval + execute ...
    B-->>C: stream text "Done, your subscription is cancelled."
```

`lookupRunbook` is **server-executed** (no token needed, no side effect), so it needs no per-call approval. Every subsequent API call is browser-executed and individually approved.

### 4.5 Turn with a navigation intent (no API tools)

```mermaid
sequenceDiagram
    actor U as User
    participant S as SPA
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "take me to billing"
    S->>C: POST /chat/turn
    C->>B: ConverseStream
    B-->>C: text "Heading to billing…" +<br/>tool_use{tool=navigate, args={url:"/billing"}}
    C-->>S: {assistantText, navigationIntents=[{url}]}
    S-->>U: render text, then toast "Taking you to /billing"
    S->>S: router.push("/billing")
```

Navigation intents auto-execute (no approval) because they mutate only client state and require no access token.

### 4.6 Turn with an out-of-scope question

```mermaid
sequenceDiagram
    actor U as User
    participant S as SPA
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "what's your refund policy?"
    S->>C: POST /chat/turn
    C->>B: ConverseStream (system prompt: "decline policy Qs, navigate")
    B-->>C: text "Our refund details live on the help center.<br/>Let me take you there." +<br/>tool_use{tool=navigate, args={url:"/contact"}}
    C-->>S: {assistantText, navigationIntents=[{url:"/contact"}]}
    S-->>U: render decline + toast + navigation
```

---

## 5. Browser turn orchestrator state machine

```mermaid
stateDiagram-v2
    [*] --> Idle
    Idle --> AwaitingTurn: userTyped / send
    AwaitingTurn --> Streaming: first chunk received
    Streaming --> Streaming: more tokens
    Streaming --> AwaitingApprovals: proposals received
    Streaming --> ExecutingNavigations: navigationIntents received
    Streaming --> Idle: turnComplete, no proposals
    ExecutingNavigations --> Idle: navigation fired
    AwaitingApprovals --> Executing: user approved at least one
    AwaitingApprovals --> AwaitingTurn: all declined → send declines
    Executing --> Executing: more proposals pending
    Executing --> AwaitingTurn: all results gathered → send next turn
    AwaitingTurn --> Error: network / 5xx
    Streaming --> Error: stream aborted
    Error --> Idle: user dismisses
```

States are tested as reducer transitions (deep module). The state machine is what makes the orchestrator straightforward to test in isolation: no DOM, no network, just state + events.

---

## 6. Data shapes (contract surface)

These are the contracts between subsystems. Field names are stable; exact serialization is an implementation detail.
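As a hedged illustration of how these contracts might be consumed, the sketch below transcribes the `Proposal` and `ToolResult` shapes from 6.2 into TypeScript and shows one way the Proposal Executor could normalize an SDK outcome into a `ToolResult`. The `SdkOutcome` type and the `truncateBody`, `toToolResult`, and `declined` helpers are hypothetical names introduced for this example; response projection is omitted, and the error `message` format is an assumption.

```typescript
// Shapes transcribed from sections 6.2 and 6.4 of this document.
type RiskClass = "read" | "write" | "destructive";

interface Proposal {
  id: string;
  tool: string;
  args: Record<string, unknown>;
  riskClass: RiskClass;
}

interface ToolError {
  kind: "client" | "server" | "network";
  message: string;
  statusCode?: number;
}

interface ToolResult {
  id: string;
  status: "ok" | "error" | "declined";
  body?: unknown;
  error?: ToolError;
}

// Hypothetical: how an SDK call ended, as seen by the executor.
type SdkOutcome =
  | { type: "response"; statusCode: number; body: unknown }
  | { type: "networkFailure"; message: string };

const DEFAULT_MAX_RESPONSE_BYTES = 4096; // default from section 6.4

// Returns the body unchanged when its JSON form fits in maxBytes (UTF-8).
// Otherwise returns the first maxBytes of the JSON text plus a note.
// The note is not counted against the limit, and a multi-byte character
// cut at the boundary decodes to a replacement character.
function truncateBody(body: unknown, maxBytes: number): unknown {
  const json = JSON.stringify(body) ?? "null";
  const bytes = new TextEncoder().encode(json);
  if (bytes.length <= maxBytes) return body;
  const kept = new TextDecoder().decode(bytes.slice(0, maxBytes));
  return `${kept}…truncated, ${bytes.length - maxBytes} more bytes`;
}

// Maps an SDK outcome to the ToolResult contract. Never retries.
function toToolResult(
  proposal: Proposal,
  outcome: SdkOutcome,
  maxBytes: number = DEFAULT_MAX_RESPONSE_BYTES,
): ToolResult {
  if (outcome.type === "networkFailure") {
    return {
      id: proposal.id,
      status: "error",
      error: { kind: "network", message: outcome.message },
    };
  }
  const { statusCode, body } = outcome;
  if (statusCode >= 400) {
    return {
      id: proposal.id,
      status: "error",
      error: {
        kind: statusCode >= 500 ? "server" : "client",
        message: `HTTP ${statusCode}`, // assumed message format
        statusCode,
      },
    };
  }
  return { id: proposal.id, status: "ok", body: truncateBody(body, maxBytes) };
}

// A declined proposal carries no body and no error.
function declined(proposal: Proposal): ToolResult {
  return { id: proposal.id, status: "declined" };
}
```

Keeping this mapping as a pure function means the 4xx, 5xx, network, and oversize cases can each be covered by a table-driven test without a live SDK.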
### 6.1 Browser ⇄ Chat Service

**Request** (`POST /chat/turn`)

```
{
  transcript: Message[]        // full prior conversation
  userMessage?: string         // if user just typed
  toolResults?: ToolResult[]   // if browser just executed proposals
}
```

**Response** (streamed)

```
{
  assistantText: string        // streamed in chunks
  proposals: Proposal[]        // emitted whole, after text
  navigationIntents: NavigationIntent[]
}
```

### 6.2 Proposal & ToolResult

```
Proposal {
  id: string          // unique per turn
  tool: string        // allowlist entry name
  args: object        // conforms to manifest argSchema
  riskClass: "read" | "write" | "destructive"
}

ToolResult {
  id: string          // matches Proposal.id
  status: "ok" | "error" | "declined"
  body?: unknown      // projected/truncated tool response
  error?: { kind: "client" | "server" | "network", message: string, statusCode?: number }
}
```

### 6.3 NavigationIntent

```
NavigationIntent {
  url: string         // internal path only
}
```

### 6.4 Tool Manifest (build-time JSON)

```
ToolManifestEntry {
  name: string
  description: string                  // LLM-tuned
  riskClass: "read" | "write" | "destructive"
  argSchema: JSONSchema
  responseProjection?: PathSelector[]  // applied before truncation
  maxResponseBytes?: number            // default 4096
}
```

### 6.5 Runbook frontmatter

```
---
name: string                 # kebab-case slug, unique
title: string                # human-readable
tools-referenced: string[]   # must exist in allowlist
tags: string[]
last-reviewed: YYYY-MM-DD
---
```

---

## 7. Build & sync pipelines

### 7.1 Tool Manifest generation (build time)

```mermaid
flowchart LR
    Smithy[(Smithy model)]
    Allow[(allowlist.json)]
    Desc[(descriptions.json<br/>LLM-tuned overrides)]
    Gen[Smithy → Manifest<br/>Generator]
    Manifest[(tool-manifest.json)]
    SPA[SPA bundle]
    Chat[Chat Service bundle]

    Smithy --> Gen
    Allow --> Gen
    Desc --> Gen
    Gen -->|validate: allowlisted ops exist,<br/>descriptions present, risk classes present| Manifest
    Manifest --> SPA
    Manifest --> Chat
```

The manifest is the single source of truth for the agent's catalog. The build fails if validation fails, so there is no silent drift.

### 7.2 Runbook KB sync (on merge to main)

```mermaid
flowchart LR
    Repo[/runbooks/*.md/]
    CI[CI step]
    S3[(S3 prefix)]
    Ingest[KB ingestion job]
    KB[Bedrock KB]

    Repo -->|push to main| CI
    CI -->|validate frontmatter +<br/>tools-referenced ⊂ allowlist| CI
    CI -->|upload changed files| S3
    CI -->|start-ingestion-job| Ingest
    Ingest --> KB
```

KB ingestion takes minutes. Runbook updates are not real-time; the authoring tempo (PR → review → merge → ingest) is acceptable for v1.

---

## 8. Security model

The full reasoning is in [ADR 0001](./docs/adr/0001-agent-is-a-ui-not-a-trust-boundary.md). Summary:

- **The access token never leaves the browser.** It is attached only when the SPA's Proposal Executor invokes the existing frontend SDK, exactly as the SPA does for any user-initiated click. The chat service does not receive it; Bedrock does not receive it.
- **The agent is a UI, not a trust boundary.** The backend API's existing per-call authz checks are the sole authorization gate. Anything the user can do via clicks, they can do via the agent; anything they can't do via clicks, the API will reject when the agent tries.
- **HITL is defense-in-depth against social engineering**, not against privilege escalation. The user can be tricked (e.g., via a prompt-injected tool result) into approving something they could legitimately do but shouldn't. The approval card UX is the mitigation: IDs prominent, risk class shown, destructive operations gated.
- **No server-side re-validation of proposals against user permissions.** Duplicating the API's checks would create a parallel authz system that can drift, which is a *worse* posture than relying on the API.

### 8.1 What the LLM sees vs doesn't see

```mermaid
flowchart LR
    subgraph llm[LLM sees]
        sys[System prompt]
        trans[Transcript]
        toolDefs[Tool definitions<br/>names + arg schemas + descriptions]
        toolRes[Tool result bodies<br/>projected/truncated]
        runbook[Retrieved runbook chunks]
    end
    subgraph never[LLM never sees]
        token[Access token]
        rawResp[Raw unprojected API responses<br/>over 4KB]
        otherUser[Any other user's data]
    end

    classDef good fill:#dfe;
    classDef bad fill:#fdd;
    class llm good;
    class never bad;
```

---

## 9. Failure modes & their handling

| Failure | Detection | Handling |
|---|---|---|
| Model emits a tool name not in allowlist | Tool Manifest Validator (server) | Reject before sending to browser; log; the Agent Loop tells the model "that tool doesn't exist" and continues. |
| Model emits args that violate the manifest schema | Tool Manifest Validator (server) | Same as above. |
| User declines a proposal | Browser Orchestrator | Send `{status:"declined"}` back; model is prompted to acknowledge and suggest alternatives or stop. |
| Backend API returns 4xx | Proposal Executor | Normalize to `{status:"error", error:{kind:"client", ...}}`; fed back; model may propose differently. **No auto-retry.** |
| Backend API returns 5xx | Proposal Executor | Same as 4xx with `kind:"server"`. **No auto-retry.** |
| Network/timeout in browser | Proposal Executor | Surface in chat UI with "try again" affordance; user must re-approve to retry. |
| Tool response exceeds 4KB | Proposal Executor | Apply per-tool `responseProjection` first; if still over, truncate with `…truncated, N more bytes` note. |
| Bedrock returns an error | Bedrock Client Wrapper | Surface to user as "something went wrong with the assistant"; no auto-retry; turn ends. |
| Lambda response stream interrupted | Browser Orchestrator | Mark partial turn as failed; user can resend. |
| Runbook KB retrieval fails | Runbook Retriever | Return empty result + log; model is told "no runbook found" and proceeds without one. |
| Conversation grows past comfortable token count | (Logged, not enforced in v1) | No action in v1. Watch logs; revisit if real conversations exceed ~50K tokens. |

---

## 10. Deployment

```mermaid
flowchart LR
    subgraph aws[AWS]
        CF[CloudFront]
        FU[Lambda Function URL<br/>response streaming]
        L[Chat Service Lambda]
        BR[Bedrock Converse]
        KB[Bedrock KB]
        S3[(S3 - runbook source)]
        CW[CloudWatch]
        IAM[IAM execution role]
    end
    SPA[React SPA<br/>existing app]
    Router[Existing Router Service]

    SPA --> Router
    Router --> CF
    CF --> FU
    FU --> L
    L --> BR
    L --> KB
    L -.-> CW
    S3 -.-> KB
    L -.- IAM
```

- **Lambda** with response streaming via Function URL, fronted by CloudFront for caching headers and a stable hostname. The existing Router Service forwards authenticated traffic to CloudFront.
- **IAM**: Lambda role allows `bedrock:InvokeModelWithResponseStream`, `bedrock:Retrieve` (the IAM action behind the `bedrock-agent-runtime` `Retrieve` API), and CloudWatch Logs writes. No other AWS API access. No access to user data stores: by construction, the chat service does not need them.
- **CDK app** is the source of truth for all of the above; one stack per environment.

---

## 11. Open trade-offs and known risks

| Trade-off | Decision | Future revisit signal |
|---|---|---|
| Browser-held transcript | Accepted; ships v1 fast | Users ask for cross-device resume; or support requests transcript access |
| No rate limits / cost budgets | Accepted; bounded risk via Cognito-only access | First abuse incident; or any non-trivial public rollout |
| No transcript persistence | Accepted | Support needs to review user sessions; compliance requires retention |
| Single model (Haiku 4.5) | Cost-first start | Quality complaints on multi-step tasks → escalate to Sonnet 4.6 |
| No model-evals | Accepted; small v1 surface area | Runbook count grows past ~20 or retrieval quality drops |
| No macro-tools | Accepted; primitive chaining + per-call HITL | Approval fatigue measured in user research |
| No product-docs KB | Accepted; navigate-and-handoff | If a significant fraction of user queries are declined, build the KB |
| Out-of-scope = decline-and-navigate | Accepted | Liability cost of one wrong answer ≫ inconvenience of navigation |

Each of these has a tracked deferral in PRD §"Out of Scope." None are oversights.