Design: AI Chat Agent for the SPA
Companion to PRD.md, CONTEXT.md, and ADR 0001. The PRD covers what we are building and why. This document covers how.
Domain terms ([[Term]]) are defined in CONTEXT.md.
1. Goals & non-goals
Goals
- Let users accomplish goals in the SPA by chatting with an AI agent that can navigate the site and propose backend API calls on their behalf.
- Preserve the existing security model: the user’s access token never leaves the browser; the LLM is never in the authorization trust path.
- Make the agent maintainable: the catalog of API operations comes from Smithy (no schema drift), and procedural knowledge ([[Runbook|runbooks]]) is reviewed via PR.
Non-goals (v1) — see PRD §“Out of Scope” for the full list. Key omissions: cross-device transcript persistence, per-user rate limits and cost budgets, product/policy Q&A, macro-tools, model-evaluation suite.
2. System architecture
flowchart LR
User([User])
SPA[React SPA<br/>· Chat UI<br/>· Browser Turn Orchestrator<br/>· Proposal Executor<br/>· Tool Registry<br/>· Navigation Intent Handler]
Router[Router Service<br/>route-based authn/authz]
Chat[Chat Service Lambda<br/>· Agent Loop<br/>· System Prompt Builder<br/>· Tool Manifest Validator<br/>· Runbook Retriever]
BackendAPI[Existing Backend APIs<br/>Cognito-authorized]
Bedrock[Bedrock Converse<br/>Claude Haiku 4.5]
KB[Bedrock Knowledge Base<br/>Runbooks]
S3[(S3 — runbook source)]
CW[CloudWatch<br/>logs · metrics · X-Ray]
User <-->|chat panel| SPA
SPA -->|access token| BackendAPI
SPA -->|"/chat/turn (Cognito auth)<br/>{transcript, userMessage?, toolResults?}"| Router
Router --> Chat
Chat -->|ConverseStream| Bedrock
Chat -->|Retrieve| KB
Chat -.->|"structured logs<br/>shape only, no plaintext"| CW
S3 -.->|ingestion job| KB
classDef external fill:#eee,stroke:#999;
class User,Bedrock,KB,S3,CW,BackendAPI,Router external;
The flow of authority: the user’s access token lives only in the SPA and is attached only to direct calls to the existing Backend APIs. The chat service never sees it. The LLM never sees it. See ADR 0001.
3. Components
3.1 SPA components
| Component | Responsibility | Notes |
|---|---|---|
| Chat UI | Message list, streaming text renderer, input box, interleaved approval cards and navigation toasts. | Stateless visual layer. |
| Browser Turn Orchestrator | State machine driving the turn cycle: send turn → render reply → collect approvals → execute → send results → next turn. Holds the in-memory transcript. | Deep, tested as a reducer. |
| Proposal Executor | Validates [[Tool-Call Proposal]] args against the [[Tool Manifest]], dispatches via [[Tool Registry (Browser)]], applies response projection / 4KB truncation, normalizes results. | Deep, tested with mocked SDK. |
| Tool Registry (Browser) | Static toolName → (args) => existingSdk.foo.bar(args) map. | Shallow, data. The seam that lets us reuse the SDK rather than build a parallel HTTP client. |
| Approval Card UI | Per-proposal card. Renders tool name, args (IDs prominent), risk class, approve/decline. Destructive variants add a confirmation gate. | Shallow UI; design-reviewed (issue 08). |
| Navigation Intent Handler | Auto-executes [[Navigation Intent\|navigation intents]]: SPA router push + “Taking you to …” toast with undo. | |
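A minimal sketch of the Tool Registry seam from the table above. The tool names getCurrentUser and getSubscription and the call sdk.user.getCurrent() come from the sequence flows in this document; the `FrontendSdk` interface, the `billing.getSubscription` method, and the helper names `resolveTool` and `dispatch` are assumptions made for illustration.

```typescript
// Hypothetical stand-in for the existing frontend SDK (shape is illustrative).
interface FrontendSdk {
  user: { getCurrent(): Promise<unknown> };
  billing: { getSubscription(): Promise<unknown> };
}

type ToolArgs = Record<string, unknown>;
type ToolFn = (sdk: FrontendSdk, args: ToolArgs) => Promise<unknown>;

// Static toolName -> SDK call map. Pure data: no validation and no authz here.
const toolRegistry: Readonly<Record<string, ToolFn>> = {
  getCurrentUser: (sdk) => sdk.user.getCurrent(),
  getSubscription: (sdk) => sdk.billing.getSubscription(),
};

// Own-property lookup, so inherited keys such as "toString" never resolve.
function resolveTool(tool: string): ToolFn | undefined {
  return Object.prototype.hasOwnProperty.call(toolRegistry, tool)
    ? toolRegistry[tool]
    : undefined;
}

// Invoke a registered tool; unknown names reject instead of falling through.
async function dispatch(sdk: FrontendSdk, tool: string, args: ToolArgs): Promise<unknown> {
  const fn = resolveTool(tool);
  if (!fn) throw new Error(`Unknown tool: ${tool}`);
  return fn(sdk, args);
}
```

Because the registry only maps names to existing SDK calls, the access token handling stays wherever the SDK already does it; nothing in this layer touches credentials.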
3.2 Chat Service components
| Component | Responsibility | Notes |
|---|---|---|
| Conversation Turn Handler | Lambda entry. Parses request, invokes Agent Loop, streams response via Lambda response streaming. | Shallow, covered transitively by Agent Loop tests. |
| Agent Loop | Calls Bedrock Converse with the composed system prompt + tool definitions + transcript. Parses the model response into { assistantText, proposals, navigationIntents }. On tool-result feedback, continues the loop. | Deep, tested with mocked Bedrock. |
| System Prompt Builder | Composes system prompt: agent role + navigation/scope rules + always-on runbook title index + per-turn user identity context. | Deep, pure. Snapshot-tested. |
| Tool Manifest Validator | Verifies any tool-use block the model emits references an allowlist entry and the args conform. Rejects malformed proposals before they reach the browser. | Sanity, not authz. Deep, table-driven tests. |
| Runbook Retriever | Wraps Bedrock KB Retrieve for the lookupRunbook tool. | Shallow. |
| Bedrock Client Wrapper | Thin ConverseStream wrapper with error normalization. | Shallow. |
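The Tool Manifest Validator's sanity check could be sketched as follows. This is a simplified illustration, not the service's implementation: a real argSchema is JSON Schema and would normally go through a JSON Schema library, whereas this sketch assumes flat objects with primitive-typed properties. The `ArgSchema` shape, the `validateToolUse` name, and the `subscriptionId` argument used in the usage note are hypothetical.

```typescript
// Assumption: a flattened stand-in for the manifest's JSON Schema argSchema.
interface ArgSchema {
  required: string[];
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
}

interface ManifestEntry {
  name: string;
  riskClass: "read" | "write" | "destructive";
  argSchema: ArgSchema;
}

interface ToolUse {
  tool: string;
  args: Record<string, unknown>;
}

type Validation =
  | { ok: true; entry: ManifestEntry }
  | { ok: false; reason: string };

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Sanity check only: confirms the tool is allowlisted and the args fit the schema.
// It says nothing about whether the user is authorized; the backend API decides that.
function validateToolUse(manifest: ManifestEntry[], use: ToolUse): Validation {
  const entry = manifest.find((e) => e.name === use.tool);
  if (!entry) return { ok: false, reason: `unknown tool: ${use.tool}` };

  for (const key of entry.argSchema.required) {
    if (!hasOwn(use.args, key)) return { ok: false, reason: `missing arg: ${key}` };
  }
  for (const [key, value] of Object.entries(use.args)) {
    if (!hasOwn(entry.argSchema.properties, key)) {
      return { ok: false, reason: `unexpected arg: ${key}` };
    }
    if (typeof value !== entry.argSchema.properties[key].type) {
      return { ok: false, reason: `wrong type for arg: ${key}` };
    }
  }
  return { ok: true, entry };
}
```

A rejected result carries a `reason` string, which the Agent Loop could feed back to the model as the "that tool doesn't exist" style message described in the failure-modes table.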
3.3 Build pipeline
| Component | Responsibility | Notes |
|---|---|---|
| Smithy → Manifest Generator | Build-time tool: consumes Smithy model + allowlist + description overrides, emits the [[Tool Manifest]] consumed by both the chat service and the SPA. Validates allowlist entries exist in Smithy and have descriptions + risk classes. | Deep, fixture-tested. |
| Runbook sync CI | On merge: uploads /runbooks/*.md to S3, triggers Bedrock KB ingestion. Validates frontmatter and tools-referenced. | Shallow. |
3.4 Module dependency map
flowchart TB
subgraph SPA
ChatUI[Chat UI]
Orch[Browser Turn Orchestrator]
ApprovalCard[Approval Card UI]
NavHandler[Navigation Intent Handler]
Executor[Proposal Executor]
Registry[Tool Registry]
SDK[Existing Frontend SDK]
end
subgraph ChatSvc[Chat Service]
Handler[Conversation Turn Handler]
Loop[Agent Loop]
Prompt[System Prompt Builder]
Validator[Tool Manifest Validator]
Retriever[Runbook Retriever]
BedrockClient[Bedrock Client Wrapper]
end
Manifest[(Tool Manifest<br/>build-time JSON)]
ChatUI --> Orch
Orch --> ApprovalCard
Orch --> NavHandler
Orch --> Executor
Executor --> Registry
Executor --> Manifest
Registry --> SDK
Handler --> Loop
Loop --> Prompt
Loop --> Validator
Loop --> BedrockClient
Loop --> Retriever
Validator --> Manifest
Prompt --> Manifest
4. Sequence flows
4.1 Text-only turn (no tools)
sequenceDiagram
actor U as User
participant S as SPA<br/>(Orchestrator)
participant R as Router
participant C as Chat Service
participant B as Bedrock
U->>S: types message
S->>S: append to transcript
S->>R: POST /chat/turn<br/>{transcript, userMessage}
R->>R: authn/authz
R->>C: forward + identity
C->>C: build system prompt
C->>B: ConverseStream(system, messages, tools)
B-->>C: stream text tokens
C-->>S: stream {assistantText}
S-->>U: render streamed tokens
Note over S: append assistant turn to transcript
4.2 Turn with a single tool call (the core HITL flow)
sequenceDiagram
actor U as User
participant S as SPA<br/>(Orchestrator)
participant E as Proposal<br/>Executor
participant SDK as Frontend SDK
participant API as Backend API
participant C as Chat Service
participant B as Bedrock
U->>S: "what plan am I on?"
S->>C: POST /chat/turn {transcript, userMessage}
C->>B: ConverseStream(..., tools=[manifest])
B-->>C: tool_use{tool=getCurrentUser, args={}}
C->>C: validate against manifest
C-->>S: {proposals=[{id, tool, args, riskClass}]}
S->>S: render Approval Card
U->>S: clicks Approve
S->>E: execute(proposal)
E->>E: validate args vs manifest
E->>SDK: sdk.user.getCurrent()
SDK->>API: GET /me (Bearer access_token)
API-->>SDK: 200 {name, tier, ...}
SDK-->>E: response
E->>E: project/truncate per manifest
E-->>S: {id, status:"ok", body}
S->>C: POST /chat/turn {transcript, toolResults=[...]}
C->>B: ConverseStream(..., tool_result fed back)
B-->>C: stream text "You're on the Pro plan…"
C-->>S: stream {assistantText}
S-->>U: render reply
The “suspend / resume” between the proposal emission and the tool-result feedback is just two HTTP calls, not anything exotic. The chat service is stateless; the browser holds the transcript and replays it each turn.
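To make the stateless resume concrete, here is a hedged sketch of how the browser might build the two requests in flow 4.2. The field names transcript, userMessage, and toolResults come from the request contract in this document; the `ChatMessage` shape, the `TurnInput` union, and `buildTurnRequest` are assumptions for illustration.

```typescript
// Assumption: the document does not define Message, so this shape is illustrative.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ToolResult {
  id: string;
  status: "ok" | "error" | "declined";
  body?: unknown;
}

interface TurnRequest {
  transcript: ChatMessage[];
  userMessage?: string;
  toolResults?: ToolResult[];
}

// A turn is triggered either by the user typing or by executed/declined proposals.
type TurnInput =
  | { kind: "userMessage"; text: string }
  | { kind: "toolResults"; results: ToolResult[] };

// The full transcript is replayed on every request, so the server keeps no session.
function buildTurnRequest(transcript: readonly ChatMessage[], input: TurnInput): TurnRequest {
  const base = { transcript: [...transcript] };
  return input.kind === "userMessage"
    ? { ...base, userMessage: input.text }
    : { ...base, toolResults: input.results };
}
```

The first call in flow 4.2 would use the `userMessage` variant; the second, after approval and execution, would use the `toolResults` variant with the same transcript extended by the assistant's proposal turn.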
4.3 Turn with a decline
sequenceDiagram
actor U as User
participant S as SPA<br/>(Orchestrator)
participant C as Chat Service
participant B as Bedrock
U->>S: "cancel my subscription"
S->>C: POST /chat/turn
C->>B: ConverseStream
B-->>C: tool_use{tool=requestCancellation, args={...}}
C-->>S: proposals=[{... riskClass:"destructive"}]
S->>S: render Approval Card (destructive variant, gate)
U->>S: clicks Decline
S->>C: POST /chat/turn {toolResults=[{status:"declined"}]}
C->>B: ConverseStream (sees declined tool_result)
B-->>C: stream text "OK — I won't proceed. Want me to…?"
C-->>S: stream assistantText
S-->>U: render acknowledgement
No silent retry. The model is prompted by the system rules to acknowledge and suggest alternatives or stop — never re-emit the same proposal unprompted.
4.4 Turn with a runbook lookup + multi-step plan
sequenceDiagram
actor U as User
participant S as SPA
participant C as Chat Service
participant B as Bedrock
participant KB as Bedrock KB
U->>S: "cancel my subscription"
S->>C: POST /chat/turn
C->>B: ConverseStream
B-->>C: tool_use{tool=lookupRunbook, query="cancel subscription"}
C->>KB: Retrieve(query)
KB-->>C: runbook chunks
C->>B: ConverseStream (runbook fed back as tool_result)
B-->>C: text + tool_use{tool=getSubscription, args={}}
C-->>S: assistantText + proposal
Note over U,S: ... user approves, browser executes,<br/>result fed back ...
Note over B: agent chains next step from runbook
B-->>C: tool_use{tool=requestCancellation, args=...}
Note over U,S: ... approval + execute ...
B-->>C: tool_use{tool=getCancellationStatus, args=...}
Note over U,S: ... approval + execute ...
B-->>C: stream text "Done — your subscription is cancelled."
lookupRunbook is server-executed (no token needed, no side effects), so it requires no per-call approval. Every subsequent API call is browser-executed and individually approved.
4.5 Turn with a navigation intent (no backend API calls)
sequenceDiagram
actor U as User
participant S as SPA
participant C as Chat Service
participant B as Bedrock
U->>S: "take me to billing"
S->>C: POST /chat/turn
C->>B: ConverseStream
B-->>C: text "Heading to billing…" +<br/>tool_use{tool=navigate, args={url:"/billing"}}
C-->>S: {assistantText, navigationIntents=[{url}]}
S-->>U: render text, then toast "Taking you to /billing"
S->>S: router.push("/billing")
Navigation intents auto-execute (no approval) because they mutate only client state and require no access token.
4.6 Turn with an out-of-scope question
sequenceDiagram
actor U as User
participant S as SPA
participant C as Chat Service
participant B as Bedrock
U->>S: "what's your refund policy?"
S->>C: POST /chat/turn
C->>B: ConverseStream (system prompt: "decline policy Qs, navigate")
B-->>C: text "Our refund details live on the help center —<br/>let me take you there." +<br/>tool_use{tool=navigate, args={url:"/contact"}}
C-->>S: {assistantText, navigationIntents=[{url:"/contact"}]}
S-->>U: render decline + toast + navigation
5. Browser turn orchestrator state machine
stateDiagram-v2
[*] --> Idle
Idle --> AwaitingTurn: userTyped / send
AwaitingTurn --> Streaming: first chunk received
Streaming --> Streaming: more tokens
Streaming --> AwaitingApprovals: proposals received
Streaming --> ExecutingNavigations: navigationIntents received
Streaming --> Idle: turnComplete, no proposals
ExecutingNavigations --> Idle: navigation fired
AwaitingApprovals --> Executing: user approved at least one
AwaitingApprovals --> AwaitingTurn: all declined → send declines
Executing --> Executing: more proposals pending
Executing --> AwaitingTurn: all results gathered → send next turn
AwaitingTurn --> Error: network / 5xx
Streaming --> Error: stream aborted
Error --> Idle: user dismisses
States are tested as reducer transitions (deep module). The state machine is what makes the orchestrator straightforward to test in isolation — no DOM, no network, just state + events.
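The diagram above can be expressed as a transition table plus a pure reducer. This is a sketch under assumptions: the state names are taken from the diagram, but the event names (`send`, `approvedSome`, and so on) are invented labels for the diagram's edge descriptions, and ignoring events that are invalid in the current state is a design choice of this example rather than something the document specifies.

```typescript
type TurnState =
  | "Idle" | "AwaitingTurn" | "Streaming" | "AwaitingApprovals"
  | "ExecutingNavigations" | "Executing" | "Error";

// Hypothetical event names, one per edge label in the state diagram.
type TurnEvent =
  | "send" | "firstChunk" | "moreTokens" | "proposalsReceived"
  | "navigationIntentsReceived" | "turnComplete" | "navigationFired"
  | "approvedSome" | "allDeclined" | "moreProposalsPending"
  | "allResultsGathered" | "failure" | "dismiss";

// One row per state; a missing event means "not valid in this state".
const transitions: Record<TurnState, Partial<Record<TurnEvent, TurnState>>> = {
  Idle: { send: "AwaitingTurn" },
  AwaitingTurn: { firstChunk: "Streaming", failure: "Error" },
  Streaming: {
    moreTokens: "Streaming",
    proposalsReceived: "AwaitingApprovals",
    navigationIntentsReceived: "ExecutingNavigations",
    turnComplete: "Idle",
    failure: "Error",
  },
  ExecutingNavigations: { navigationFired: "Idle" },
  AwaitingApprovals: { approvedSome: "Executing", allDeclined: "AwaitingTurn" },
  Executing: { moreProposalsPending: "Executing", allResultsGathered: "AwaitingTurn" },
  Error: { dismiss: "Idle" },
};

// Pure reducer: no DOM, no network. Invalid events leave the state unchanged.
function reduce(state: TurnState, event: TurnEvent): TurnState {
  return transitions[state][event] ?? state;
}
```

A table-driven test can then replay an event sequence, for example the single-tool-call flow of 4.2 (`send`, `firstChunk`, `proposalsReceived`, `approvedSome`, `allResultsGathered`), and assert that the orchestrator ends in AwaitingTurn.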
6. Data shapes (contract surface)
These are the contracts between subsystems. Field names are stable; exact serialization is an implementation detail.
6.1 Browser ⇄ Chat Service
Request (POST /chat/turn)
{
transcript: Message[] // full prior conversation
userMessage?: string // if user just typed
toolResults?: ToolResult[] // if browser just executed proposals
}
Response (streamed)
{
assistantText: string // streamed in chunks
proposals: Proposal[] // emitted whole, after text
navigationIntents: NavigationIntent[]
}
6.2 Proposal & ToolResult
Proposal {
id: string // unique per turn
tool: string // allowlist entry name
args: object // conforms to manifest argSchema
riskClass: "read" | "write" | "destructive"
}
ToolResult {
id: string // matches Proposal.id
status: "ok" | "error" | "declined"
body?: unknown // projected/truncated tool response
error?: { kind: "client" | "server" | "network", message: string, statusCode?: number }
}
6.3 NavigationIntent
NavigationIntent {
url: string // internal path only
}
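Since the url field is limited to internal paths, the Navigation Intent Handler would plausibly check it before calling the router. The following is a minimal sketch; the function name `isInternalPath` and the specific rejection rules are assumptions, and the document does not say where this check is enforced.

```typescript
// Accept only same-origin absolute paths such as "/billing".
function isInternalPath(url: string): boolean {
  // Must start with a single slash: rejects "https://...", "javascript:...", and "".
  if (!url.startsWith("/")) return false;
  // "//host" is a protocol-relative URL and would leave the origin.
  if (url.startsWith("//")) return false;
  // Some URL parsers treat a backslash like a forward slash.
  if (url.includes("\\")) return false;
  // Tabs and newlines are stripped by URL parsers, which could turn "/\t/host" into "//host".
  if (/[\u0000-\u001f\u007f]/.test(url)) return false;
  return true;
}
```

An intent that fails the check could simply be dropped and logged, since navigation intents execute without an approval step.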
6.4 Tool Manifest (build-time JSON)
ToolManifestEntry {
name: string
description: string // LLM-tuned
riskClass: "read" | "write" | "destructive"
argSchema: JSONSchema
responseProjection?: PathSelector[] // applied before truncation
maxResponseBytes?: number // default 4096
}
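How responseProjection and maxResponseBytes might combine in the Proposal Executor is sketched below. Several things here are assumptions: the document does not define PathSelector, so this example treats each selector as a top-level field name; the byte limit is applied to the UTF-8 size of the JSON serialization; and the truncation note is appended on top of the limit. The helper names are hypothetical.

```typescript
// Assumption: each selector is a top-level field name.
function project(body: Record<string, unknown>, selectors: string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of selectors) {
    if (Object.prototype.hasOwnProperty.call(body, key)) out[key] = body[key];
  }
  return out;
}

// Number of bytes one code point occupies in UTF-8.
function utf8Size(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Keep whole characters until the byte budget is spent; count what was dropped.
function truncateUtf8(text: string, maxBytes: number): { text: string; droppedBytes: number } {
  let kept = "";
  let used = 0;
  let droppedBytes = 0;
  for (const ch of text) { // iterates by code point, so surrogate pairs stay intact
    const size = utf8Size(ch.codePointAt(0)!);
    if (droppedBytes === 0 && used + size <= maxBytes) {
      kept += ch;
      used += size;
    } else {
      droppedBytes += size;
    }
  }
  return { text: kept, droppedBytes };
}

// Projection first, then truncation, matching the order stated in the manifest comments.
function shapeResponse(
  body: Record<string, unknown>,
  entry: { responseProjection?: string[]; maxResponseBytes?: number },
): string {
  const projected = entry.responseProjection ? project(body, entry.responseProjection) : body;
  const limit = entry.maxResponseBytes ?? 4096;
  const { text, droppedBytes } = truncateUtf8(JSON.stringify(projected), limit);
  return droppedBytes === 0 ? text : `${text}…truncated, ${droppedBytes} more bytes`;
}
```

Cutting a JSON string mid-value yields text that no longer parses, so in this sketch a truncated body is handed to the model as an opaque string rather than as structured data.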
6.5 Runbook frontmatter
---
name: string # kebab-case slug, unique
title: string # human-readable
tools-referenced: string[] # must exist in allowlist
tags: string[]
last-reviewed: YYYY-MM-DD
---
<body — short, LLM-tuned procedural prose>
7. Build & sync pipelines
7.1 Tool Manifest generation (build time)
flowchart LR
Smithy[(Smithy model)]
Allow[(allowlist.json)]
Desc[(descriptions.json<br/>LLM-tuned overrides)]
Gen[Smithy → Manifest<br/>Generator]
Manifest[(tool-manifest.json)]
SPA[SPA bundle]
Chat[Chat Service bundle]
Smithy --> Gen
Allow --> Gen
Desc --> Gen
Gen -->|validate: allowlisted ops exist,<br/>descriptions present, risk classes present| Manifest
Manifest --> SPA
Manifest --> Chat
Single source of truth for the agent’s catalog. Build fails if validation fails — no silent drift.
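The three validation rules on the generator edge (allowlisted operations exist, descriptions present, risk classes present) can be sketched as one function that collects every problem and then fails the build. The input shapes are assumptions: the document does not specify the format of allowlist.json or descriptions.json, and the Smithy model is assumed to be pre-reduced to a map from operation name to argument schema.

```typescript
type RiskClass = "read" | "write" | "destructive";

// Assumed shape of one allowlist.json entry.
interface AllowlistEntry {
  name: string;
  riskClass?: RiskClass;
}

interface ToolManifestEntry {
  name: string;
  description: string;
  riskClass: RiskClass;
  argSchema: object;
}

function generateManifest(
  smithyOps: Record<string, object>,    // operation name -> arg schema (assumed pre-extracted)
  allowlist: AllowlistEntry[],
  descriptions: Record<string, string>, // operation name -> LLM-tuned description
): ToolManifestEntry[] {
  const errors: string[] = [];
  const manifest: ToolManifestEntry[] = [];

  for (const item of allowlist) {
    const argSchema = Object.prototype.hasOwnProperty.call(smithyOps, item.name)
      ? smithyOps[item.name]
      : undefined;
    const description = Object.prototype.hasOwnProperty.call(descriptions, item.name)
      ? descriptions[item.name]
      : undefined;

    if (!argSchema) errors.push(`${item.name}: not found in Smithy model`);
    if (!description) errors.push(`${item.name}: missing description`);
    if (!item.riskClass) errors.push(`${item.name}: missing risk class`);

    if (argSchema && description && item.riskClass) {
      manifest.push({ name: item.name, description, riskClass: item.riskClass, argSchema });
    }
  }

  // Report every problem at once, then fail the build.
  if (errors.length > 0) {
    throw new Error(`Manifest validation failed:\n${errors.join("\n")}`);
  }
  return manifest;
}
```

Collecting all errors before throwing means a single failed build lists every allowlist entry that needs attention.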
7.2 Runbook KB sync (on merge to main)
flowchart LR
Repo[/runbooks/*.md/]
CI[CI step]
S3[(S3 prefix)]
Ingest[KB ingestion job]
KB[Bedrock KB]
Repo -->|push to main| CI
CI -->|validate frontmatter +<br/>tools-referenced ⊂ allowlist| CI
CI -->|upload changed files| S3
CI -->|start-ingestion-job| Ingest
Ingest --> KB
KB ingestion takes minutes. Runbook updates are not real-time; the authoring tempo (PR → review → merge → ingest) is acceptable for v1.
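The CI validation step (frontmatter plus tools-referenced ⊂ allowlist) might look like the sketch below. It assumes the YAML frontmatter has already been parsed into an object by some YAML library; the `validateRunbooks` name and the exact error strings are hypothetical. The field names and constraints (kebab-case unique slug, YYYY-MM-DD date) follow the frontmatter spec in 6.5.

```typescript
interface RunbookFrontmatter {
  name: string;
  title: string;
  "tools-referenced": string[];
  tags: string[];
  "last-reviewed": string;
}

// Returns a list of problems; an empty list means the runbooks may be uploaded.
function validateRunbooks(
  runbooks: RunbookFrontmatter[],
  allowlist: ReadonlySet<string>,
): string[] {
  const errors: string[] = [];
  const seenNames = new Set<string>();

  for (const rb of runbooks) {
    if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(rb.name)) {
      errors.push(`${rb.name}: name is not a kebab-case slug`);
    }
    if (seenNames.has(rb.name)) errors.push(`${rb.name}: duplicate name`);
    seenNames.add(rb.name);

    if (rb.title.trim() === "") errors.push(`${rb.name}: empty title`);

    // Format check only; it does not verify that the date exists on the calendar.
    if (!/^\d{4}-\d{2}-\d{2}$/.test(rb["last-reviewed"])) {
      errors.push(`${rb.name}: last-reviewed must be YYYY-MM-DD`);
    }

    for (const tool of rb["tools-referenced"]) {
      if (!allowlist.has(tool)) errors.push(`${rb.name}: unknown tool ${tool}`);
    }
  }
  return errors;
}
```

Running this before the upload keeps a runbook that references a removed tool from ever reaching the Knowledge Base.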
8. Security model
The full reasoning is in ADR 0001. Summary:
- The access token never leaves the browser. It is attached only when the SPA’s Proposal Executor invokes the existing frontend SDK, exactly as the SPA does for any user-initiated click. The chat service does not receive it; Bedrock does not receive it.
- The agent is a UI, not a trust boundary. The backend API’s existing per-call authz checks are the sole authorization gate. Anything the user can do via clicks, they can do via the agent; anything they can’t do via clicks, the API will reject when the agent tries.
- HITL is defense-in-depth against social engineering, not against privilege escalation. The user can be tricked (e.g., via a prompt-injected tool result) into approving something they could legitimately do but shouldn’t. The approval card UX is the mitigation: IDs prominent, risk class shown, destructive operations gated.
- No server-side re-validation of proposals against user permissions. Duplicating the API’s checks would be a parallel authz system that can drift — a worse posture than relying on the API.
8.1 What the LLM sees vs doesn’t see
flowchart LR
subgraph llm[LLM sees]
sys[System prompt]
trans[Transcript]
toolDefs[Tool definitions<br/>names + arg schemas + descriptions]
toolRes[Tool result bodies<br/>projected/truncated]
runbook[Retrieved runbook chunks]
end
subgraph never[LLM never sees]
token[Access token]
rawResp[Raw unprojected API responses<br/>over 4KB]
otherUser[Any other user's data]
end
classDef good fill:#dfe;
classDef bad fill:#fdd;
class llm good;
class never bad;
9. Failure modes & their handling
| Failure | Detection | Handling |
|---|---|---|
| Model emits a tool name not in allowlist | Tool Manifest Validator (server) | Reject before sending to browser; log; the Agent Loop tells the model “that tool doesn’t exist” and continues. |
| Model emits args that violate the manifest schema | Tool Manifest Validator (server) | Same as above. |
| User declines a proposal | Browser Orchestrator | Send {status:"declined"} back; model is prompted to acknowledge and suggest alternatives or stop. |
| Backend API returns 4xx | Proposal Executor | Normalize to {status:"error", error:{kind:"client", ...}}; fed back; model may propose differently. No auto-retry. |
| Backend API returns 5xx | Proposal Executor | Same as 4xx with kind:"server". No auto-retry. |
| Network/timeout in browser | Proposal Executor | Surface in chat UI with “try again” affordance; user must re-approve to retry. |
| Tool response exceeds 4KB | Proposal Executor | Apply per-tool responseProjection first; if still over, truncate and append a “…truncated, N more bytes” note. |
| Bedrock returns an error | Bedrock Client Wrapper | Surface to user as “something went wrong with the assistant”; no auto-retry; turn ends. |
| Lambda response stream interrupted | Browser Orchestrator | Mark partial turn as failed; user can resend. |
| Runbook KB retrieval fails | Runbook Retriever | Return empty result + log; model is told “no runbook found” and proceeds without one. |
| Conversation grows past comfortable token count | (Logged, not enforced in v1) | No action in v1. Watch logs; revisit if real conversations exceed ~50K tokens. |
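The 4xx, 5xx, network, and decline rows above all end in a ToolResult. A hedged sketch of that normalization follows. The ToolResult shape is the one from section 6.2; the assumption that the SDK surfaces HTTP failures as an error object with a numeric `statusCode` property is hypothetical, as are the function names.

```typescript
interface ToolError {
  kind: "client" | "server" | "network";
  message: string;
  statusCode?: number;
}

interface ToolResult {
  id: string;
  status: "ok" | "error" | "declined";
  body?: unknown;
  error?: ToolError;
}

// Assumption: HTTP failures carry a numeric `statusCode`; anything else
// (fetch failure, timeout) is treated as a network error.
function normalizeFailure(proposalId: string, thrown: unknown): ToolResult {
  const message = thrown instanceof Error ? thrown.message : String(thrown);
  const statusCode =
    typeof thrown === "object" && thrown !== null && "statusCode" in thrown
      ? (thrown as { statusCode: unknown }).statusCode
      : undefined;

  if (typeof statusCode === "number" && statusCode >= 400 && statusCode < 500) {
    return { id: proposalId, status: "error", error: { kind: "client", message, statusCode } };
  }
  if (typeof statusCode === "number" && statusCode >= 500) {
    return { id: proposalId, status: "error", error: { kind: "server", message, statusCode } };
  }
  return { id: proposalId, status: "error", error: { kind: "network", message } };
}

// A decline carries no body and no error; the model only learns that it was declined.
function declinedResult(proposalId: string): ToolResult {
  return { id: proposalId, status: "declined" };
}
```

Neither helper retries anything, which matches the "no auto-retry" handling in the table: a retry only happens if the model proposes again and the user approves again.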
10. Deployment
flowchart LR
subgraph aws[AWS]
CF[CloudFront]
FU[Lambda Function URL<br/>response streaming]
L[Chat Service Lambda]
BR[Bedrock Converse]
KB[Bedrock KB]
S3[(S3 — runbook source)]
CW[CloudWatch]
IAM[IAM execution role]
end
SPA[React SPA<br/>existing app]
Router[Existing Router Service]
SPA --> Router
Router --> CF
CF --> FU
FU --> L
L --> BR
L --> KB
L -.-> CW
S3 -.-> KB
L -.- IAM
- Lambda with response streaming via Function URL, fronted by CloudFront for caching headers and a stable hostname. The existing Router Service forwards authenticated traffic to CloudFront.
- IAM: the Lambda role allows bedrock:InvokeModelWithResponseStream, bedrock:Retrieve (the IAM action behind the bedrock-agent-runtime Retrieve API), and CloudWatch Logs writes. No other AWS API access. No access to user data stores; by construction, the chat service does not need them.
- CDK app is the source of truth for all of the above; one stack per environment.
11. Open trade-offs and known risks
| Trade-off | Decision | Future revisit signal |
|---|---|---|
| Browser-held transcript | Accepted; ships v1 fast | Users ask for cross-device resume; or support requests transcript access |
| No rate limits / cost budgets | Accepted; bounded risk via Cognito-only access | First abuse incident; or any non-trivial public rollout |
| No transcript persistence | Accepted | Support needs to review user sessions; compliance requires retention |
| Single model (Haiku 4.5) | Cost-first start | Quality complaints on multi-step tasks → escalate to Sonnet 4.6 |
| No model-evals | Accepted; small v1 surface area | Runbook count grows past ~20 or retrieval quality drops |
| No macro-tools | Accepted; primitive chaining + per-call HITL | Approval fatigue measured in user research |
| No product-docs KB | Accepted; navigate-and-handoff | If a significant fraction of user queries are declined, build the KB |
| Out-of-scope = decline-and-navigate | Accepted | Liability cost of one wrong answer ≫ inconvenience of navigation |
Each of these has a tracked deferral in PRD §“Out of Scope.” None are oversights.