Design: AI Chat Agent for the SPA

Companion to PRD.md, CONTEXT.md, and ADR 0001. The PRD covers what we are building and why. This document covers how.

Domain terms ([[Term]]) are defined in CONTEXT.md.

1. Goals & non-goals

Goals

Let users accomplish goals in the SPA by chatting with an AI agent that can navigate the site and propose backend API calls on their behalf.
Preserve the existing security model: the user’s access token is sent only from the browser to the existing backend APIs, never to the chat service or the LLM; the LLM is never in the authorization trust path.
Make the agent maintainable: the catalog of API operations comes from Smithy (no schema drift), and procedural knowledge ([[Runbook|runbooks]]) is reviewed via PR.

Non-goals (v1) — see PRD §“Out of Scope” for the full list. Key omissions: cross-device transcript persistence, per-user rate limits and cost budgets, product/policy Q&A, macro-tools, model-evaluation suite.

2. System architecture

flowchart LR
    User([User])
    SPA[React SPA<br/>· Chat UI<br/>· Browser Turn Orchestrator<br/>· Proposal Executor<br/>· Tool Registry<br/>· Navigation Intent Handler]
    Router[Router Service<br/>route-based authn/authz]
    Chat[Chat Service Lambda<br/>· Agent Loop<br/>· System Prompt Builder<br/>· Tool Manifest Validator<br/>· Runbook Retriever]
    BackendAPI[Existing Backend APIs<br/>Cognito-authorized]
    Bedrock[Bedrock Converse<br/>Claude Haiku 4.5]
    KB[Bedrock Knowledge Base<br/>Runbooks]
    S3[(S3 — runbook source)]
    CW[CloudWatch<br/>logs · metrics · X-Ray]

    User <-->|chat panel| SPA
    SPA -->|access token| BackendAPI
    SPA -->|"/chat/turn (Cognito auth)<br/>{transcript, userMessage?, toolResults?}"| Router
    Router --> Chat
    Chat -->|ConverseStream| Bedrock
    Chat -->|Retrieve| KB
    Chat -.->|"structured logs<br/>shape only, no plaintext"| CW
    S3 -.->|ingestion job| KB

    classDef external fill:#eee,stroke:#999;
    class User,Bedrock,KB,S3,CW,BackendAPI,Router external;

The flow of authority: the user’s access token lives only in the SPA and is attached only to direct calls to the existing Backend APIs. The chat service never sees it. The LLM never sees it. See ADR 0001.

3. Components

3.1 SPA components

Component	Responsibility	Notes
Chat UI	Message list, streaming text renderer, input box, interleaved approval cards and navigation toasts.	Stateless visual layer.
Browser Turn Orchestrator	State machine driving the turn cycle: send turn → render reply → collect approvals → execute → send results → next turn. Holds the in-memory transcript.	Deep, tested as a reducer.
Proposal Executor	Validates [[Tool-Call Proposal]] args against the [[Tool Manifest]], dispatches via [[Tool Registry (Browser)]], applies response projection / 4KB truncation, normalizes results.	Deep, tested with mocked SDK.
Tool Registry (Browser)	Static `toolName → (args) => existingSdk.foo.bar(args)` map.	Shallow, data. The seam that lets us reuse the SDK rather than build a parallel HTTP client.
Approval Card UI	Per-proposal card. Renders tool name, args (IDs prominent), risk class, approve/decline. Destructive variants add a confirmation gate.	Shallow UI; design-reviewed (issue 08).
Navigation Intent Handler	Auto-executes [[Navigation Intent|navigation intents]]: SPA router push + “Taking you to …” toast with undo.
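
To make the Tool Registry row concrete, here is a minimal TypeScript sketch of a static name-to-SDK-call map with a fail-closed dispatcher. The `sdk` object is a hypothetical stand-in for the existing frontend SDK (only `sdk.user.getCurrent()` and the `getCurrentUser` tool name appear elsewhere in this document), and `dispatch` is an illustrative helper name, not an existing function.

```typescript
// Hypothetical stand-in for the existing frontend SDK. The real SDK attaches
// the access token itself, so nothing in the registry ever handles it.
const sdk = {
  user: {
    getCurrent: async (): Promise<{ name: string; tier: string }> => ({
      name: "Ada",
      tier: "pro",
    }),
  },
};

type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

// Static toolName -> SDK call map; adding a tool is a one-line data change.
const toolRegistry = new Map<string, ToolFn>([
  ["getCurrentUser", () => sdk.user.getCurrent()],
]);

// Fail closed: an unregistered tool name is an error, never a silent no-op.
async function dispatch(tool: string, args: Record<string, unknown>): Promise<unknown> {
  const fn = toolRegistry.get(tool);
  if (!fn) throw new Error(`Unknown tool: ${tool}`);
  return fn(args);
}
```

Keeping the registry as plain data means the Proposal Executor can be tested by swapping in a registry whose entries are mocks.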

3.2 Chat Service components

Component	Responsibility	Notes
Conversation Turn Handler	Lambda entry. Parses request, invokes Agent Loop, streams response via Lambda response streaming.	Shallow, covered transitively by Agent Loop tests.
Agent Loop	Calls Bedrock Converse with the composed system prompt + tool definitions + transcript. Parses the model response into `{ assistantText, proposals, navigationIntents }`. On tool-result feedback, continues the loop.	Deep, tested with mocked Bedrock.
System Prompt Builder	Composes system prompt: agent role + navigation/scope rules + always-on runbook title index + per-turn user identity context.	Deep, pure. Snapshot-tested.
Tool Manifest Validator	Verifies any tool-use block the model emits references an allowlist entry and the args conform. Rejects malformed proposals before they reach the browser.	Sanity, not authz. Deep, table-driven tests.
Runbook Retriever	Wraps Bedrock KB `Retrieve` for the `lookupRunbook` tool.	Shallow.
Bedrock Client Wrapper	Thin `ConverseStream` wrapper with error normalization.	Shallow.
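
The Tool Manifest Validator's two checks (allowlist membership, then arg conformance) could look roughly like the sketch below. It assumes a deliberately tiny subset of JSON Schema (required keys and primitive types only); a real implementation would more likely delegate to a full JSON Schema validator. The `ArgSchema`, `ToolUse`, and `validateToolUse` names are illustrative.

```typescript
// Assumption: a minimal schema subset, used here only to keep the sketch self-contained.
type ArgSchema = {
  required?: string[];
  properties: Record<string, { type: "string" | "number" | "boolean" }>;
};
type ManifestEntry = {
  name: string;
  riskClass: "read" | "write" | "destructive";
  argSchema: ArgSchema;
};
type ToolUse = { tool: string; args: Record<string, unknown> };
type Validation = { ok: true; entry: ManifestEntry } | { ok: false; reason: string };

function validateToolUse(manifest: ManifestEntry[], use: ToolUse): Validation {
  // 1. The tool must be an allowlist entry.
  const entry = manifest.find((e) => e.name === use.tool);
  if (!entry) return { ok: false, reason: `unknown tool "${use.tool}"` };

  // 2. Every required arg must be present.
  for (const key of entry.argSchema.required ?? []) {
    if (!(key in use.args)) return { ok: false, reason: `missing arg "${key}"` };
  }

  // 3. Every supplied arg must be declared and have the declared primitive type.
  for (const [key, value] of Object.entries(use.args)) {
    const prop = entry.argSchema.properties[key];
    if (!prop) return { ok: false, reason: `unexpected arg "${key}"` };
    if (typeof value !== prop.type) {
      return { ok: false, reason: `arg "${key}" must be ${prop.type}` };
    }
  }
  return { ok: true, entry };
}
```

The `reason` string is what the Agent Loop could feed back to the model when a proposal is rejected. This is a sanity check only; it says nothing about whether the user is permitted to make the call.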

3.3 Build pipeline

Component	Responsibility	Notes
Smithy → Manifest Generator	Build-time tool: consumes Smithy model + allowlist + description overrides, emits the [[Tool Manifest]] consumed by both the chat service and the SPA. Validates allowlist entries exist in Smithy and have descriptions + risk classes.	Deep, fixture-tested.
Runbook sync CI	On merge: uploads `/runbooks/*.md` to S3, triggers Bedrock KB ingestion. Validates frontmatter and `tools-referenced`.	Shallow.

3.4 Module dependency map

flowchart TB
    subgraph SPA
        ChatUI[Chat UI]
        Orch[Browser Turn Orchestrator]
        ApprovalCard[Approval Card UI]
        NavHandler[Navigation Intent Handler]
        Executor[Proposal Executor]
        Registry[Tool Registry]
        SDK[Existing Frontend SDK]
    end
    subgraph ChatSvc[Chat Service]
        Handler[Conversation Turn Handler]
        Loop[Agent Loop]
        Prompt[System Prompt Builder]
        Validator[Tool Manifest Validator]
        Retriever[Runbook Retriever]
        BedrockClient[Bedrock Client Wrapper]
    end
    Manifest[(Tool Manifest<br/>build-time JSON)]

    ChatUI --> Orch
    Orch --> ApprovalCard
    Orch --> NavHandler
    Orch --> Executor
    Executor --> Registry
    Executor --> Manifest
    Registry --> SDK

    Handler --> Loop
    Loop --> Prompt
    Loop --> Validator
    Loop --> BedrockClient
    Loop --> Retriever
    Validator --> Manifest
    Prompt --> Manifest

4. Sequence flows

4.1 Text-only turn (no tools)

sequenceDiagram
    actor U as User
    participant S as SPA<br/>(Orchestrator)
    participant R as Router
    participant C as Chat Service
    participant B as Bedrock

    U->>S: types message
    S->>S: append to transcript
    S->>R: POST /chat/turn<br/>{transcript, userMessage}
    R->>R: authn/authz
    R->>C: forward + identity
    C->>C: build system prompt
    C->>B: ConverseStream(system, messages, tools)
    B-->>C: stream text tokens
    C-->>S: stream {assistantText}
    S-->>U: render streamed tokens
    Note over S: append assistant turn to transcript

4.2 Turn with a single tool call (the core human-in-the-loop, or HITL, flow)

sequenceDiagram
    actor U as User
    participant S as SPA<br/>(Orchestrator)
    participant E as Proposal<br/>Executor
    participant SDK as Frontend SDK
    participant API as Backend API
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "what plan am I on?"
    S->>C: POST /chat/turn {transcript, userMessage}
    C->>B: ConverseStream(..., tools=[manifest])
    B-->>C: tool_use{tool=getCurrentUser, args={}}
    C->>C: validate against manifest
    C-->>S: {proposals=[{id, tool, args, riskClass}]}
    S->>S: render Approval Card
    U->>S: clicks Approve
    S->>E: execute(proposal)
    E->>E: validate args vs manifest
    E->>SDK: sdk.user.getCurrent()
    SDK->>API: GET /me (Bearer access_token)
    API-->>SDK: 200 {name, tier, ...}
    SDK-->>E: response
    E->>E: project/truncate per manifest
    E-->>S: {id, status:"ok", body}
    S->>C: POST /chat/turn {transcript, toolResults=[...]}
    C->>B: ConverseStream(..., tool_result fed back)
    B-->>C: stream text "You're on the Pro plan…"
    C-->>S: stream {assistantText}
    S-->>U: render reply

The “suspend / resume” between proposal emission and tool-result feedback is simply two separate HTTP requests. The chat service is stateless; the browser holds the transcript and replays it on every turn.
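
A rough sketch of that cycle from the browser's side, assuming the request and response shapes from section 6.1. The transcript entry shape, the `runUserMessage` name, and the injected `postTurn` / `resolveProposal` functions are assumptions made for illustration; `resolveProposal` stands for "show the approval card, then execute or decline".

```typescript
type Proposal = { id: string; tool: string; args: object; riskClass: "read" | "write" | "destructive" };
type ToolResult = { id: string; status: "ok" | "error" | "declined"; body?: unknown };
type TurnRequest = { transcript: unknown[]; userMessage?: string; toolResults?: ToolResult[] };
type TurnResponse = { assistantText: string; proposals: Proposal[] };

async function runUserMessage(
  transcript: unknown[], // browser-held; mutated in place
  userMessage: string,
  postTurn: (req: TurnRequest) => Promise<TurnResponse>,
  resolveProposal: (p: Proposal) => Promise<ToolResult>,
): Promise<string> {
  // First HTTP call: prior transcript plus the new user message.
  let res = await postTurn({ transcript: [...transcript], userMessage });
  transcript.push({ role: "user", userMessage }, { role: "assistant", ...res });

  // "Suspend" is just this loop waiting on the user; "resume" is the next POST.
  while (res.proposals.length > 0) {
    const toolResults: ToolResult[] = [];
    for (const p of res.proposals) toolResults.push(await resolveProposal(p));
    // The server keeps no session: it rebuilds context from the replayed transcript.
    res = await postTurn({ transcript: [...transcript], toolResults });
    transcript.push({ role: "user", toolResults }, { role: "assistant", ...res });
  }
  return res.assistantText;
}
```

Because both collaborators are injected, the loop can be exercised with a fake `postTurn` and no network.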

4.3 Turn with a decline

sequenceDiagram
    actor U as User
    participant S as SPA<br/>(Orchestrator)
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "cancel my subscription"
    S->>C: POST /chat/turn
    C->>B: ConverseStream
    B-->>C: tool_use{tool=requestCancellation, args={...}}
    C-->>S: proposals=[{... riskClass:"destructive"}]
    S->>S: render Approval Card (destructive variant, gate)
    U->>S: clicks Decline
    S->>C: POST /chat/turn {toolResults=[{status:"declined"}]}
    C->>B: ConverseStream (sees declined tool_result)
    B-->>C: stream text "OK — I won't proceed. Want me to…?"
    C-->>S: stream assistantText
    S-->>U: render acknowledgement

No silent retry. The system prompt instructs the model to acknowledge the decline and then either suggest alternatives or stop; it must never re-emit the same proposal unprompted.

4.4 Turn with a runbook lookup + multi-step plan

sequenceDiagram
    actor U as User
    participant S as SPA
    participant C as Chat Service
    participant B as Bedrock
    participant KB as Bedrock KB

    U->>S: "cancel my subscription"
    S->>C: POST /chat/turn
    C->>B: ConverseStream
    B-->>C: tool_use{tool=lookupRunbook, query="cancel subscription"}
    C->>KB: Retrieve(query)
    KB-->>C: runbook chunks
    C->>B: ConverseStream (runbook fed back as tool_result)
    B-->>C: text + tool_use{tool=getSubscription, args={}}
    C-->>S: assistantText + proposal
    Note over U,S: ... user approves, browser executes,<br/>result fed back ...
    Note over B: agent chains next step from runbook
    B-->>C: tool_use{tool=requestCancellation, args=...}
    Note over U,S: ... approval + execute ...
    B-->>C: tool_use{tool=getCancellationStatus, args=...}
    Note over U,S: ... approval + execute ...
    B-->>C: stream text "Done — your subscription is cancelled."

lookupRunbook is server-executed (no token needed, no side effect) so no per-call approval. Every subsequent API call is browser-executed and individually approved.
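
One way to picture the split between server-executed and browser-executed tools is a small routing step inside the Agent Loop. The sketch below is an assumption about how that could be organized, not a description of the actual code; the tool names `lookupRunbook` and `navigate` come from the flows in this section, while `routeToolUses` and the `Routed` shape are illustrative.

```typescript
type ToolUse = { tool: string; args: Record<string, unknown> };
type Routed = {
  serverExecuted: ToolUse[];            // run inside the chat service, no approval
  navigationIntents: { url: string }[]; // auto-executed by the SPA
  proposals: ToolUse[];                 // browser-executed, individually approved
};

const SERVER_EXECUTED = new Set(["lookupRunbook"]);
const NAVIGATION_TOOL = "navigate";

function routeToolUses(uses: ToolUse[]): Routed {
  const routed: Routed = { serverExecuted: [], navigationIntents: [], proposals: [] };
  for (const use of uses) {
    if (SERVER_EXECUTED.has(use.tool)) {
      routed.serverExecuted.push(use);
    } else if (use.tool === NAVIGATION_TOOL) {
      routed.navigationIntents.push({ url: String(use.args["url"]) });
    } else {
      // Default is the most conservative path: anything else needs approval.
      routed.proposals.push(use);
    }
  }
  return routed;
}
```

Defaulting unknown tools to the approval path means that a newly added tool cannot accidentally skip HITL.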

4.5 Turn with a navigation intent

sequenceDiagram
    actor U as User
    participant S as SPA
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "take me to billing"
    S->>C: POST /chat/turn
    C->>B: ConverseStream
    B-->>C: text "Heading to billing…" +<br/>tool_use{tool=navigate, args={url:"/billing"}}
    C-->>S: {assistantText, navigationIntents=[{url}]}
    S-->>U: render text, then toast "Taking you to /billing"
    S->>S: router.push("/billing")

Navigation intents auto-execute (no approval) because they mutate only client state and require no access token.
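
Because navigation skips approval, the handler should refuse anything that is not an internal path (section 6.3 states "internal path only"). A minimal sketch, assuming the router and toast are injected; `isInternalPath` and `handleNavigationIntent` are illustrative names, and the undo affordance is omitted.

```typescript
type NavigationIntent = { url: string };

// Internal path: starts with exactly one "/", so "//host" (protocol-relative),
// "/\host", absolute URLs, and "javascript:" URLs are all rejected.
function isInternalPath(url: string): boolean {
  return /^\/(?![\/\\])/.test(url);
}

function handleNavigationIntent(
  intent: NavigationIntent,
  deps: { push: (path: string) => void; toast: (message: string) => void },
): boolean {
  if (!isInternalPath(intent.url)) return false; // drop silently; nothing is executed
  deps.toast(`Taking you to ${intent.url}`);
  deps.push(intent.url);
  return true;
}
```

A stricter variant could check the path against the SPA's known route table instead of a pattern.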

4.6 Turn with an out-of-scope question

sequenceDiagram
    actor U as User
    participant S as SPA
    participant C as Chat Service
    participant B as Bedrock

    U->>S: "what's your refund policy?"
    S->>C: POST /chat/turn
    C->>B: ConverseStream (system prompt: "decline policy Qs, navigate")
    B-->>C: text "Our refund details live on the help center —<br/>let me take you there." +<br/>tool_use{tool=navigate, args={url:"/contact"}}
    C-->>S: {assistantText, navigationIntents=[{url:"/contact"}]}
    S-->>U: render decline + toast + navigation

5. Browser turn orchestrator state machine

stateDiagram-v2
    [*] --> Idle
    Idle --> AwaitingTurn: userTyped / send
    AwaitingTurn --> Streaming: first chunk received
    Streaming --> Streaming: more tokens
    Streaming --> AwaitingApprovals: proposals received
    Streaming --> ExecutingNavigations: navigationIntents received
    Streaming --> Idle: turnComplete, no proposals
    ExecutingNavigations --> Idle: navigation fired
    AwaitingApprovals --> Executing: user approved at least one
    AwaitingApprovals --> AwaitingTurn: all declined → send declines
    Executing --> Executing: more proposals pending
    Executing --> AwaitingTurn: all results gathered → send next turn
    AwaitingTurn --> Error: network / 5xx
    Streaming --> Error: stream aborted
    Error --> Idle: user dismisses

States are tested as reducer transitions (deep module). The state machine is what makes the orchestrator straightforward to test in isolation — no DOM, no network, just state + events.
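
The diagram above translates almost line for line into a transition table. The sketch below tracks only the state name (the real reducer would also carry the transcript and pending proposals), and the event names are assumptions chosen to match the diagram's edge labels.

```typescript
type State =
  | "Idle" | "AwaitingTurn" | "Streaming" | "AwaitingApprovals"
  | "ExecutingNavigations" | "Executing" | "Error";

type OrchestratorEvent =
  | "send" | "chunk" | "proposals" | "navigationIntents" | "turnComplete"
  | "navigationFired" | "approvedSome" | "allDeclined"
  | "proposalExecuted" | "allResultsGathered" | "failed" | "dismiss";

const transitions: Record<State, Partial<Record<OrchestratorEvent, State>>> = {
  Idle: { send: "AwaitingTurn" },
  AwaitingTurn: { chunk: "Streaming", failed: "Error" },
  Streaming: {
    chunk: "Streaming",
    proposals: "AwaitingApprovals",
    navigationIntents: "ExecutingNavigations",
    turnComplete: "Idle",
    failed: "Error",
  },
  ExecutingNavigations: { navigationFired: "Idle" },
  AwaitingApprovals: { approvedSome: "Executing", allDeclined: "AwaitingTurn" },
  Executing: { proposalExecuted: "Executing", allResultsGathered: "AwaitingTurn" },
  Error: { dismiss: "Idle" },
};

// Pure reducer: events that are not valid in the current state are ignored.
function reduce(state: State, event: OrchestratorEvent): State {
  return transitions[state][event] ?? state;
}
```

A test for the single-tool-call flow is then just a list of events folded over `reduce`, with no DOM or network involved.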

6. Data shapes (contract surface)

These are the contracts between subsystems. Field names are stable; exact serialization is an implementation detail.

6.1 Browser ⇄ Chat Service

Request (POST /chat/turn)

{
  transcript:    Message[]              // full prior conversation
  userMessage?:  string                 // if user just typed
  toolResults?:  ToolResult[]           // if browser just executed proposals
}

Response (streamed)

{
  assistantText:      string            // streamed in chunks
  proposals:          Proposal[]        // emitted whole, after text
  navigationIntents:  NavigationIntent[]
}

6.2 Proposal & ToolResult

Proposal {
  id:        string                     // unique per turn
  tool:      string                     // allowlist entry name
  args:      object                     // conforms to manifest argSchema
  riskClass: "read" | "write" | "destructive"
}

ToolResult {
  id:      string                       // matches Proposal.id
  status:  "ok" | "error" | "declined"
  body?:   unknown                      // projected/truncated tool response
  error?:  { kind: "client" | "server" | "network", message: string, statusCode?: number }
}

6.3 NavigationIntent

NavigationIntent {
  url: string                           // internal path only
}

6.4 Tool Manifest (build-time JSON)

ToolManifestEntry {
  name:              string
  description:       string             // LLM-tuned
  riskClass:         "read" | "write" | "destructive"
  argSchema:         JSONSchema
  responseProjection?: PathSelector[]   // applied before truncation
  maxResponseBytes?: number             // default 4096
}
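
The "project, then truncate" order matters: projection removes fields the model does not need, and truncation is only the fallback. The sketch below assumes that a `PathSelector` is a dot-separated key path (the document does not define the selector syntax) and that size is measured in UTF-8 bytes of the serialized body; `project` and `truncate` are illustrative names.

```typescript
type PathSelector = string; // assumption: dot-separated key path, e.g. "plan.tier"

// Keep only the selected paths; selectors that do not resolve are skipped.
function project(body: unknown, selectors: PathSelector[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const selector of selectors) {
    let value: unknown = body;
    for (const key of selector.split(".")) {
      if (typeof value !== "object" || value === null) {
        value = undefined;
        break;
      }
      value = (value as Record<string, unknown>)[key];
    }
    if (value !== undefined) out[selector] = value;
  }
  return out;
}

// Cut the serialized body at maxBytes and append the note described in section 9.
// If the cut lands inside a multi-byte character, the decoder emits U+FFFD there.
function truncate(serialized: string, maxBytes = 4096): string {
  const bytes = new TextEncoder().encode(serialized);
  if (bytes.length <= maxBytes) return serialized;
  const kept = new TextDecoder().decode(bytes.slice(0, maxBytes));
  return `${kept}…truncated, ${bytes.length - maxBytes} more bytes`;
}
```

Note that a truncated body is no longer valid JSON, so in this sketch the model receives it as text rather than as a structured value.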

6.5 Runbook frontmatter

---
name:             string                # kebab-case slug, unique
title:            string                # human-readable
tools-referenced: string[]              # must exist in allowlist
tags:             string[]
last-reviewed:    YYYY-MM-DD
---

<body — short, LLM-tuned procedural prose>

7. Build & sync pipelines

7.1 Tool Manifest generation (build time)

flowchart LR
    Smithy[(Smithy model)]
    Allow[(allowlist.json)]
    Desc[(descriptions.json<br/>LLM-tuned overrides)]
    Gen[Smithy → Manifest<br/>Generator]
    Manifest[(tool-manifest.json)]
    SPA[SPA bundle]
    Chat[Chat Service bundle]

    Smithy --> Gen
    Allow --> Gen
    Desc --> Gen
    Gen -->|validate: allowlisted ops exist,<br/>descriptions present, risk classes present| Manifest
    Manifest --> SPA
    Manifest --> Chat

Single source of truth for the agent’s catalog. Build fails if validation fails — no silent drift.
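
The validation step can be sketched as a pure function that returns a list of errors, with the build failing when the list is non-empty. The input shapes here are assumptions: the document does not specify the layout of `allowlist.json` or `descriptions.json`, so this sketch places the risk class on the allowlist entry and treats `descriptions.json` as the only description source (the real generator may fall back to documentation in the Smithy model).

```typescript
type RiskClass = "read" | "write" | "destructive";
type AllowlistEntry = { operation: string; riskClass?: RiskClass }; // assumed shape

function validateAllowlist(
  smithyOperations: Set<string>,        // operation names found in the Smithy model
  allowlist: AllowlistEntry[],
  descriptions: Record<string, string>, // operation name -> LLM-tuned description
): string[] {
  const errors: string[] = [];
  for (const { operation, riskClass } of allowlist) {
    if (!smithyOperations.has(operation)) errors.push(`${operation}: not in Smithy model`);
    if (!descriptions[operation]?.trim()) errors.push(`${operation}: missing description`);
    if (!riskClass) errors.push(`${operation}: missing risk class`);
  }
  return errors;
}
```

Collecting every error before failing, rather than stopping at the first, lets one CI run report all problems in a changed allowlist.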

7.2 Runbook KB sync (on merge to main)

flowchart LR
    Repo[/runbooks/*.md/]
    CI[CI step]
    S3[(S3 prefix)]
    Ingest[KB ingestion job]
    KB[Bedrock KB]

    Repo -->|push to main| CI
    CI -->|validate frontmatter +<br/>tools-referenced ⊂ allowlist| CI
    CI -->|upload changed files| S3
    CI -->|start-ingestion-job| Ingest
    Ingest --> KB

KB ingestion takes minutes. Runbook updates are not real-time; the authoring tempo (PR → review → merge → ingest) is acceptable for v1.
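
The CI validation step could be sketched as follows, operating on frontmatter that has already been parsed from YAML (parsing itself is left to a library and is out of scope here). Field names follow section 6.5; `validateFrontmatter` and its parameters are illustrative, and the date check covers only the `YYYY-MM-DD` shape, not calendar validity.

```typescript
type Frontmatter = {
  name: string;
  title: string;
  "tools-referenced": string[];
  tags: string[];
  "last-reviewed": string;
};

function validateFrontmatter(
  fm: Frontmatter,
  allowlist: Set<string>, // tool names from the Tool Manifest
  seenNames: Set<string>, // names of runbooks already validated in this run
): string[] {
  const errors: string[] = [];
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(fm.name)) errors.push("name must be a kebab-case slug");
  if (seenNames.has(fm.name)) errors.push(`duplicate name "${fm.name}"`);
  if (fm.title.trim() === "") errors.push("title is required");
  for (const tool of fm["tools-referenced"]) {
    if (!allowlist.has(tool)) errors.push(`unknown tool "${tool}"`);
  }
  if (!/^\d{4}-\d{2}-\d{2}$/.test(fm["last-reviewed"])) {
    errors.push("last-reviewed must be YYYY-MM-DD");
  }
  return errors;
}
```

Running this before the S3 upload keeps a runbook that references a removed tool from ever reaching the knowledge base.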

8. Security model

The full reasoning is in ADR 0001. Summary:

The access token never leaves the browser. It is attached only when the SPA’s Proposal Executor invokes the existing frontend SDK, exactly as the SPA does for any user-initiated click. The chat service does not receive it; Bedrock does not receive it.
The agent is a UI, not a trust boundary. The backend API’s existing per-call authz checks are the sole authorization gate. Anything the user can do via clicks, they can do via the agent; anything they can’t do via clicks, the API will reject when the agent tries.
HITL is defense-in-depth against social engineering, not against privilege escalation. The user can be tricked (e.g., via a prompt-injected tool result) into approving something they could legitimately do but shouldn’t. The approval card UX is the mitigation: IDs prominent, risk class shown, destructive operations gated.
No server-side re-validation of proposals against user permissions. Duplicating the API’s checks would be a parallel authz system that can drift — a worse posture than relying on the API.

8.1 What the LLM sees vs doesn’t see

flowchart LR
    subgraph llm[LLM sees]
        sys[System prompt]
        trans[Transcript]
        toolDefs[Tool definitions<br/>names + arg schemas + descriptions]
        toolRes[Tool result bodies<br/>projected/truncated]
        runbook[Retrieved runbook chunks]
    end
    subgraph never[LLM never sees]
        token[Access token]
        rawResp[Raw unprojected API responses<br/>over 4KB]
        otherUser[Any other user's data]
    end

    classDef good fill:#dfe;
    classDef bad fill:#fdd;
    class llm good;
    class never bad;

9. Failure modes & their handling

Failure	Detection	Handling
Model emits a tool name not in allowlist	Tool Manifest Validator (server)	Reject before sending to browser; log; the Agent Loop tells the model “that tool doesn’t exist” and continues.
Model emits args that violate the manifest schema	Tool Manifest Validator (server)	Same as above.
User declines a proposal	Browser Orchestrator	Send `{status:"declined"}` back; model is prompted to acknowledge and suggest alternatives or stop.
Backend API returns 4xx	Proposal Executor	Normalize to `{status:"error", error:{kind:"client", ...}}`; fed back; model may propose differently. No auto-retry.
Backend API returns 5xx	Proposal Executor	Same as 4xx with `kind:"server"`. No auto-retry.
Network/timeout in browser	Proposal Executor	Surface in chat UI with “try again” affordance; user must re-approve to retry.
Tool response exceeds 4KB	Proposal Executor	Apply per-tool `responseProjection` first; if still over, truncate with `…truncated, N more bytes` note.
Bedrock returns an error	Bedrock Client Wrapper	Surface to user as “something went wrong with the assistant”; no auto-retry; turn ends.
Lambda response stream interrupted	Browser Orchestrator	Mark partial turn as failed; user can resend.
Runbook KB retrieval fails	Runbook Retriever	Return empty result + log; model is told “no runbook found” and proceeds without one.
Conversation grows past comfortable token count	(Logged, not enforced in v1)	No action in v1. Watch logs; revisit if real conversations exceed ~50K tokens.
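
The 4xx / 5xx / network rows share one normalization step in the Proposal Executor. A minimal sketch, assuming that errors thrown by the frontend SDK expose a numeric `statusCode` property when an HTTP response was received (the real SDK's error shape may differ); `normalizeFailure` is an illustrative name.

```typescript
type ToolError = { kind: "client" | "server" | "network"; message: string; statusCode?: number };
type ToolResult = { id: string; status: "ok" | "error" | "declined"; body?: unknown; error?: ToolError };

// Assumption: SDK errors carry `statusCode` when the backend actually responded.
function statusCodeOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "statusCode" in err) {
    const code = (err as { statusCode: unknown }).statusCode;
    return typeof code === "number" ? code : undefined;
  }
  return undefined;
}

function normalizeFailure(id: string, err: unknown): ToolResult {
  const statusCode = statusCodeOf(err);
  const message = err instanceof Error ? err.message : String(err);
  if (statusCode !== undefined && statusCode >= 400 && statusCode < 500) {
    return { id, status: "error", error: { kind: "client", message, statusCode } };
  }
  if (statusCode !== undefined && statusCode >= 500) {
    return { id, status: "error", error: { kind: "server", message, statusCode } };
  }
  // No usable status code: treat as a network failure or timeout.
  return { id, status: "error", error: { kind: "network", message } };
}
```

The function only classifies; per the table, nothing is retried automatically, and a network failure is surfaced in the chat UI for the user to re-approve.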

10. Deployment

flowchart LR
    subgraph aws[AWS]
        CF[CloudFront]
        FU[Lambda Function URL<br/>response streaming]
        L[Chat Service Lambda]
        BR[Bedrock Converse]
        KB[Bedrock KB]
        S3[(S3 — runbook source)]
        CW[CloudWatch]
        IAM[IAM execution role]
    end
    SPA[React SPA<br/>existing app]
    Router[Existing Router Service]

    SPA --> Router
    Router --> CF
    CF --> FU
    FU --> L
    L --> BR
    L --> KB
    L -.-> CW
    S3 -.-> KB
    L -.- IAM

Lambda with response streaming via Function URL, fronted by CloudFront for caching headers and a stable hostname. The existing Router Service forwards authenticated traffic to CloudFront.
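IAM: Lambda role allows bedrock:InvokeModelWithResponseStream, bedrock:Retrieve (Knowledge Base retrieval; the IAM action uses the bedrock prefix even though the API is served by the bedrock-agent-runtime endpoint), and CloudWatch Logs writes. No other AWS API access. No access to user data stores: by construction, the chat service does not need them.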
CDK app is the source of truth for all of the above; one stack per environment.

11. Open trade-offs and known risks

Trade-off	Decision	Future revisit signal
Browser-held transcript	Accepted; ships v1 fast	Users ask for cross-device resume; or support requests transcript access
No rate limits / cost budgets	Accepted; bounded risk via Cognito-only access	First abuse incident; or any non-trivial public rollout
No transcript persistence	Accepted	Support needs to review user sessions; compliance requires retention
Single model (Haiku 4.5)	Cost-first start	Quality complaints on multi-step tasks → escalate to Sonnet 4.6
No model-evals	Accepted; small v1 surface area	Runbook count grows past ~20 or retrieval quality drops
No macro-tools	Accepted; primitive chaining + per-call HITL	Approval fatigue measured in user research
No product-docs KB	Accepted; navigate-and-handoff	If a significant fraction of user queries are declined, build the KB
Out-of-scope = decline-and-navigate	Accepted	Liability cost of one wrong answer ≫ inconvenience of navigation

Each of these has a tracked deferral in PRD §“Out of Scope.” None are oversights.