npm - @aitne-sh/aitne - Versions diffs - 0.1.8 → 0.1.9 - Mend

@aitne-sh/aitne 0.1.8 → 0.1.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (276)

package/agent-assets/docs/features/operations/approvals.md CHANGED

@@ -9,8 +9,9 @@ aliases:
   - human in the loop
 category: features
 summary: |
-  Approve-tier actions block until the operator clicks approve in
-  the dashboard. They bypass quiet hours.
+  A few high-blast-radius actions queue as approvals. The action
+  blocks until you click Approve on the dashboard Overview page;
+  the approval card stays visible there regardless of quiet hours.
 section: operations
 tags:
   - core
@@ -20,18 +21,26 @@ status: stable
 ask_examples:
   - What is an approval?
   - Why is the agent waiting for me?
+  - Where do I approve a pending action?
 locale: en-US
 created: 2026-04-25
-updated: 2026-04-25
+updated: 2026-05-28
 keywords:
   - approval
   - approve tier
-  - /dashboard/approvals
+  - approval queue
   - agent approval queue
   - approve before action
+  - deny approval
 related:
   - concepts/safety-and-execution
   - features/operations/notifications
+ui_anchors:
+  - /
+api_endpoints:
+  - GET /api/approvals
+  - POST /api/approvals/:id/approve
+  - POST /api/approvals/:id/deny
 ---
 # Approvals
@@ -39,26 +48,49 @@ related:
 ## In One Sentence
 A small set of high-blast-radius actions queue as approvals; the
-agent waits for an operator click before proceeding.
+agent waits for you to click **Approve** on the dashboard before it
+proceeds.
+## How It Works
+A few actions are classified as *Approve* tier (see
+[Safety and Execution](../../concepts/safety-and-execution.md)).
+When the agent reaches one, instead of running it the daemon parks
+the request in the approval queue:
-## What It Does
+1. The action **blocks** — nothing runs while it waits.
+2. It appears in the **approval card** on the dashboard Overview
+   page (`/`).
+3. You click **Approve** to let it run, or **Deny** to discard it.
+4. On Approve, the queued action resumes and the agent continues.
-- Blocks the action.
-- Shows it on the Overview page's approval card.
-- Sends a notification (bypasses quiet hours by design).
-- Resumes when the operator approves.
+The approval card is always shown on the Overview page while items
+are pending — it does not respect quiet hours, so you can clear it
+whenever you next open the dashboard.
 ## Where in the Dashboard
-- **Overview** shows the count badge.
-- The approvals card lists pending items with diff previews.
+The Overview page (`/`) shows an amber **approval card** whenever
+something is pending. The card header reports the count
+("2 pending approvals") and each row lists:
+- the action description,
+- a badge with its type,
+- when it was queued,
+- **Approve** and **Deny** buttons.
+Deny asks for confirmation before discarding the item.
 ## When Something Goes Wrong
-- An approval that hangs: the agent's session may have timed out.
-  The action expires when the session does; the operator must redo
-  the request that produced it.
+- **An approval that never clears:** the agent session that produced
+  it may have timed out. The action expires with the session — redo
+  the request that triggered it (for example, re-send the DM or
+  re-run the routine).
 ## Related
-- [Safety and Execution](../../concepts/safety-and-execution.md)
+- [Safety and Execution](../../concepts/safety-and-execution.md) —
+  how the agent decides an action needs approval.
+- [Notifications](notifications.md) — how the agent reaches you
+  outside the dashboard.
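
The `api_endpoints` front matter added to approvals.md lists three routes. As a rough illustration of how a client might address them, here is a minimal TypeScript sketch. Only the three route shapes come from the diff; the base URL, the `FetchLike` type, and the helper names are assumptions made for this example and are not part of the package.

```typescript
// Hypothetical helpers: only the route shapes (GET /api/approvals,
// POST /api/approvals/:id/approve, POST /api/approvals/:id/deny) come from the doc.
type Decision = "approve" | "deny";

// Builds the list URL, tolerating a trailing slash on the base URL.
function approvalsListUrl(baseUrl: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/api/approvals`;
}

// Builds the approve/deny URL for one queued item.
function approvalDecisionUrl(baseUrl: string, id: string, decision: Decision): string {
  return `${approvalsListUrl(baseUrl)}/${encodeURIComponent(id)}/${decision}`;
}

// Minimal fetch-like signature (an assumption) so the sketch can be
// exercised with a stub instead of a running daemon.
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean; status: number }>;

// Sends the decision and reports whether the daemon accepted it.
async function decide(
  fetchFn: FetchLike,
  baseUrl: string,
  id: string,
  decision: Decision,
): Promise<boolean> {
  const res = await fetchFn(approvalDecisionUrl(baseUrl, id, decision), { method: "POST" });
  return res.ok;
}
```

Passing the fetch function in as a parameter keeps the sketch independent of any particular daemon address or authentication scheme, neither of which the diff specifies.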

package/agent-assets/docs/features/operations/backend-routing.md CHANGED

@@ -11,12 +11,13 @@ category: features
 summary: |
   BackendRouter resolves each ProcessKey to a (main, fallback) backend
   pair and a tier. On BackendQuotaError or BackendDecisiveFailure, the
-  main backend's session fails over to the fallback's.
+  main backend's session fails over to the fallback's mid-run.
 section: operations
 tags:
   - core
   - operations
   - backends
+  - routing
 status: stable
 ask_examples:
   - What happens when my Claude quota is exhausted?
@@ -24,7 +25,7 @@ ask_examples:
   - How do fallbacks work?
 locale: en-US
 created: 2026-04-25
-updated: 2026-04-25
+updated: 2026-05-28
 keywords:
   - backend routing
   - BackendRouter
@@ -32,8 +33,21 @@ keywords:
   - main fallback
   - BackendQuotaError
   - BackendDecisiveFailure
+  - process_backend_config
+process_keys:
+  - message.dm
+  - agent.task
+  - delegated_task_heavy
+ui_anchors:
+  - /settings/models
+  - /activity
+api_endpoints:
+  - GET /api/process-config
+  - PUT /api/process-config/:processKey
+  - PUT /api/backends/main
 related:
   - concepts/backends-and-tiers
+  - concepts/process-keys
   - features/operations/cost-tracking
 ---
@@ -41,29 +55,67 @@ related:
 ## In One Sentence
-Each ProcessKey resolves to a `(main, fallback)` pair and a tier; on
-quota or decisive failure, the dispatcher transitions to the fallback
-mid-run.
+Every job carries a ProcessKey. The router resolves that key to a
+`(main, fallback)` backend pair plus a tier — and if the main backend
+hits a quota wall or a decisive failure, the dispatcher transitions to
+the fallback mid-run, then DMs you that it happened.
-## What It Does
+## How It Resolves a Backend
-- Reads the `process_backend_config` table to find the binding.
-- Falls back to the default tier map when no override exists.
-- Re-materializes the session workdir for the fallback backend's
-  instruction file and skill set.
+The router never picks a model itself. The dispatcher hands it a
+ProcessKey, and `BackendRouter` resolves the binding in this order:
+1. Read the `process_backend_config` table for a per-key override
+   (`main_backend` / `main_model` / `fallback_backend` / `fallback_model`).
+2. If no override exists, fall back to the ProcessKey's **default tier**
+   (`lite` → Haiku-class, `medium` → Sonnet-class, `high` → Opus-class)
+   and the seeded backend for that tier.
+3. `dashboard.docs_qa` is **tier-locked to `medium`** — an operator pin
+   can't move it.
+Only one ProcessKey — `delegated_task_heavy` — defaults to the `high`
+tier, and it is opt-in (gated by the `delegatedTaskHeavyEnabled` flag).
+No install-time surface defaults to Opus.
+## What Happens on Failover
+The two failover signals are `BackendQuotaError` (the backend hit a
+usage/budget limit) and `BackendDecisiveFailure` (auth failure, model
+unavailable, policy-denied, timeout, or turn-limit). When the main
+backend raises either:
+- The router **re-materializes the session workdir** for the fallback
+  backend — writing its instruction file (`AGENTS.md` for Codex,
+  `GEMINI.md` for Gemini, etc.) and skill set into the shared dir. Without
+  this step a Claude → Codex failover would leave only `CLAUDE.md` and
+  `.claude/skills/`, and the fallback would run blind.
+- The fallback then executes with the same prompt and any
+  per-session tool overrides applied to the main run.
+- On success, you get a **low-priority DM** noting the main backend
+  failed and the fallback served the turn.
+- If the fallback *also* fails, you get a higher-priority notification:
+  `Backend execution failed: <key> encountered <kind> on <main>, then
+  <kind> on <fallback>.` This is usually a credentials problem on both
+  sides.
 ## Where in the Dashboard
-- **Settings → Models** is the unified surface for picking main and
-  fallback per ProcessKey.
-- **Activity** rows show which backend actually served each turn after
-  fallback resolution.
+- **[Settings → Models](/settings/models)** is the unified surface for
+  picking the main and fallback backend (and tier) per ProcessKey.
+- **[Activity](/activity)** rows show which backend actually served each
+  turn after fallback resolution, so you can see when a failover fired.
 ## When Something Goes Wrong
-- A `fallback-failed` notification: both backends rejected the run.
-  Most often a credentials issue on both sides.
+- **A `Backend execution failed` notification** means both the main and
+  the fallback rejected the run. Check authentication for both backends
+  first — re-authorize from the dashboard if needed.
+- **A routine that ran on the "wrong" backend** is usually a failover: the
+  main backend was over quota, so the fallback served it. The Activity
+  row will confirm which backend ran.
 ## Related
 - [Backends and Tiers](../../concepts/backends-and-tiers.md)
+- [Process Keys](../../concepts/process-keys.md)
+- [Cost Tracking](./cost-tracking.md)
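
The resolution order and failover behaviour described in the backend-routing.md changes above can be sketched roughly as follows. This is an illustrative TypeScript model, not the package's `BackendRouter`: the `Override` and `Binding` shapes, the `medium` default for unknown keys, the function names, and the error-class constructors are all assumptions. Only the error names, the override-then-default-tier order, and the `dashboard.docs_qa` tier lock come from the diff.

```typescript
type Tier = "lite" | "medium" | "high";
interface BackendPair { main: string; fallback: string; }
// Hypothetical stand-in for a process_backend_config row; every field is optional.
interface Override { main?: string; fallback?: string; tier?: Tier; }
interface Binding extends BackendPair { tier: Tier; }

// Error classes named after the two failover signals in the doc.
class BackendQuotaError extends Error {
  constructor(message: string) { super(message); this.name = "BackendQuotaError"; }
}
class BackendDecisiveFailure extends Error {
  constructor(message: string) { super(message); this.name = "BackendDecisiveFailure"; }
}

function isFailoverSignal(err: unknown): boolean {
  return err instanceof Error &&
    (err.name === "BackendQuotaError" || err.name === "BackendDecisiveFailure");
}

// Step 1: per-key override. Step 2: default tier plus the seeded pair for that tier.
// Step 3: dashboard.docs_qa stays on medium regardless of any pin.
function resolveBinding(
  processKey: string,
  overrides: Map<string, Override>,
  defaultTiers: Map<string, Tier>,
  seeded: Record<Tier, BackendPair>,
): Binding {
  const override = overrides.get(processKey);
  // Assumption: keys missing from the tier map are treated as medium.
  let tier: Tier = override?.tier ?? defaultTiers.get(processKey) ?? "medium";
  if (processKey === "dashboard.docs_qa") tier = "medium";
  const pair = seeded[tier];
  return {
    main: override?.main ?? pair.main,
    fallback: override?.fallback ?? pair.fallback,
    tier,
  };
}

// Runs on the main backend; on a failover signal, retries once on the fallback.
function runWithFailover<T>(
  binding: Binding,
  run: (backend: string) => T,
): { servedBy: string; usedFallback: boolean; result: T } {
  try {
    return { servedBy: binding.main, usedFallback: false, result: run(binding.main) };
  } catch (err) {
    if (!isFailoverSignal(err)) throw err;
    return { servedBy: binding.fallback, usedFallback: true, result: run(binding.fallback) };
  }
}
```

The sketch leaves out workdir re-materialization and the DM notifications; it only shows which backend ends up serving the turn, which is what the Activity rows are said to report.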

package/agent-assets/docs/features/operations/cost-tracking.md CHANGED

@@ -6,10 +6,14 @@ id: cost-tracking
 aliases:
   - analytics
   - cost rollup
+  - spend tracking
 category: features
 summary: |
-  The Analytics page rolls cost up by ProcessKey, by backend, and by
-  agent day. The sidebar footer shows the running daily total.
+  Aitne records the USD cost of every run into the local SQLite database
+  and rolls it up on the Analytics page by backend, by event type, and
+  over daily / weekly / monthly windows. The sidebar footer shows today's
+  running total, and two optional caps (daily and monthly) guard
+  autonomous spend.
 section: operations
 tags:
   - core
@@ -20,46 +24,109 @@ ask_examples:
   - How much did the agent cost me today?
   - Which routines are the most expensive?
   - How do I cap autonomous spending?
+  - What is the difference between the daily and monthly cost cap?
 locale: en-US
 created: 2026-04-25
-updated: 2026-04-25
+updated: 2026-05-28
 keywords:
   - cost tracking
   - analytics
   - spend
   - per-process cost
   - rollup
+  - cost cap
+  - guardrails
 related:
   - concepts/costs-and-quotas
+  - concepts/process-keys
+  - features/operations/backend-routing
+ui_anchors:
+  - /analytics
+  - /analytics?tab=metrics
+  - /settings/models
 config_keys:
   - autonomousDailyCostCapUsd
+  - autonomousMonthlyCostCapUsd
+api_endpoints:
+  - GET /api/metrics
+  - GET /api/health
 ---
 # Cost Tracking
 ## In One Sentence
-A rolling rollup of token-cost per session, indexed by ProcessKey,
-backend, and agent day.
+Aitne meters the USD cost of every run, stores it locally, and rolls it
+up on the Analytics page so you can see where your spend goes — and cap it
+if you want.
-## What It Does
+## How It Works
-- Records per-execute cost into `agent_actions`.
-- Aggregates into the Analytics page's charts.
-- Surfaces the running daily total in the sidebar footer.
+- Each agent run writes its estimated cost into the `cost_usd` column of
+  the `agent_actions` table. The estimate is `token count × backend
+  pricing` — Aitne's best guess, never a bill.
+- All data is derived from the daemon's local SQLite database and is never
+  sent anywhere external.
+- The day boundary for "today" is the agent day (04:00 local by default),
+  not midnight.
-## Where in the Dashboard
+## Where to Look in the Dashboard
-- **Analytics** is the rollup.
-- **Settings → Models → Cost Guardrails** holds
-  `autonomousDailyCostCapUsd`.
+### Analytics page (the rollup)
-## When Something Goes Wrong
+Open **Analytics**. It has two tabs:
-- A cost number that looks wrong: cross-check against the backend's
-  own dashboard. Aitne's count is its best estimate from
-  per-call token math.
+- **Cost** — per-run USD spend. A period selector switches between
+  **Daily**, **Weekly**, and **Monthly** windows, with summary cards for
+  **Today**, **Last 7 Days**, and **Last 30 Days**. Inside Cost:
+  - **Overview** — a cost-trend chart over the selected period plus a
+    **By Event Type** breakdown (which process keys cost the most).
+  - **By Backend** — totals and a trend chart split by the backend that
+    *actually executed* each run. This reflects fallbacks and Gemini
+    auto-routing, not just your configured preferred backend.
+- **Metrics** (`/analytics?tab=metrics`) — operational health: activity
+  volume, execution breakdown, error rates, notification throughput.
+Note on delegated work: only **cross-backend** delegated calls show up as
+separate runs. Same-backend delegated/native calls roll up under the
+parent session's totals.
+### Sidebar footer (running daily total)
+The left sidebar footer shows today's running spend (`health.todayCostUsd`
+— `SUM(cost_usd)` over the current agent day). It updates as runs complete.
+## Capping Autonomous Spend
+**Settings → Models → Cost guardrails** holds two optional caps. Both are
+disabled (blank) by default and apply only to **autonomous** work —
+reactive work such as DMs and mentions always runs.
+- **`autonomousDailyCostCapUsd`** (Autonomous Daily Cost Cap) — when
+  today's autonomous spend reaches the cap, the dispatcher skips
+  lower-priority routines first, using priority-based degradation:
+  - `hourly_check` — skipped at 100% of the cap
+  - `roadmap_refresh` — skipped at 120%
+  - `evening_review` — skipped at 150%
+  - `morning_routine` — last to be cut, only at 200%
+  This leaves headroom for the morning briefing even when you're over
+  budget.
+- **`autonomousMonthlyCostCapUsd`** (Autonomous Monthly Cost Cap — alert
+  only) — a notification threshold for rolling 30-day spend. It surfaces a
+  warning at 80% and an error at 100% in the Notifications panel but does
+  **not** stop any work. Pair it with the daily cap if you want a hard
+  guardrail.
+## When a Cost Number Looks Wrong
+Aitne's count is its best estimate from per-call token math, not the
+provider's invoice. If a number looks off, cross-check it against the
+backend's own usage dashboard.
 ## Related
 - [Costs and Quotas](../../concepts/costs-and-quotas.md)
+- [Process Keys](../../concepts/process-keys.md)
+- [Backend Routing](./backend-routing.md)
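
The priority-based degradation and the alert-only monthly cap described in the cost-tracking.md changes above reduce to simple threshold arithmetic. The TypeScript sketch below works through it. The percentages and routine names come from the diff; the function names, the use of `null` for a blank cap, and the choice of `>=` at each threshold are assumptions.

```typescript
// Skip thresholds as a fraction of the daily cap, in the order the doc lists them.
const SKIP_THRESHOLDS: ReadonlyArray<readonly [string, number]> = [
  ["hourly_check", 1.0],
  ["roadmap_refresh", 1.2],
  ["evening_review", 1.5],
  ["morning_routine", 2.0],
];

// Returns the routines that would be skipped at the current autonomous spend.
// A null (blank) cap means the guardrail is disabled and nothing is skipped.
function skippedRoutines(spendUsd: number, dailyCapUsd: number | null): string[] {
  if (dailyCapUsd === null || dailyCapUsd <= 0) return [];
  const ratio = spendUsd / dailyCapUsd;
  return SKIP_THRESHOLDS
    .filter(([, threshold]) => ratio >= threshold)
    .map(([name]) => name);
}

// The monthly cap only raises alerts: warning at 80%, error at 100%.
function monthlyAlert(
  rolling30dUsd: number,
  monthlyCapUsd: number | null,
): "none" | "warning" | "error" {
  if (monthlyCapUsd === null || monthlyCapUsd <= 0) return "none";
  const ratio = rolling30dUsd / monthlyCapUsd;
  if (ratio >= 1) return "error";
  if (ratio >= 0.8) return "warning";
  return "none";
}
```

For example, with a 5 USD daily cap and 6.50 USD of autonomous spend (130% of the cap), `skippedRoutines(6.5, 5)` returns `hourly_check` and `roadmap_refresh`, while `evening_review` and `morning_routine` still run.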

package/agent-assets/docs/features/operations/managed-chromium.md ADDED

@@ -0,0 +1,221 @@
+---
+schema_version: 1
+slug: features/operations/managed-chromium
+title: Managed Chromium (B-4)
+id: managed-chromium
+aliases:
+  - managed chromium
+  - B-4
+  - purchase confirmation
+  - browser automation
+  - chromium automation
+category: features
+summary: |
+  Experimental, default-off purchase-confirmation flow. The daemon
+  spawns a managed Chromium profile to complete a vendor checkout the
+  agent has already prepared, after the operator approves with a
+  single-use DM token. Heavily gated; designed to be safe to read about
+  before you ever turn it on.
+section: operations
+tags:
+  - operations
+  - safety
+  - browser-automation
+  - experimental
+status: experimental
+ask_examples:
+  - What is B-4?
+  - Can Aitne buy things for me?
+  - What is the !~ token in my DM?
+  - How do I enable managed Chromium purchases?
+  - How do I block a site from managed Chromium?
+locale: en-US
+created: 2026-05-22
+updated: 2026-05-28
+keywords:
+  - managed chromium
+  - browser automation
+  - purchase token
+  - B-4
+  - "!~xxxxxxxx"
+  - per-site opt-in
+  - experimental danger
+  - hostname denylist
+related:
+  - features/integrations/browser-history
+  - features/operations/approvals
+  - concepts/safety-model
+  - concepts/safety-and-execution
+  - reference/disallowed-tools
+ui_anchors:
+  - /settings/integrations/browser-history-managed
+  - /settings/integrations/browser-history-managed/b4
+process_keys:
+  - browser_task
+  - message.dm
+config_keys:
+  - browserTaskHostnameDenylist
+api_endpoints:
+  - POST /api/browser-automation/b4/enabled
+  - PATCH /api/browser-automation/sites/:siteKey/b4-config
+  - GET /api/browser-automation/purchase-tokens
+  - POST /api/browser-automation/sites/:siteKey/connect
+  - POST /api/browser-task
+---
+# Managed Chromium (B-4)
+B-4 is the experimental purchase-confirmation flow. When you've asked
+the agent to "buy X" or "complete the checkout", and the vendor is on
+your B-4 allowlist, the daemon spawns a managed Chromium profile,
+fills the cart, and pauses for an explicit one-time token from your
+DM before clicking the final confirm. **It is default-off**, gated
+behind every safety check the project ships, and not surfaced in the
+public dashboard until the upstream B-3 surface (browser-history
+research) has been stable for six weeks.
+This page is written so it's safe to read whether you've enabled it
+or not.
+## What's Actually Gated
+Before B-4 can run, every one of these must be true:
+1. The **master toggle** `runtime_state.managed_chromium.b4_enabled`
+   is `true`. Default is `false`, set via
+   `POST /api/browser-automation/b4/enabled` with body
+   `{ enabled: true, acknowledge: true }` (Approve-tier).
+2. You've acknowledged the **experimental-danger modal** on
+   `/settings/integrations/browser-history-managed/b4`. The modal
+   lists the failure modes and warns that the guard is bypassable if
+   the daemon or messaging platform is compromised.
+3. At least one **primary DM channel** is set (Slack / Telegram /
+   Discord / WhatsApp). The single-use token is delivered there; the
+   dashboard never shows the raw token.
+4. The **site is on your B-4 allowlist**. Per-site enablement happens
+   via `PATCH /api/browser-automation/sites/:siteKey/b4-config`
+   (Approve). Sites not in the allowlist cannot run a B-4 flow even
+   if the master toggle is on.
+5. The **site is signed in** through the B-2.5 per-site sign-in
+   flow (`POST /api/browser-automation/sites/:siteKey/connect` →
+   sign in by hand in the spawned UI Chromium →
+   `POST .../finalize`). The daemon stores the profile in a
+   restricted directory the absolute-block layer protects from any
+   skill.
+## Structural Defences (no hardcoded category denylist)
+Earlier builds hardcoded a category denylist (banking, brokerages,
+government, healthcare, identity / legal, payment processors). **That
+framework-level category denylist was removed on 2026-05-27** — Aitne
+is not a Japan-specific product and does not ship an opinionated brand
+or category blocklist. What protects you now is structural, not a
+category list:
+1. **IP CIDR egress layer (hardcoded, not configurable).** Any
+   navigation that resolves to a private (RFC1918), loopback,
+   link-local, multicast, or cloud-metadata (`169.254.169.254`) address,
+   or to an IPv6 equivalent, is denied at the egress chokepoint
+   (`shouldDenyEgress` in `egress-denylist.ts`). This is the
+   defence-in-depth against SSRF — it cannot be turned off.
+2. **Payment-path blocker.** A URL-pattern matcher
+   (`payment-path-blocker.ts`) trips at form-submit time on
+   payment-handoff paths so the agent can't silently push a
+   transaction through.
+3. **The B-4 token primitive itself** — no final confirm without a
+   live, matched, single-use token (see below).
+**Domain-level deny is now user-managed.** If you want to keep B-4 (or
+any browser task) away from specific hostnames, add them to
+`browserTaskHostnameDenylist` (default empty, up to 500 entries) from
+Dashboard → `/settings/integrations/browser-history-managed`. The list
+ships empty.
+## The Token Flow
+1. The agent prepares the checkout in a managed Chromium tab and
+   pauses at the final confirm step.
+2. The daemon mints a single-use token with the prefix `!~` followed
+   by 8 random hex characters (e.g. `!~3a1f9c7b`), inserts a
+   `browser_automation_purchase_tokens` row keyed on a server-side
+   `jti`, and DMs the token to a primary channel together with a
+   screenshot of the exact cart state.
+3. You reply with the token on the same DM channel. The daemon
+   matches inbound text against pending tokens; a match advances the
+   flow and the agent clicks confirm.
+4. **5-minute timeout.** If no match arrives in 5 minutes, the token
+   expires, the tab closes, and the agent reports back that the
+   purchase was abandoned.
+5. **Raw token never leaves the table.** The dashboard's audit views
+   show `jti` + delivery state only — even a brief credential
+   compromise can't extract live tokens.
+`GET /api/browser-automation/purchase-tokens` lists pending and
+recent tokens (Approve-tier); `DELETE
+/api/browser-automation/purchase-tokens/:jti` cancels a pending
+token before its timeout.
+## Site Bootstrap (B-2.5)
+The same site infrastructure powers anonymous reads (B-2),
+authenticated reads (B-2.5), and B-4. Per-site state is managed by
+`managed-chromium-sites-store.ts`; the bootstrap UI flow is:
+| Step | Route |
+|---|---|
+| Spawn a UI Chromium window to sign in by hand | `POST /api/browser-automation/sites/:siteKey/connect` |
+| Poll progress | `GET /api/browser-automation/sites/:siteKey/status` |
+| Confirm signed-in, close UI window | `POST /api/browser-automation/sites/:siteKey/finalize` |
+| Re-spawn UI Chromium reusing the profile (re-auth) | `POST /api/browser-automation/sites/:siteKey/reauth` |
+| Kill processes + delete the profile dir | `POST /api/browser-automation/sites/:siteKey/disconnect` |
+## When It Runs
+| Surface | Source |
+|---|---|
+| Operator asks the agent to "buy X" / "checkout" via DM | `message.dm` → checkout path |
+| Open-ended browser request (DM, dashboard, or scheduler) | `browser_task` (medium tier, Claude-only) — see `BROWSER_TASK_REDESIGN_PLAN.md` |
+Proactive re-auth DMs come from the `reauth-detector` in
+`managed-chromium-supervisor.ts`.
+## Why You'd Turn It On
+You wouldn't, yet. Until B-3 has been stable for six weeks, B-4 is
+gated to project-owner self-testing. Once it opens, the typical
+use case is recurring small purchases at vendors you trust (groceries,
+specific subscriptions, narrow shopping windows) where the agent has
+the cart context and you want a single tap to confirm rather than a
+full hand-off.
+## Why You Might Not
+- The guard is **experimental and bypassable** if the daemon process
+  or any of your messaging platforms is compromised. A high-privilege
+  attacker on either side can pretend to be you and complete a
+  purchase.
+- Vendor flows change. A working B-4 site today can break tomorrow if
+  the vendor restructures the checkout DOM — the agent's recovery
+  story is "abandon and DM you", but you'll still see a partial cart.
+- There is no built-in category guard. Aitne will not refuse a
+  high-stakes site for you (banks, brokerages, government, healthcare)
+  — those decisions are yours. If you don't trust B-4 with a site,
+  simply don't add it to the per-site allowlist, or add its hostname
+  to `browserTaskHostnameDenylist`.
+## Related
+- [Approvals](approvals.md) — the broader Approve-tier model that
+  governs everything B-4 routes through.
+- [Safety Model](../../concepts/safety-model.md) — the categorical
+  rules. B-4 narrows the "no financial transactions" rule to a
+  gated, screenshot-first, token-bound exception.
+- [Safety and Execution](../../concepts/safety-and-execution.md) — Safe
+  / Allow modes and the absolute-block layer that protects the
+  managed-Chromium profile dir from any skill.
+- [Browser History](../integrations/browser-history.md) — separate
+  read-only integration (B-3); B-4 builds on the same site
+  registry but is a distinct surface.
+- [Disallowed Tools](../../reference/disallowed-tools.md) — the
+  absolute-block matchers that cover managed-Chromium profile
+  directories.
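
The token flow described in the managed-chromium.md addition above (a `!~` prefix plus 8 random hex characters, single use, 5-minute expiry) can be modelled in a few lines. The TypeScript sketch below is an in-memory illustration only: the `PendingToken` shape, the `jti` format, the function names, and the decision to accept a token embedded in a longer message are assumptions, and it does not reflect how the daemon stores rows in `browser_automation_purchase_tokens`.

```typescript
import { randomBytes } from "node:crypto";

const TOKEN_TTL_MS = 5 * 60 * 1000;      // 5-minute timeout from the doc
const TOKEN_PATTERN = /!~[0-9a-f]{8}/;   // "!~" followed by 8 hex characters

// Hypothetical in-memory stand-in for a pending purchase-token row.
interface PendingToken {
  jti: string;
  token: string;
  expiresAt: number; // epoch milliseconds
  used: boolean;
}

// Mints a token valid for TOKEN_TTL_MS starting at `now`.
function mintToken(now: number): PendingToken {
  return {
    jti: randomBytes(16).toString("hex"),          // assumed jti format
    token: `!~${randomBytes(4).toString("hex")}`,  // 4 bytes -> 8 hex chars
    expiresAt: now + TOKEN_TTL_MS,
    used: false,
  };
}

// Matches inbound DM text against pending tokens. A hit must be unused and
// unexpired, and is marked used so it cannot be replayed.
function matchInbound(text: string, pending: PendingToken[], now: number): PendingToken | null {
  const found = TOKEN_PATTERN.exec(text);
  if (found === null) return null;
  const candidate = found[0];
  const hit = pending.find((p) => p.token === candidate && !p.used && now < p.expiresAt);
  if (hit === undefined) return null;
  hit.used = true;
  return hit;
}
```

Taking the clock as a parameter rather than calling `Date.now()` inside the functions makes the expiry rule easy to check without waiting five minutes.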