PyPI - cinna-cli - Versions diffs - 0.1.0__py3-none-any.whl - Mend

cinna-cli 0.1.0__py3-none-any.whl

Files changed (23)

cinna/__init__.py +3 -0
cinna/auth.py +42 -0
cinna/bootstrap.py +278 -0
cinna/client.py +169 -0
cinna/config.py +193 -0
cinna/console.py +39 -0
cinna/context.py +216 -0
cinna/errors.py +56 -0
cinna/logging.py +38 -0
cinna/main.py +715 -0
cinna/mcp_proxy.py +151 -0
cinna/mutagen_runtime.py +168 -0
cinna/sync.py +120 -0
cinna/sync_session.py +418 -0
cinna/sync_ssh_shim.py +232 -0
cinna/sync_tui.py +352 -0
cinna/templates/CLAUDE.md.template +558 -0
cinna/templates/__init__.py +0 -0
cinna_cli-0.1.0.dist-info/METADATA +231 -0
cinna_cli-0.1.0.dist-info/RECORD +23 -0
cinna_cli-0.1.0.dist-info/WHEEL +4 -0
cinna_cli-0.1.0.dist-info/entry_points.txt +3 -0
cinna_cli-0.1.0.dist-info/licenses/LICENSE.md +21 -0

cinna/templates/CLAUDE.md.template ADDED

@@ -0,0 +1,558 @@
+# Agent: {agent_name} — Local Development Guide
+> Auto-generated by cinna. Do not edit manually — this file is overwritten on every `cinna setup` and `cinna pull`. Regenerated: {timestamp}
+You are working in the **local development copy** of a Cinna Core agent. The workspace under `workspace/` is continuously bidirectionally synced (via Mutagen) with the agent's remote container at `/app/workspace/`. All commands execute remotely via `cinna exec`. There is **no local Docker** — the remote env is the only runtime.
+**Read `BUILDING_AGENT.md` first** for the platform-assembled building-mode system prompt — agent role, capabilities, scripts inventory, credentials map, knowledge topics. This file (`CLAUDE.md`) is the local-development companion: how to build, run, schedule, and ship work in this workspace.
+---
+## 1. Mental model
+- **Bundle-owned files** (`scripts/`, `docs/`, `webapp/`, `knowledge/`, `files/`, `workspace_requirements.txt`, `workspace_system_packages.txt`) are the agent's *source*. They are snapshotted when a new bundle revision is published. Edits here become the next shipped revision.
+- **`app-data/`** is per-user persistent runtime state (per `(user_id, bundle_id)` volume). **Not** part of bundle revisions. Survives `apply-update` and uninstall/reinstall. Your local copy is your *own* developer install's runtime state.
+- **`credentials/`** is backend-managed. Read-only on your side. Never commit it, and never echo its contents in agent output.
+Treat the agent like a Python project whose entrypoints are scripts under `scripts/`, whose behavior is driven by prompts under `docs/`, and whose knowledge comes from `knowledge/` + the platform's vector search via the `knowledge_query` MCP tool.
+---
+## 2. Workspace layout
+```
+.cinna/
+└── config.json                 # CLI config + token; do not edit
+workspace/                      # continuously synced with remote /app/workspace
+├── scripts/                    # bundle-owned — Python entrypoints + helpers
+│   ├── README.md               # documents every script (what / how to run / output)
+│   ├── main.py                 # primary entrypoint (convention)
+│   ├── <capability>/           # subfolder per capability when 3+ skills exist
+│   │   └── …
+│   └── shared_utils.py         # shared helpers at top level
+├── docs/                       # bundle-owned — PROMPT files (the agent's brain)
+│   ├── WORKFLOW_PROMPT.md      # main behavior prompt
+│   ├── ENTRYPOINT_PROMPT.md    # how the agent starts a conversation
+│   ├── REFINER_PROMPT.md       # post-processing / quality pass
+│   └── <domain>.md             # additional domain-knowledge docs
+├── webapp/                     # bundle-owned — dashboard HTML/CSS/JS + Python data endpoints
+├── knowledge/                  # bundle-owned — static integration docs shipped in bundle
+├── files/                      # bundle-owned — static publisher-shipped assets (lookup tables, fixtures, email templates)
+├── app-data/                   # per-user persistent (NOT shipped in bundle revisions)
+│   ├── storage/                #   long-lived output (DBs, reports, derived data, run-state)
+│   ├── uploads/                #   runtime user uploads — read-only from scripts' POV
+│   └── cache/                  #   disposable caches scripts rebuild on demand
+├── credentials/                # backend-managed; read-only; never log or echo
+├── workspace_requirements.txt  # Python deps installed in the remote env
+└── workspace_system_packages.txt   # apt packages installed in the remote env
+mutagen.yml                     # sync rules; customize ignores here
+BUILDING_AGENT.md               # building-mode system prompt from the platform
+CLAUDE.md                       # this file — auto-generated
+.mcp.json / opencode.json       # MCP wiring for AI tools
+```
+Rules of thumb:
+- **New file?** Decide first which persistence tier it belongs to (bundle / app-data / credentials). That determines the folder.
+- **Generated at runtime?** It belongs under `app-data/`, never under bundle-owned folders.
+- **Read from `uploads/`, write to `storage/`.** Never write to `uploads/` from a script — that folder is owned by the runtime.
+---
+## 3. Where to put what
+| Kind of file | Put it in | Why |
+|---|---|---|
+| Python entrypoints (run by the agent or by the user) | `scripts/` | Bundle-owned source. Add a subfolder per capability once you have 3+ skills. |
+| Shared Python helpers | `scripts/` (top level) | Importable across skills. |
+| Prompt files (workflow / entrypoint / refiner / role-specific) | `docs/` | The runtime loads these as the agent's prompts. Naming convention is `*_PROMPT.md` for the main slots. |
+| Domain knowledge docs the *agent* should read at runtime | `docs/` or `knowledge/` | `docs/` if the prompt references it inline; `knowledge/` if it should be indexed and queried via `knowledge_query`. |
+| Static lookup tables, fixtures, sample inputs shipped with the agent | `files/` | Treated as static publisher assets, not source code. |
+| Email / notification templates (Jinja2 `.html.j2`, `.txt.j2`) | `files/templates/email/` | Bundle-owned: shape stays identical across installs. See §13. |
+| Dashboard pages, data endpoints | `webapp/` | Served by the env's webapp slot. |
+| Python dependencies | `workspace_requirements.txt` | Installed in the remote env. See §9. |
+| System (apt) packages | `workspace_system_packages.txt` | Installed in the remote env on env build. |
+| Long-lived runtime output (DB, reports, derived data, run-state files) | `app-data/storage/` | Survives bundle updates and reinstalls. |
+| User uploads at runtime | `app-data/uploads/` (auto-populated) | Read from here, do not write here. |
+| Rebuildable derived caches | `app-data/cache/` | Safe to delete; scripts must be able to rebuild from source. |
+| Credentials / API tokens | nowhere in the workspace | Backend-managed in `credentials/`; access them from scripts via the platform's standard credential loader. |
+---
+## 4. Prompts (`docs/`)
+The runtime expects a small set of named prompt files. Standard slots:
+- **`docs/WORKFLOW_PROMPT.md`** — the main system prompt: role, capabilities, rules, tools.
+- **`docs/ENTRYPOINT_PROMPT.md`** — how the agent opens a session / greets the user.
+- **`docs/REFINER_PROMPT.md`** — optional post-processing pass over the agent's draft output.
+Conventions:
+- Lead with the agent's role and purpose in one paragraph.
+- List capabilities as discrete sections. Each capability should map to a script or script subfolder.
+- Document trigger phrases for capabilities (e.g. "when the user asks to check X, run `scripts/checks/run.py`").
+- Reference docs/knowledge files explicitly: `See docs/<topic>.md for the business rules.` — the runtime will surface those.
+- Spell out the "never" rules: never echo credentials, never write to `uploads/`, never modify `credentials/`.
+If a capability needs more than 5–10 lines of workflow, **split it into its own doc file** under `docs/<capability>.md` and reference it from `WORKFLOW_PROMPT.md`. Keep `WORKFLOW_PROMPT.md` lean.
+---
+## 5. Scripts (`scripts/`)
+- Every script must be runnable via `cinna exec python scripts/<path>.py` — no implicit working-directory assumptions; resolve paths relative to the script or use absolute container paths (workspace is mounted at `/app/workspace/`).
+- **Read inputs from** `workspace/files/`, `app-data/uploads/`, or `app-data/storage/`. **Write outputs to** `app-data/storage/` (durable) or `app-data/cache/` (disposable).
+- **Pagination must use a deterministic sort order** (e.g. `id ASC`). Unsorted offset/limit pagination silently skips and duplicates rows — a famously subtle bug. If the API has no sort param, fetch all IDs first then page by ID.
+- **Caches must be rebuildable from scratch.** `app-data/cache/` may be deleted at any time. No script may depend on a prior cache state; a fresh run on empty `cache/` must produce the same result as a re-run on a stale `cache/`. Cache-update scripts should **replace**, never incrementally patch.
+- Maintain `scripts/README.md` listing every script, its purpose, and example invocation. Group by capability when you have subfolders.
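The deterministic-pagination rule above can be sketched as paging by last-seen ID. This is a minimal illustration, not a platform API: `fetch_page` and the in-memory `ROWS` list are hypothetical stand-ins for a real data source, and IDs are assumed to be positive integers.

```python
from typing import Callable, Iterator

# Hypothetical data source standing in for a remote API (deliberately unsorted).
ROWS = [{"id": i, "name": f"row-{i}"} for i in (5, 1, 4, 2, 3)]


def fetch_page(after_id: int, limit: int) -> list[dict]:
    """Return up to `limit` rows with id > after_id, sorted by id ascending."""
    ordered = sorted(ROWS, key=lambda row: row["id"])
    return [row for row in ordered if row["id"] > after_id][:limit]


def iter_all(fetch: Callable[[int, int], list[dict]], limit: int = 2) -> Iterator[dict]:
    """Page by last-seen id so rows added or removed between pages cannot shift the window."""
    after_id = 0  # assumption: ids are positive integers
    while True:
        page = fetch(after_id, limit)
        if not page:
            return
        yield from page
        after_id = page[-1]["id"]


ids = [row["id"] for row in iter_all(fetch_page)]
```

Because each request is anchored on the last ID seen rather than on an offset, every row is visited exactly once even when the underlying collection changes between requests.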
+### File paths inside the container
+```python
+from pathlib import Path
+# Relative to script
+config = Path(__file__).parent / "config.json"
+# Absolute inside container
+WORKSPACE = Path("/app/workspace")
+state = WORKSPACE / "app-data" / "storage" / "checks" / "state.json"
+```
+### Capability subfolders
+Once an agent has 3+ distinct capabilities, organize:
+```
+scripts/
+├── README.md
+├── shared_utils.py
+├── <capability_a>/
+│   ├── run.py
+│   └── helpers.py
+└── <capability_b>/
+    └── run.py
+```
+Keep shared utilities at the top level. Keep single-purpose scripts that don't belong to a capability flat at the top level. Don't pre-split before you have the skills.
+---
+## 6. Knowledge base (`knowledge/`)
+- Drop integration docs / reference material here as Markdown.
+- The platform indexes these for the `knowledge_query` MCP tool. Verify a doc is reachable by querying it from this session (MCP is auto-wired — see §16).
+- `knowledge/` is bundle-owned: every installer of your bundle gets these files.
+- Use `app-data/storage/` (not `knowledge/`) for anything generated at runtime.
+---
+## 7. Webapp (`webapp/`)
+- Static frontend assets + Python data endpoints under the env's webapp slot.
+- Keep dashboard logic in the frontend; keep data-shaping in Python endpoints. Don't query external systems directly from the browser.
+- Read state from `app-data/storage/` — never recompute heavy results on every request.
+---
+## 8. Credentials
+- Live entirely on the platform. The CLI surfaces them under `workspace/credentials/` read-only for inspection.
+- Scripts access them through the runtime's standard credential interface. Never copy a credential into another file in `workspace/`.
+- Never `print`, `log`, or echo a credential value. Never include credential values in returned strings, web responses, or error messages.
+---
+## 9. Dependencies
+The container ships a uv-managed virtual environment at `/app/.venv`. **There is no standalone `pip` binary** — install through `uv`.
+```bash
+# Correct — install into the container's venv
+cinna exec uv pip install <package-name>
+# Wrong — pip binary doesn't exist
+cinna exec pip install <package-name>
+```
+After installing, **persist the dependency** by adding it to `workspace_requirements.txt`. Without that step the package disappears on the next env rebuild.
+- **Python deps:** add to `workspace_requirements.txt`. Sync via `cinna exec uv pip install -r workspace_requirements.txt`.
+- **System (apt) packages:** add to `workspace_system_packages.txt`. Applied on env rebuild — coordinate with platform UI.
+- Don't install anything locally — there's no local container; local installs drift from prod.
+---
+## 10. Development loop
+| Task | Command |
+|---|---|
+| Foreground dev session (live sync TUI) | `cinna dev` |
+| Run a script in the remote env | `cinna exec python scripts/main.py` |
+| Run any command in the remote env | `cinna exec <command>` |
+| Install a Python package | `cinna exec uv pip install <pkg>` then add to `workspace_requirements.txt` |
+| Sync status from another terminal | `cinna sync status` |
+| List conflict copies | `cinna sync conflicts` |
+| Refresh expired CLI token | `cinna set-token <token_or_url>` |
+Iteration: edit files locally → Mutagen syncs them remotely within seconds → `cinna exec <cmd>` runs the script; stdout/stderr stream back live; exit code matches → inspect `app-data/storage/` for output.
+Conflicts: Mutagen writes `<file>.conflict.<side>.<timestamp>` instead of picking a winner. Open in the editor, pick a winner, delete the conflict copy.
+---
+## 11. Building a new capability — concrete steps
+1. **Pick the trigger.** What does the user say to activate it? Record this in `docs/WORKFLOW_PROMPT.md`.
+2. **Decide if it's its own capability or inline.** Multi-step workflow + own scripts → its own capability with `docs/<capability>.md`. One-shot command → inline in `WORKFLOW_PROMPT.md`.
+3. **Create the script.** `workspace/scripts/<capability>/run.py` (or top-level if single-skill agent). Make it runnable as `cinna exec python scripts/<capability>/run.py`.
+4. **Wire I/O paths.** Inputs from `files/` / `uploads/` / `storage/`. Outputs to `storage/` (durable) or `cache/` (disposable).
+5. **Document the script** in `scripts/README.md`.
+6. **Write the capability doc** at `docs/<capability>.md`: when to use, workflow, how to present results, technical notes.
+7. **Reference it from `WORKFLOW_PROMPT.md`** with the trigger condition and a pointer to the doc.
+8. **Smoke-test:** `cinna exec python scripts/<capability>/run.py <args>` — confirm exit code 0 and the expected files in `storage/`.
+9. **Verify `knowledge_query`** (if the capability uses it): ask a question the new docs should cover; confirm a hit.
+10. **Check sync status:** `cinna sync status` — no conflicts, no pending changes.
+---
+## 12. Scheduled / recurring execution
+Schedules are how an agent gets work done without a user typing into chat: data refreshes, health checks, daily reports, monitoring. They live on the **platform**, not in the bundle — configured per-install via *Agent Config → Schedules*. Bundle revisions do not include schedules, and `apply-update` does not sync them.
+### Two schedule types — pick the right one
+| Type | When the script runs | When a session is created (tokens spent) | Use it for |
+|---|---|---|---|
+| **`script_trigger`** | Every tick | **Only when the script does NOT print `OK`** | Routine monitoring, idempotent checks, "ping the system, only escalate on anomalies". This is the default for any predefined workflow. |
+| **`static_prompt`** | Never directly — a session is started every tick with a prompt | **Every tick** | Inherently conversational/agentic work that needs an LLM each time (e.g., "produce today's narrative summary"). |
+**Strong default: prefer `script_trigger` unless you genuinely need an LLM every tick.** It costs zero tokens on quiet days and only pulls the agent in when there's something to react to.
+### The "OK" contract for `script_trigger`
+The platform decides whether to start a session by inspecting your script's exit code + stdout:
+- **Silent success** (no session, no tokens, just a log entry):
+  `exit code == 0` **AND** `stdout.strip() == "OK"`
+- **Escalation** (session created, agent receives the context message):
+  any other combination — non-zero exit, empty stdout, or any stdout that isn't literally `OK`.
+- `stderr` is informational: it's surfaced to the agent if a session is created, but `OK` on stdout with stderr present still counts as OK.
+- `stdout` is truncated at 10,000 chars before comparison and before being passed to the agent (`[output truncated]` marker appended).
+- Default command timeout is 120 s (max 300 s); timeouts / network errors are logged as `error` and do **not** start a session.
+- Working directory inside the container: `/app/workspace/`.
+- Minimum schedule interval: **30 minutes**. "every 5 minutes" is rejected.
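The contract above can be mirrored in a small local helper, which is handy when smoke-testing a check script. This is a sketch of the rules as described here, not the platform's actual implementation: the function names are hypothetical, the exact way the truncation marker is appended is an assumption, and the timeout/`error` path is not modeled.

```python
MAX_STDOUT_CHARS = 10_000                # truncation limit stated in the contract
TRUNCATION_MARKER = "[output truncated]"  # assumption: appended directly after the cut


def truncate(stdout: str) -> str:
    """Apply the documented truncation before any comparison."""
    if len(stdout) <= MAX_STDOUT_CHARS:
        return stdout
    return stdout[:MAX_STDOUT_CHARS] + TRUNCATION_MARKER


def classify_tick(exit_code: int, stdout: str) -> str:
    """Return 'success' for a silent OK, otherwise 'session_triggered'."""
    if exit_code == 0 and truncate(stdout).strip() == "OK":
        return "success"
    return "session_triggered"
```

For example, `classify_tick(0, "OK\n")` is a silent success, while `classify_tick(0, "")` and `classify_tick(1, "OK")` both escalate.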
+### How to invoke the script from the schedule
+**Preferred command form:**
+```
+uv run python /app/workspace/scripts/<name>.py
+```
+— **not** `uv run scripts/<name>.py` or `uv run python scripts/<name>.py`.
+Two reasons this is the safe default:
+1. **`uv run python …` keeps `uv` quiet about your script.** Pointing `uv run` at a `.py` argument can make `uv` treat it as a uv-managed script and emit informational lines on stdout (resolving deps, installing packages, syncing the venv). On a `script_trigger` schedule those extra lines mean `stdout.strip() != "OK"` and the platform opens a session every tick. Routing through `python` makes `uv` only manage the interpreter — your script's stdout is the only thing on stdout.
+2. **Absolute path = no cwd surprises.** Schedules run with cwd `/app/workspace/`, but using the absolute path removes any assumption about it and makes the command identical to what you smoke-tested via `cinna exec`.
+If `uv` does still print first-run noise (e.g., venv creation on a brand new env), invoke once manually with `cinna exec uv run python /app/workspace/scripts/<name>.py` to warm the venv before configuring the schedule.
+### `script_trigger` script template
+Put the script in `scripts/checks/` (or `scripts/monitors/`) so its role is obvious. The script must be deterministic about its output: print `OK` on a clean line and exit 0 when there's nothing to do; print actionable context and exit non-zero otherwise.
+```python
+#!/usr/bin/env python3
+"""Daily data freshness check — run on a script_trigger schedule.
+Exit semantics (script_trigger contract):
+- prints exactly "OK" and exits 0 → no session, no tokens
+- prints anything else or exits non-zero → session opened, output handed to the agent
+"""
+from __future__ import annotations
+import json
+import sys
+from datetime import datetime, timezone
+from pathlib import Path
+WORKSPACE = Path("/app/workspace")
+STATE = WORKSPACE / "app-data" / "storage" / "checks" / "data_freshness.json"
+def main() -> int:
+    try:
+        snapshot = load_last_snapshot()
+        issues = check_freshness(snapshot)
+    except Exception as exc:                       # never let an exception bubble silently
+        print(f"FAIL: data_freshness check crashed: {exc}", file=sys.stderr)
+        return 2
+    if not issues:
+        record_run(status="ok")
+        print("OK")                                # ← exact string the platform looks for
+        return 0
+    record_run(status="alert", issues=issues)
+    print("Stale data detected:")
+    for issue in issues:
+        print(f"  - {issue}")
+    print()
+    print("Please refresh the cache (`uv run python /app/workspace/scripts/cache/update.py`) "
+          "and confirm whether the alert was real.")
+    return 1                                        # non-zero → session opened
+def load_last_snapshot() -> dict:
+    if not STATE.exists():
+        return {}
+    return json.loads(STATE.read_text())
+def check_freshness(snapshot: dict) -> list[str]:
+    # Pure business logic — returns a list of issue strings (empty = healthy).
+    ...
+def record_run(*, status: str, issues: list[str] | None = None) -> None:
+    STATE.parent.mkdir(parents=True, exist_ok=True)
+    STATE.write_text(json.dumps({
+        "status": status,
+        "issues": issues or [],
+        "ran_at": datetime.now(timezone.utc).isoformat(),
+    }, indent=2))
+if __name__ == "__main__":
+    sys.exit(main())
+```
+Conventions baked into the template:
+- **Persist run state under `app-data/storage/`.** It survives bundle updates and reinstalls, and lets the next tick compare against last tick.
+- **Single `OK` line, nothing else, on healthy runs.** Don't print a banner, version, or timestamp — the platform compares the whole stripped stdout to the literal string `OK`.
+- **On alert: print human-readable context AND end with a directive.** The output is dropped straight into a new session as context; tell the agent what you expect it to do next ("refresh the cache, confirm the alert").
+- **Exit codes:** `0` = healthy, non-zero = "look at this". Reserve `2` for crashes/unknowns so you can tell crashes apart from real alerts in the logs.
+- **Diagnostics stay off stdout.** If you need logs on the healthy path, send them to `stderr` or to a file under `app-data/storage/`.
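One way to keep diagnostics off stdout is to route the standard `logging` module to `stderr`. The sketch below is illustrative only; the logger name `checks` and the `healthy_run` function are hypothetical.

```python
import logging
import sys


def configure_logging() -> logging.Logger:
    """Send all log records to stderr so stdout carries only the OK line."""
    logger = logging.getLogger("checks")  # hypothetical logger name
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from any root handler
    return logger


def healthy_run() -> int:
    log = configure_logging()
    log.info("freshness check passed")  # goes to stderr
    print("OK")                         # the only thing on stdout
    return 0
```

With this setup the stripped stdout is still exactly `OK`, and the log lines remain available on `stderr` if a session is ever opened.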
+### What the agent receives when the script escalates
+The session is opened with a context message that includes the schedule name, the exact command, the timestamp, the exit code, stdout, stderr, and the prompt *"Please review the output above and take appropriate action."* That means the script's stdout effectively becomes the prompt — write it for the agent, not for a human operator.
+If the capability needs more than a few lines of agent guidance, point the agent at a doc in `docs/` from the script's output:
+```
+Stale data detected (2 records).
+See docs/data_recovery.md for the recovery workflow.
+```
+…and document the workflow in `docs/data_recovery.md` exactly like a regular capability (§4).
+### `static_prompt` schedules
+When you do want an LLM every tick (e.g., daily narrative reports, weekly summaries):
+- Configure on the platform UI; provide a per-schedule prompt (overrides the agent's entrypoint prompt).
+- The prompt should reference an existing capability — e.g., *"Produce today's market summary using scripts/reports/daily_summary.py"* — so the agent stays grounded in the bundle's scripts and docs.
+- Token cost is paid every tick. Choose intervals accordingly.
+### Idempotency and safety
+Recurring scripts will run forever — assume any execution can happen at any time:
+- **Idempotent by construction.** Re-running the script back-to-back must produce the same outcome and not create duplicate records / emails / tickets. Guard side-effects with a "have I already done this today?" check against `app-data/storage/`.
+- **No incremental cache mutation.** Same rule as §5 — caches under `app-data/cache/` must be fully rebuildable from scratch.
+- **Bounded work.** A scheduled script whose runtime grows linearly with the dataset will eventually hit the command timeout. Page deterministically (§5) and write progress to `app-data/storage/` so the next tick can resume if needed.
+- **Test before scheduling.** Run the script manually first: `cinna exec python scripts/checks/<name>.py` — confirm `OK` on the healthy path and useful context on the alert path. Only then configure the schedule in the UI.
+- **Schedules are NOT in the bundle.** When you publish a new revision, schedules don't ship. Document the recommended schedules in the agent's `README.md` so installers can recreate them; the *scripts* are the bundle-shippable part.
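The "have I already done this today?" guard can be sketched with a small marker file. This is a minimal illustration under stated assumptions: the helper names and the `state.json` layout are hypothetical, and the demo writes to a temporary directory, whereas in the container the marker would live under `/app/workspace/app-data/storage/<name>/`.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def already_done(marker: Path, day: date) -> bool:
    """True if the marker file records a completed run for `day`."""
    if not marker.exists():
        return False
    return json.loads(marker.read_text()).get("last_done") == day.isoformat()


def mark_done(marker: Path, day: date) -> None:
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.write_text(json.dumps({"last_done": day.isoformat()}))


def run_once_per_day(marker: Path, day: date, side_effect) -> bool:
    """Run side_effect() unless it already ran for `day`; return True if it ran."""
    if already_done(marker, day):
        return False
    side_effect()
    mark_done(marker, day)  # recorded only after the side effect succeeded
    return True


# Demo: the second call on the same day is a no-op.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "daily_report" / "state.json"
    today = date(2026, 5, 13)
    first = run_once_per_day(marker, today, lambda: calls.append("sent"))
    second = run_once_per_day(marker, today, lambda: calls.append("sent"))
```

Writing the marker after the side effect means a crash in between leads to a repeat on the next tick rather than a silent skip; whether that trade-off is right depends on the side effect.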
+### Build-a-scheduled-job — concrete steps
+1. Decide: routine check / heartbeat → `script_trigger`. Always-on summary or narrative → `static_prompt`.
+2. Create the script under `scripts/checks/<name>.py` (or `scripts/monitors/`, `scripts/reports/` — pick a folder that signals intent).
+3. Persist last-run state under `app-data/storage/<name>/`.
+4. Make the script print `OK` and exit 0 on the healthy path; print actionable context and exit non-zero on alerts (see template above).
+5. Smoke-test both paths: `cinna exec python scripts/checks/<name>.py` healthy → exit 0 + `OK`; force an alert → non-zero + context.
+6. Document the script in `scripts/README.md` and the recommended schedule (cadence + suggested name) in the agent's `README.md`.
+7. On the platform: Agent Config → Schedules → **New → Script Trigger**, paste the command using the canonical form `uv run python /app/workspace/scripts/checks/<name>.py` (absolute path, `python` between `uv run` and the script — see "How to invoke the script from the schedule" above) and the natural-language cadence ("every workday at 7 AM"). Generate → Save.
+8. Check Schedules → *Logs* (clock icon) after the first tick — confirm `success` for `OK`, `session_triggered` for alerts, and that no `error` rows appear.
+> If the scheduled job's purpose is to *email a summary or report to users*, do not generate or send the email from the script directly with hand-rolled `smtplib` calls and free-form content. Use the template-based pattern in §13 — template generator → sender → audit log, with SMTP creds from the platform credential store.
+---
+## 13. External notifications — email
+When an agent needs to communicate **outside** the platform (daily/weekly summaries, alerts, reports to stakeholders), email is the standard channel. The pattern below is deliberately split into three concerns — credentials, content, transport — and each piece has rules the agent must follow if it's going to run unattended for months.
+### Credentials: always via the platform
+SMTP credentials (host, port, username, password, from-address) belong in the platform's **service credentials** store, shared with the agent via *Credentials → New → SMTP* (or the appropriate service type). The agent reads them through the runtime's standard credential interface inside the container — same mechanism as any other integration.
+**Never** put SMTP credentials in:
+- `workspace/files/`, `workspace/scripts/`, or any other bundle-owned folder (they'd ship to every installer)
+- `app-data/storage/` (per-user state, but still plaintext on the filesystem)
+- environment variables baked into the image
+- a script as a string literal — even temporarily, even commented out
+The platform-managed `credentials/` folder is the only acceptable surface.
+### Content: template-first, never free-form
+Notifications must look the same every time they go out. The reliable way to guarantee that is a **two-script pipeline**: one script renders, another sends. Templates live in the bundle; rendered output lives in `app-data/storage/`.
+```
+workspace/
+├── files/
+│   └── templates/
+│       └── email/
+│           ├── daily_summary.html.j2     # bundle-owned: shape, headers, CSS
+│           ├── daily_summary.txt.j2      # bundle-owned: plain-text fallback
+│           └── partials/                 # shared headers/footers
+├── scripts/
+│   └── notifications/
+│       ├── build_daily_summary.py        # 1️⃣ render — data → outbox/
+│       ├── send_outbox.py                # 2️⃣ send — outbox/ → SMTP, write log
+│       └── notifications_db.py           # shared SQLite helper
+└── app-data/
+    └── storage/
+        └── notifications/
+            ├── outbox/<message_id>/
+            │   ├── meta.json             # to, subject, template, content_hash
+            │   ├── body.html
+            │   └── body.txt
+            └── sent.db                   # SQLite audit log
+```
+**1️⃣ Build script** (`scripts/notifications/build_daily_summary.py`)
+- Reads source data from `app-data/storage/` (the results of the work script that ran earlier in the same scheduled run).
+- Renders Jinja2 templates from `files/templates/email/` into a deterministic outbox folder keyed by `message_id` (e.g., `daily-summary-2026-05-13`).
+- Writes `body.html`, `body.txt`, and a `meta.json` (recipient list, subject, template name, sha256 of the rendered HTML).
+- **Does not send.** Pure render + filesystem write. Idempotent: running twice on the same data produces the same outbox folder.
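The filesystem half of the build step might look like the sketch below. It assumes the HTML and plain-text bodies have already been rendered from the Jinja2 templates; `message_id` and `write_outbox` are hypothetical helper names, and the `meta.json` keys follow the layout shown earlier (`to`, `subject`, `template`, `content_hash`).

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path


def message_id(slug: str, day: date) -> str:
    """Deterministic id: slug plus the period the message covers."""
    return f"{slug}-{day.isoformat()}"


def write_outbox(outbox_root: Path, msg_id: str, html: str, text: str,
                 recipients: list[str], subject: str, template: str) -> Path:
    """Write body.html, body.txt and meta.json; same input gives the same files."""
    folder = outbox_root / msg_id
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "body.html").write_text(html, encoding="utf-8")
    (folder / "body.txt").write_text(text, encoding="utf-8")
    meta = {
        "to": sorted(recipients),
        "subject": subject,
        "template": template,
        "content_hash": hashlib.sha256(html.encode("utf-8")).hexdigest(),
    }
    (folder / "meta.json").write_text(
        json.dumps(meta, indent=2, sort_keys=True), encoding="utf-8")
    return folder


# Demo: rendering the same data twice yields byte-identical metadata.
with tempfile.TemporaryDirectory() as tmp:
    msg_id = message_id("daily-summary", date(2026, 5, 13))
    args = (Path(tmp), msg_id, "<p>Hello</p>", "Hello",
            ["me@example.com"], "Daily summary", "daily_summary")
    first_meta = (write_outbox(*args) / "meta.json").read_text(encoding="utf-8")
    second_meta = (write_outbox(*args) / "meta.json").read_text(encoding="utf-8")
```

No timestamps are written into `meta.json`, which is what keeps a second run on the same data from changing the outbox folder.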
+**2️⃣ Send script** (`scripts/notifications/send_outbox.py`)
+- Lists every `app-data/storage/notifications/outbox/<id>/` not yet marked `sent` in `sent.db`.
+- Loads SMTP creds from the platform credential store.
+- For each outbox item: open SMTP connection, send the message, write a `sent.db` row (`status='sent'`, `sent_at`, content hash, recipient, subject).
+- On transient SMTP failure: write `status='failed'` with the exception, **do not delete the outbox folder** — the next tick retries.
+- On permanent failure (auth, DNS): log and stop; surface via the schedule's normal escalation path (non-OK exit).
+- Print `OK` on a clean run with nothing to send, or after every queued message lands successfully. Exit non-zero only when something needs human attention.
+### Transport: idempotency and the audit log
+Notifications run forever — treat them like a payment system in miniature.
+- **Deterministic `message_id`.** Use a slug + the period it covers: `daily-summary-2026-05-13`, `incident-report-2026-05-13-WK19`. The send script must check `sent.db` before sending and skip anything already `status='sent'` for that `message_id`. This is what makes "the cron fired twice by accident" safe.
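+- **Audit log (SQLite under `app-data/storage/`).** Every attempt, successful or not, is recorded. Because the schema keeps one row per `(message_id, recipient)`, a retry updates the existing row and increments `attempt` rather than inserting a new one. Suggested schema: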
+  ```sql
+  CREATE TABLE IF NOT EXISTS notifications_sent (
+    id            INTEGER PRIMARY KEY AUTOINCREMENT,
+    message_id    TEXT NOT NULL,                  -- deterministic per logical message
+    template      TEXT NOT NULL,                  -- daily_summary, incident_report, …
+    recipient     TEXT NOT NULL,                  -- single row per recipient
+    subject       TEXT NOT NULL,
+    content_hash  TEXT NOT NULL,                  -- sha256 of body.html
+    status        TEXT NOT NULL,                  -- queued | sent | failed | skipped
+    error         TEXT,
+    queued_at     TEXT NOT NULL,
+    sent_at       TEXT,
+    attempt       INTEGER NOT NULL DEFAULT 1,
+    UNIQUE (message_id, recipient)                -- prevents double-send
+  );
+  ```
+- **The DB lives in `app-data/storage/notifications/sent.db`** — per-user, survives bundle updates, never shipped in revisions. Add an `inspect_notifications.py` helper script so an operator can query the log without opening sqlite3.
+- **Retention.** Decide on a retention window (e.g., 1 year). A separate `prune_notifications.py` script can vacuum old rows on a slow schedule.
+- **Outbox is the source of truth for *what was sent*.** Don't delete outbox folders immediately after sending — keep them at least one cycle so failures can be inspected. A periodic prune can remove anything older than N days and matching a `sent` row.
+### LLM-generated content — never the whole email
+The temptation is "let the LLM write today's summary email and send it." Don't. Free-form LLM output drifts on every dimension that matters for a recurring notification — tone, length, section ordering, HTML well-formedness, subject lines, even occasional refusals or boilerplate apologies. After 90 days you have 90 different-looking emails.
+The right pattern when the body genuinely needs LLM output is **template + slots**:
+1. **Fixed template** (`daily_summary.html.j2`) defines all structural elements: greeting, section headers, table headers, footer, unsubscribe line. The LLM never sees this file.
+2. **Slot extraction script** asks the LLM for **structured JSON** populating only specific slots:
+   ```json
+   {
+     "headline":   "Q1 revenue closed +12% YoY",
+     "highlights": ["Top channel: direct (+18%)", "Churn down 0.4pp"],
+     "risk_note":  "Mobile conversion dipped in week 19 — see attached report."
+   }
+   ```
+   Validate the JSON against a schema before continuing. If it fails validation, retry once; if it still fails, abort and emit a non-OK status so the schedule escalates to a session. Never send a half-rendered email.
+3. **The build script renders the template** with that JSON as context. The LLM cannot affect structure, headers, or HTML; it can only fill named slots.
+In other words: the LLM produces *data*, your template produces *the email*. Put differently, the HTML structure and the subject line must never be model-generated text.
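The validate-then-retry-once step can be sketched with the standard library alone. The hand-rolled checks below stand in for a real schema validator, the slot names are taken from the sample JSON above, and `ask_llm` is a hypothetical callable that returns the model's raw reply as a string.

```python
import json

# Slot names and types taken from the sample payload; adjust per template.
REQUIRED_SLOTS = {"headline": str, "highlights": list, "risk_note": str}


def validate_slots(raw: str) -> dict:
    """Parse the reply and reject anything that is not exactly the named slots."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    unknown = set(data) - set(REQUIRED_SLOTS)
    missing = set(REQUIRED_SLOTS) - set(data)
    if unknown or missing:
        raise ValueError(f"unknown slots {sorted(unknown)}, missing slots {sorted(missing)}")
    for name, expected in REQUIRED_SLOTS.items():
        if not isinstance(data[name], expected):
            raise ValueError(f"slot {name!r} must be of type {expected.__name__}")
    if not all(isinstance(item, str) for item in data["highlights"]):
        raise ValueError("every highlight must be a string")
    return data


def slots_with_retry(ask_llm, attempts: int = 2) -> dict:
    """Try once, retry once, then fail so the caller can exit non-OK."""
    last_error = None
    for _ in range(attempts):
        try:
            return validate_slots(ask_llm())
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"slot extraction failed after {attempts} attempts: {last_error}")
```

A caller would catch the `RuntimeError`, print the failure as context, and exit non-zero so the schedule escalates instead of sending anything.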
+### Common scenario — scheduled summary email
+```
+cron tick (script_trigger)
+  └─ scripts/work/collect_daily_data.py        — fetch / process, write to app-data/storage/daily/
+        └─ scripts/notifications/build_daily_summary.py   — render templates → outbox/<id>/
+              └─ scripts/notifications/send_outbox.py     — SMTP send + sent.db row + print OK
+```
+All of these stages can live in one script that calls helpers, or be chained as separate `script_trigger` schedules at staggered times. Either way, the rules above (creds from platform, templates not LLM, audit log, idempotent `message_id`) apply unchanged.
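+For the single-script variant, a small runner like the one below is one way to chain the stages. It is a sketch: `run_pipeline` and the stage names are hypothetical, and each stage is assumed to be a zero-argument callable that raises on failure.

```python
from typing import Callable, Sequence, Tuple


def run_pipeline(stages: Sequence[Tuple[str, Callable[[], None]]]) -> int:
    """Run stages in order; print exactly OK only if every stage succeeds."""
    for name, stage in stages:
        try:
            stage()
        except Exception as exc:
            # Printed to stdout on purpose: any output other than "OK"
            # makes the schedule escalate to a session.
            print(f"FAILED at {name}: {exc}")
            return 1
    print("OK")
    return 0
```

+A scheduled script would end with something like `sys.exit(run_pipeline([("collect", collect), ("build", build), ("send", send)]))`, where the three callables are your own helpers. Later stages do not run once an earlier one fails, so a failed build never reaches the send step.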
+### Build-an-email-notification — concrete steps
+1. **Decide the cadence and the message.** What gets emailed, to whom, how often? Pick a stable `message_id` slug.
+2. **Get SMTP creds onto the platform** under *Credentials* with the service type expected by the runtime. Confirm the agent can read them inside the container.
+3. **Create the templates** under `workspace/files/templates/email/<name>.{html.j2,txt.j2}`. Lock down the structure now — assume it changes only via deliberate template edits, not via per-run drift.
+4. **Write the build script** under `scripts/notifications/build_<name>.py` — pure render to `app-data/storage/notifications/outbox/<message_id>/`.
+5. **Write the send script** (or reuse a generic `send_outbox.py`) — SMTP send, `sent.db` row, idempotency check.
+6. **Smoke-test offline:** run the build script via `cinna exec`, inspect the rendered `body.html` and `meta.json` under `app-data/storage/notifications/outbox/`. Only then run the send script.
+7. **Smoke-test send to yourself first.** Recipient list = your own email until you've confirmed deliverability, formatting, and that the audit row lands correctly.
+8. **Hook it into a `script_trigger` schedule** (see §12). The schedule's command runs the work + build + send pipeline and prints `OK` when there's nothing to escalate.
+9. **Document the recipient list and cadence** in the agent's `README.md` — schedules don't ship in bundles, so installers need to recreate them.
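+The idempotency check and audit row from step 5 might look like the sketch below. The `sent` table layout, the `send_once` name, and the `deliver` callable are assumptions made for illustration; `deliver` is a placeholder for the real SMTP call (for example one built on `smtplib`, with credentials resolved from the platform).

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable

# Hypothetical audit-table layout; adapt to your own sent.db.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sent (
    message_id TEXT NOT NULL,
    status     TEXT NOT NULL,
    detail     TEXT,
    sent_at    TEXT NOT NULL
)
"""


def send_once(conn: sqlite3.Connection, message_id: str,
              deliver: Callable[[], None]) -> str:
    """Send message_id at most once and record every attempt.

    Returns 'skipped', 'sent', or 'failed'.
    """
    conn.execute(SCHEMA)
    done = conn.execute(
        "SELECT 1 FROM sent WHERE message_id = ? AND status = 'sent' LIMIT 1",
        (message_id,),
    ).fetchone()
    if done:
        return "skipped"  # a successful send is already on record
    try:
        deliver()
        status, detail = "sent", None
    except Exception as exc:
        status, detail = "failed", str(exc)
    conn.execute(
        "INSERT INTO sent (message_id, status, detail, sent_at) "
        "VALUES (?, ?, ?, ?)",
        (message_id, status, detail, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return status
```

+A failed attempt leaves a `failed` row but does not block a later retry; only a `sent` row makes the next run skip the message.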
+---
+## 14. Pre-publish checklist
+- [ ] All scripts runnable via `cinna exec`; no hidden cwd assumptions.
+- [ ] `scripts/README.md`, `WORKFLOW_PROMPT.md`, and the root README are in sync with current scripts.
+- [ ] No secrets anywhere under `workspace/` (`grep -ri 'token\|secret\|password\|api_key' workspace/`).
+- [ ] No runtime output committed under bundle-owned folders — only under `app-data/`.
+- [ ] Caches rebuild cleanly from empty: `rm -rf workspace/app-data/cache/*` then re-run cache-update.
+- [ ] `knowledge_query` returns expected hits for representative questions.
+- [ ] Prompts reviewed end-to-end on a fresh conversation.
+- [ ] For each scheduled `script_trigger`: healthy path prints exactly `OK` (verified via `cinna exec`).
+- [ ] For each email notification: SMTP creds resolved from platform; templates render; `sent.db` row written; idempotency check skips re-sends.
+- [ ] `cinna sync status` clean.
+---
+## 15. Common pitfalls
+- **`ModuleNotFoundError` after `pip install`.** You installed to system Python, not the venv. Use `cinna exec uv pip install <pkg>`.
+- **`No such file or directory: pip`.** The venv doesn't have a pip binary. Use `uv pip install` instead.
+- **Script works on host but fails in container.** The container has a different Python version and different packages. Always test with `cinna exec`.
+- **Package disappears after rebuild.** Add it to `workspace_requirements.txt`.
+- **Writing to `uploads/` from a script.** That folder is the runtime's inbox. Treat it as read-only.
+- **Storing runtime output in `knowledge/` or `files/`.** Those are bundle-owned; you'll either ship dev data to every user, or lose it on bundle update. Put runtime output in `app-data/storage/`.
+- **Caching in `scripts/`.** Same problem — bundles ship that data. Use `app-data/cache/`.
+- **Unsorted pagination.** See §5. Always pass `order="id ASC"` or an equivalent.
+- **In-place cache patching.** Cache must be regeneratable from scratch. Replace, don't patch.
+- **Long monolithic `WORKFLOW_PROMPT.md`.** Split capabilities into `docs/<capability>.md` and reference them.
+- **Wrong "OK" string in a `script_trigger` script.** The platform compares `stdout.strip()` to the literal `OK`. `Ok`, `ok`, `OK ✅`, `Status: OK`, or even an empty stdout all count as NOT OK and will open a session every tick — silently burning tokens until someone notices. Print exactly `OK` on the healthy path.
+- **`uv run scripts/foo.py` in a schedule command.** `uv run` with a `.py` argument can print dependency-resolution / install lines on stdout, polluting your `OK` and triggering a session every tick. Always use `uv run python /app/workspace/scripts/foo.py` for scheduled commands — absolute path, `python` between `uv run` and the script.
+- **Using `static_prompt` where `script_trigger` would do.** `static_prompt` spends tokens on every tick whether or not there's anything to do. Default to `script_trigger` and only switch when you genuinely need an LLM each time.
+- **LLM-authored email bodies.** Letting the model write a notification email end-to-end (subject + HTML + plain text) makes every send look different — tone drifts, sections reorder, HTML breaks, occasional refusals slip through. Use a fixed template and pass the LLM only as a slot filler returning structured JSON (§13).
+- **SMTP creds in scripts or env vars.** They must come from the platform service-credentials store. Anything in `workspace/` (even commented out) is a leak waiting to ship in the next bundle revision.
+- **No audit log for outbound notifications.** Without a `sent.db` row per attempt you can't tell whether silence means "nothing to report" or "SMTP has been broken for two weeks." Every send — success or failure — writes a row (§13).
+- **Non-deterministic `message_id`.** Using a timestamp like `now().isoformat()` for the message id defeats idempotency — a re-run sends the same email twice. Derive the id from the period the message covers (`daily-summary-2026-05-13`), not from wall-clock at send time.
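+Two of the pitfalls above are easy to guard against in code. The helpers below are a sketch with hypothetical names: `message_id_for` derives the id from the period covered, and `counts_as_ok` mirrors the `stdout.strip()` comparison described above so you can check a script's output locally. It is not the platform's own code.

```python
from datetime import date


def message_id_for(slug: str, period: date) -> str:
    """Derive the id from the period covered, so a re-run yields the same id."""
    return f"{slug}-{period.isoformat()}"


def counts_as_ok(stdout: str) -> bool:
    """Mirror the documented comparison: stripped stdout must equal 'OK'."""
    return stdout.strip() == "OK"
```

+For example, `message_id_for("daily-summary", date(2026, 5, 13))` gives `daily-summary-2026-05-13` no matter when the script runs, whereas an id built from the wall clock would differ on every retry.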
+---
+## 16. MCP Tools
+The following tools are available via MCP (auto-configured in `.mcp.json`):
+{mcp_tools_section}

cinna/templates/__init__.py ADDED Viewed

File without changes