npm - sisyphi - Versions diffs - 1.2.2 → 1.2.12 - Mend

sisyphi 1.2.2 → 1.2.12

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (85)

package/dist/templates/orchestrator-plugin/skills/orchestration/workflow-examples.md DELETED

@@ -1,428 +0,0 @@
-# Workflow Examples
-End-to-end examples showing how the orchestrator structures cycles for real scenarios.
-### Path conventions in these examples
-Plan files live under per-plan-lead subdirectories: `context/{plan-lead-agent-id}/plan-*.md`. These examples elide the subdir (showing `context/plan-rate-limiting.md`) for readability. In a real cycle, the orchestrator reads the exact path from the plan lead's submission report and carries it verbatim into downstream implement, review-plan, and validate agent prompts.
----
-## Example 4: Wrapper-Shaped Config Migration (LOW effort — 5 files, mechanical)
-**Starting task**: "All config access goes through `process.env` directly — migrate to a `getConfig()` wrapper already defined in `src/config.ts`"
-**Effort tier**: LOW. Every change is a call-site swap onto an existing handler. No new behavior.
-### Cycle 1 — Plan
-```
-roadmap.md:
-  ## Refactor: Migrate env access to getConfig()
-  - [ ] Plan migration — enumerate all process.env call sites
-  - [ ] Update call sites to use getConfig()
-  - [ ] Validate — no direct process.env access remains; tests pass
-Agents spawned:
-  plan agent → "Enumerate every direct process.env access in src/. Map each call site
-    to the matching getConfig() key. Output a migration checklist. Files expected:
-    src/api/server.ts, src/db/connection.ts, src/queue/worker.ts,
-    src/cli/commands/start.ts, src/config.ts (source of truth — do not modify)."
-```
-### Cycle 2 — Implement
-```
-Plan complete. 23 call sites across 4 files.
-Agents spawned:
-  implement agent → "Execute migration plan at context/{plan-agent-id}/plan-config-migration.md.
-    Replace every process.env.X access with getConfig('X'). Do not modify src/config.ts.
-    Do not add error handling — getConfig() already throws on missing keys."
-```
-### Cycle 3 — Validate + complete
-```
-Implementation complete.
-Agents spawned:
-  validate agent → "Verify migration: grep for remaining process.env access in src/ (excluding
-    src/config.ts). Run existing tests. Confirm zero direct env reads outside config.ts."
-Validation: PASS. Complete — "All env access routed through getConfig()."
-```
-**Pipeline shape**: `plan → implement → validate`. 3 cycles. No `sisyphus:spec`, no `sisyphus:test-spec`, no `sisyphus:review-plan`.
----
-## Example 5: New Subsystem — Distributed Task Queue (HIGH effort)
-**Starting task**: "Add a persistent task queue so long-running jobs survive server restarts. Include test coverage of the survival, retry, and concurrency invariants."
-**Effort tier**: HIGH. New subsystem, new protocol (worker ↔ queue contract), cross-domain orchestration (API + storage + worker process). The prompt explicitly asks for test coverage — `sisyphus:test-spec` is justified at Cycle 2.
-### Cycle 0 — Problem exploration
-```
-roadmap.md:
-  ## Feature: Persistent Task Queue
-  - [ ] Explore current job execution patterns and constraints
-  - [ ] Spec — requirements + architecture
-  - [ ] Plan implementation (staged outline)
-  - [ ] Spec behavioral properties (test-spec) — user asked for tests in the prompt
-  ...
-Agents spawned:
-  explore agent → "Map current job execution in src/jobs/. Identify what needs to survive
-    restarts, current storage backends, worker process lifecycle."
-  problem agent → "Explore design space for persistent task queue. Questions: push vs pull
-    worker model, at-least-once vs exactly-once semantics, failure/retry policy, storage
-    backend options (Redis, Postgres, SQLite)."
-```
-### Cycle 1 — Spec (human iterates)
-```
-Agents spawned:
-  sisyphus:spec → "Run spec session for persistent task queue.
-    Context in context/problem-task-queue.md and context/explore-task-queue.md."
-Human iterates. Spec outputs:
-  context/requirements-task-queue.md — acceptance criteria, failure semantics
-  context/design-task-queue.md — Redis-backed queue, pull workers, at-least-once delivery
-```
-### Cycle 2 — High-level plan + test-spec (parallel)
-```
-Agents spawned (parallel):
-  plan agent → "Create high-level stage outline from context/requirements-task-queue.md
-    and context/design-task-queue.md. Stages: (1) queue storage layer, (2) producer API,
-    (3) worker consumer, (4) integration + retry logic. Cycle estimates per stage."
-  test-spec agent → "Define behavioral properties: job survives server restart, failed
-    jobs retry up to N times, concurrent workers don't double-execute the same job."
-```
-If the original prompt had been silent on tests, the test-spec spawn would be omitted and Cycle 2 would be plan-only — Cycle 3 would then proceed straight to detail-planning stage 1.
-### Cycles 3–9 — Staged implementation with critique + validation checkpoints
-```
-Follows Feature Build Large pattern:
-  Cycle 3: detail-plan stage 1 + implement stage 1
-  Cycle 4: implement stage 2; detail-plan stage 3 in parallel
-  Cycle 5: critique stages 1-2 (foundation review before worker builds on it)
-  Cycle 6: address critique + implement stage 3
-  Cycle 7: implement stage 4 (integration + retry); validate stages 3-4
-  Cycle 8: sis orch yield --mode validation — e2e: enqueue job, kill server, restart,
-    confirm job ran exactly once
-  Cycle 9: final review agent; complete
-```
-**Pipeline shape**: Full HIGH pipeline — `problem → spec → plan (+ test-spec because the prompt asked for tests) → staged implement → critique → validate → review`. 9+ cycles. Without an explicit test request in the prompt, the parallel `test-spec` would be omitted and Cycle 2 would be plan-only.
----
-## Example 1: Fix a Race Condition in WebSocket Reconnection
-**Starting task**: "WebSocket connections sometimes drop messages during reconnection"
-### Cycle 1 — Diagnosis
-```
-roadmap.md:
-  ## Bug Fix: WebSocket message loss during reconnection
-  - [ ] Diagnose message loss during WebSocket reconnection
-  - [ ] Fix root cause
-  - [ ] Validate fix under concurrent load
-  - [ ] Review fix
-Agents spawned:
-  debug agent → "Diagnose why WebSocket messages are lost during reconnection.
-    Focus on src/ws/client.ts and src/ws/reconnect.ts.
-    Check for race conditions between disconnect handler and message queue.
-    Look at git blame for recent changes to reconnection logic."
-```
-### Cycle 2 — Read diagnosis, plan fix
-```
-Agent report: "Root cause: reconnect() clears the message queue before the new socket
-  is ready. Messages sent between disconnect and reconnect are dropped.
-  Line 47 of src/ws/reconnect.ts — queue.clear() is called in onDisconnect
-  but should be deferred until onReconnect confirms the new socket is live.
-  Confidence: High."
-roadmap.md updated:
-  - [x] ~~Diagnose message loss during WebSocket reconnection~~
-  - [ ] Fix root cause — queue.clear() at L47 of reconnect.ts runs too early
-  - [ ] Validate fix under concurrent load
-  - [ ] Review fix
-Agents spawned:
-  implement agent → "Fix race condition in src/ws/reconnect.ts.
-    Root cause: queue.clear() at line 47 runs on disconnect, dropping pending messages.
-    Fix: defer queue drain until reconnect confirms new socket. Buffer messages during
-    reconnection window. See existing buffer pattern in src/ws/buffer.ts."
-```
-### Cycle 3 — Validate
-```
-Agent report: "Fixed. Moved queue.clear() to onReconnect callback after socket.readyState
-  confirms OPEN. Added message buffer during reconnection window. 2 files changed."
-roadmap.md updated:
-  - [x] ~~Diagnose message loss~~
-  - [x] ~~Fix root cause~~
-  - [ ] Validate fix under concurrent load
-  - [ ] Review fix
-Agents spawned:
-  validate agent → "Verify WebSocket reconnection fix. Run existing tests in
-    src/__tests__/ws/. Manually verify: disconnect simulation should not lose queued
-    messages. Check that buffer drains correctly on reconnect."
-```
-### Cycle 4 — Review and complete
-```
-Validation: PASS. Existing tests pass, reconnection preserves messages.
-roadmap.md updated:
-  - [x] ~~Diagnose~~, [x] ~~Fix~~, [x] ~~Validate~~
-  - [ ] Review fix
-Agents spawned:
-  review agent → "Review changes in src/ws/reconnect.ts and src/ws/buffer.ts.
-    Focus on: thread safety of the new buffer, memory leak potential if reconnect
-    never succeeds, edge case of rapid disconnect/reconnect cycles."
-```
-### Cycle 5 — Complete
-```
-Review: Clean. One medium finding — buffer should have a max size to prevent
-  memory issues if reconnect fails indefinitely. Not blocking.
-All roadmap.md items complete.
-Action: complete — "Fixed WebSocket message loss during reconnection. Messages now
-  buffered during reconnection window and drained on successful reconnect."
-```
----
-## Example 2: Add API Rate Limiting
-**Starting task**: "Add rate limiting to the REST API — per-user, configurable limits, with tests for the limit-enforcement and 429 response behavior"
-### Cycle 1 — Problem exploration
-```
-roadmap.md:
-  ## Feature: API Rate Limiting
-  ### Requirements & Design
-  - [ ] Problem exploration — understand rate limiting needs
-  - [ ] Requirements — define acceptance criteria
-  - [ ] Design — architecture for rate limiting
-  - [ ] Plan implementation
-  - [ ] Review plan
-  ### Implementation
-  - [ ] Implement rate limiting middleware
-  - [ ] Implement rate limit configuration
-  - [ ] Implement rate limit headers and error responses
-  ### Validation
-  - [ ] Validate implementation
-  - [ ] Review implementation
-Agents spawned:
-  problem agent → "Explore the codebase and understand the API rate limiting landscape.
-    Check existing middleware patterns in src/api/middleware/.
-    Questions to explore: current request handling, existing auth/middleware chain,
-    what storage backends are available (Redis?), user identification mechanisms."
-```
-### Cycle 2 — Spec (after human iterates on problem)
-```
-Agent report: "Problem document saved to context/problem-rate-limiting.md.
-  Current middleware chain uses Express middleware pattern. Redis is already in stack.
-  Users are identified by JWT sub claim. No existing rate limiting."
-roadmap.md updated:
-  - [x] ~~Problem exploration~~
-  - [ ] Spec — define acceptance criteria and architecture
-  ...
-Agents spawned:
-  sisyphus:spec → "Run a spec session for per-user API rate limiting. Read context/problem-rate-limiting.md for context."
-Later report: "Spec completed.
-  Requirements saved to context/requirements-rate-limiting.md.
-  Design saved to context/design-rate-limiting.md.
-  Covers: per-user limits, endpoint-specific overrides, 429 response format,
-  Retry-After headers, and a Redis-backed sliding window approach."
-```
-### Cycle 3 — Plan (after human reviews spec)
-```
-Agent report: "Spec outputs approved.
-  Approach: Redis-backed sliding window middleware. Per-user with endpoint-specific
-  overrides. Standard 429 response with Retry-After header. Config via environment variables."
-roadmap.md updated:
-  - [x] ~~Problem exploration~~, [x] ~~Spec~~
-  - [ ] Plan implementation
-  ...
-Agents spawned:
-  plan agent → "Create implementation plan from context/requirements-rate-limiting.md
-    and context/design-rate-limiting.md"
-  test-spec agent → "Define behavioral properties for rate limiting from
-    context/requirements-rate-limiting.md"
-```
-### Cycle 4 — Review plan
-```
-Both agents complete. Plan at context/plan-rate-limiting.md.
-Plan has 3 phases: middleware, config, response format.
-Agents spawned:
-  review-plan agent → "Validate plan at context/plan-rate-limiting.md
-    against context/requirements-rate-limiting.md and context/design-rate-limiting.md"
-```
-### Cycle 5 — Implement phases 1+2 (parallel, low-risk foundation)
-```
-Plan review: PASS.
-roadmap.md updated (plan review done, starting implementation):
-  - [x] ~~Spec~~, [x] ~~Plan~~, [x] ~~Review plan~~
-  - [ ] Implement rate limiting middleware
-  - [ ] Implement rate limit configuration
-  - [ ] Critique phases 1-2 — review before integration phase
-  - [ ] Implement rate limit headers and error responses
-  - [ ] Validate — smoketest rate limiting end-to-end
-  - [ ] Final review
-Agents spawned (parallel — phases touch different files):
-  implement agent → "Implement Phase 1 from context/plan-rate-limiting.md —
-    rate limiting middleware in src/api/middleware/rate-limit.ts"
-  implement agent → "Implement Phase 2 from context/plan-rate-limiting.md —
-    rate limit configuration in src/config/rate-limits.ts"
-```
-### Cycle 6 — Critique before integration builds on top
-```
-Both implementation agents complete.
-Why critique now: Phase 3 (headers/error responses) integrates the middleware and
-  config — if the foundation has issues, they'll cascade. Cheaper to catch now.
-roadmap.md updated:
-  - [x] ~~Implement middleware~~, [x] ~~Implement config~~
-  - [ ] Critique phases 1-2
-  ...
-Agents spawned:
-  review agent → "Review rate limiting middleware and config implementation.
-    Focus on: Redis connection handling, sliding window correctness,
-    config schema matches what middleware expects."
-```
-### Cycle 7 — Implement phase 3 + address critique
-```
-Review: 2 findings — middleware doesn't handle Redis connection failure gracefully,
-  config schema allows negative rate limits.
-Agents spawned (parallel):
-  implement agent → "Fix review findings in reports/agent-008-final.md for
-    rate limiting middleware and config."
-  implement agent → "Implement Phase 3 from context/plan-rate-limiting.md —
-    rate limit headers and 429 error responses in src/api/middleware/rate-limit.ts"
-```
-### Cycle 8 — Validate end-to-end
-```
-Phase 3 and fixes complete.
-Why validate now: all three phases are done and integrated. This is the checkpoint
-  before calling it complete — verify it actually works, not just compiles.
-Agents spawned:
-  validate agent → "Verify rate limiting end-to-end: start server, send requests
-    exceeding limits, confirm 429 responses with correct Retry-After headers.
-    Test per-user isolation, endpoint-specific overrides, Redis failover behavior."
-```
-### Cycle 10 — Complete
-```
-Validation: PASS. Final review agent confirms no issues.
-Complete — "Added per-user API rate limiting with Redis-backed sliding window,
-  configurable per-endpoint limits, and graceful Redis failover."
-```
----
-## Example 3: Refactor Authentication Module
-**Starting task**: "Refactor auth — extract token logic from route handlers into dedicated service"
-### Cycle 1 — Plan + baseline
-```
-roadmap.md:
-  ## Refactor: Extract Token Service
-  - [ ] Plan auth refactor — extract token service
-  - [ ] Capture behavioral baseline (run all auth tests)
-  - [ ] Create TokenService class with extracted logic
-  - [ ] Update route handlers to use TokenService
-  - [ ] Update tests to use new service interface
-  - [ ] Validate all auth tests still pass
-  - [ ] Review for dead code and missed references
-Agents spawned (parallel):
-  plan agent → "Plan refactor: extract token creation, validation, and refresh
-    logic from src/api/routes/auth.ts into a new src/services/token-service.ts.
-    Map all token-related functions, their callers, and the extraction plan."
-  validate agent → "Run all tests in src/__tests__/auth/ and record results.
-    This is the behavioral baseline — these must all pass after refactor."
-```
-### Cycle 2 — Extract (serial — must happen before consumer updates)
-```
-Plan complete, baseline captured (47 tests passing).
-roadmap.md updated:
-  - [x] ~~Plan auth refactor~~
-  - [x] ~~Capture behavioral baseline~~ (47 tests passing)
-  - [ ] Create TokenService class with extracted logic
-  ...
-Agents spawned:
-  implement agent → "Execute Phase 1 of refactor plan: create TokenService class
-    at src/services/token-service.ts. Extract validateToken, createToken, refreshToken
-    from src/api/routes/auth.ts. Export the class. Do NOT modify route handlers yet."
-```
-### Cycle 3 — Update consumers (parallel where possible)
-```
-TokenService created.
-Agents spawned:
-  implement agent → "Update route handlers in src/api/routes/auth.ts to import
-    and use TokenService instead of inline token logic. Remove extracted functions."
-  implement agent → "Update tests in src/__tests__/auth/ to use TokenService
-    where they directly tested extracted functions."
-```
-### Cycle 4 — Validate + review
-```
-Agents spawned (parallel):
-  validate agent → "Run all auth tests. Compare against baseline of 47 passing.
-    Every test must still pass."
-  review agent → "Review src/api/routes/auth.ts and src/services/token-service.ts.
-    Check for: dead code left behind, missed references to old functions, broken imports."
-```
-### Cycle 5 — Complete
-```
-All 47 tests passing. Review clean.
-All roadmap.md items complete.
-Complete — "Extracted token logic into TokenService. All existing tests pass."
-```

package/templates/agent-plugin/skills/humanloop/SKILL.md DELETED

@@ -1,148 +0,0 @@
----
-name: humanloop
-description: >
-  Read before calling `sis ask`. Triggers when surfacing multiple questions or decisions to the user, presenting work for review/sign-off, or proposing concrete alternatives. Covers when a deck beats chat, how to design options as real forks the user can pick between, how to bundle related questions into one deck, and how to submit via the Bash tool's `run_in_background` so you can end your turn while the user takes their time answering.
----
-# Talking to the user via decks
-`sis ask` posts a structured deck of questions to the user's dashboard inbox. They walk through it on their own time and you read structured JSON back. Use it instead of dumping a wall of questions into chat.
-This skill covers **what to put in a deck** and **how to invoke it**. Run `sis ask -h` for the CLI shape (file path, `--session`, the `poll` and `peek` subcommands).
-## Reach for a deck when
-- You have **2+ questions** to surface in one beat (bundle them into one deck).
-- You're presenting **work for review or sign-off** (a design, a plan, a completion summary).
-- You're choosing between **concrete alternatives** the user must pick.
-- The work will sit while the user thinks. Decks survive across cycles; chat does not.
-## Skip the deck when
-- It's a single, low-stakes question whose answer barely changes downstream work — just ask in chat.
-- You can settle the question yourself by reading code or running a tool. **Default to investigating before asking.**
-- The user is actively conversing with you — converting a live exchange into a deck adds friction.
-## How to invoke
-The CLI **always blocks** until the user resolves the deck (potentially 10+ minutes). Submit through the Bash tool with `run_in_background: true` and **end your turn**. Do not peek, poll, or output filler chat between submit and answer — the bash completion notification is the only signal you need; it will wake you with stdout ready to parse. Same pattern for orchestrator, sub-agents, and one-off Claude Code sessions.
-```
-Bash tool call:
-  command:           sis ask "$deck"
-  run_in_background: true
-```
-Stdout on completion is one line of JSON: `{responses: [{id, selectedOptionId?, freetext?}, ...], completedAt}`. Branch on each response by its interaction `id`.
-If you already hold an `askId` from a prior cycle (e.g. respawned mid-wait), `sis ask poll <askId>` blocks on it and `sis ask peek <askId>` returns status without blocking. Use these only for respawn-recovery — **never to monitor a deck you just submitted in the current turn**. See `sis ask -h`.
-## Designing interactions
-### Each option is a concrete path forward
-The user picks an option to commit to a direction. Each option should name a real path with its tradeoffs spelled out, grounded in *this* codebase. Sign-off decks branch differently per option ("looks good", "minor fixes", "moderate fixes", "scope rework" each route the orchestrator somewhere different). Decision decks present mutually exclusive directions with named consequences.
-<example type="good">
-```
-title: "Session store backend?"
-subtitle: "Auth needs persistent sessions across restarts"
-kind: decision
-options:
-  in-memory:  "In-memory map — simplest. Loses sessions on restart; single-process only."
-  redis:      "Redis — survives restart, supports horizontal scale. New ops dependency."
-  postgres:   "Reuse existing Postgres — no new infra; ~10ms read latency vs Redis ~1ms."
-  defer:      "Ship in-memory now, migrate later if scale becomes real."
-allowFreetext: true
-freetextLabel: "Different framing — describe it"
-```
-</example>
-<example type="bad">
-```
-title: "Happy with this design?"
-options:
-  1. Yes
-  2. No, start over
-  3. Maybe, with comments
-  4. (no option, just freetext)
-```
-"Happy?" names a feeling, not a fork. Options 3 and 4 both collapse to freetext, forcing the user to invent the actual decision. Rewrite as specific decisions about specific elements of the design.
-</example>
-### Use `allowFreetext: true` as a safety valve, not the primary input
-Freetext catches "anything else?" — opinions or context the options didn't anticipate. When freetext IS the answer you want, write a chat message instead.
-<example type="bad">
-```
-title: "Approve?"
-options:
-  1. Approve
-  2. Reject
-  3. Comment
-allowFreetext: true
-```
-A freetext form wearing option clothing. Either name what "reject" actually routes to (back to design? abandon? try a different framing?), or drop the deck and ask in chat.
-</example>
-### Bound option count to 2–4
-Above four, options become too granular for the user to weigh; below two, you've collapsed into a yes/no that's faster to ask in chat.
-### Ground options in what you've already gathered
-Each option label should reference specifics from the codebase, plan, or exploration you just did — file names, framework constraints, prior decisions. When you can't fill in specifics, investigate before asking.
-### One concern per interaction
-When two questions interact, give them separate `id` / `title` / `options` inside the same deck (see Bundling below). One interaction asks one thing.
-## `kind` — display hint
-| kind | use for |
-|---|---|
-| `decision` | fork in the road; user picks a path forward |
-| `validation` | sign-off on completed work |
-| `notify` | FYI; user acknowledges |
-| `context` | surfacing background that needs a response |
-| `error` | something went wrong; user picks a recovery |
-The dashboard uses `kind` for inbox icons and sort weight. Mis-tagging trains the user to ignore the icons. Pick the closest fit.
-## Bundling
-If you'd otherwise submit two decks in the same beat, merge them. One deck with multiple `interactions` is one context switch for the user; two decks is two.
-```bash
-deck="$SISYPHUS_SESSION_DIR/context/.ask-$(date +%s).json"
-cat > "$deck" <<'EOF'
-{
-  "title": "Phase 2 sign-off + follow-on decisions",
-  "interactions": [
-    {
-      "id": "approve-phase-2",
-      "title": "Phase 2 looks good?",
-      "kind": "validation",
-      "options": [...]
-    },
-    {
-      "id": "phase-3-scope",
-      "title": "Phase 3 scope?",
-      "kind": "decision",
-      "options": [...]
-    }
-  ]
-}
-EOF
-# Then invoke `sis ask "$deck"` via the Bash tool with run_in_background: true.
-# Each interaction returns its own selectedOptionId / freetext in output.responses[], indexed by id.
-```
-## Submission notes
-- The deck is validated at submit (precise errors — trust them).
-- `kind` is an enum: `notify` | `validation` | `decision` | `context` | `error`. No other values accepted (see the table above for which to pick).
-- `bodyPath` points at a markdown file instead of inlining the body in JSON. The path is resolved **relative to the deck JSON's directory** and must stay inside it (no `..`, no symlinks out, no absolute paths pointing elsewhere). Practical pattern: write the deck JSON next to its body file — e.g. both inside `$SISYPHUS_SESSION_DIR/context/` — and use a basename like `"completion-summary.md"`. Mutually exclusive with `body`.
-- On completion, stdout is one line of JSON: `{responses, completedAt}`. Parse `responses[]` and dispatch on each interaction's `id`.
-- See `sis ask -h` for the full CLI surface.
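
Reading aid for the deleted skill above (not part of the package): the file says stdout is one line of JSON shaped like `{responses: [{id, selectedOptionId?, freetext?}, ...], completedAt}` and that the caller should dispatch on each interaction's `id`. The Python sketch below shows one way that parsing could look. It assumes the line is valid JSON with quoted keys; the helper names, the option id `looks-good`, the freetext, and the timestamp format are all invented for illustration.

```python
import json


def index_responses(stdout_line):
    """Parse the one-line JSON result and index responses by interaction id."""
    payload = json.loads(stdout_line)
    return {response["id"]: response for response in payload["responses"]}


def describe(response):
    """Summarize one response; selectedOptionId and freetext are both optional."""
    option = response.get("selectedOptionId")
    freetext = response.get("freetext")
    if option is not None and freetext:
        return f"chose {option} with note: {freetext}"
    if option is not None:
        return f"chose {option}"
    if freetext:
        return f"freetext only: {freetext}"
    return "no answer recorded"


# Hypothetical stdout line: the ids match the bundled deck shown in the skill,
# while the selected option, freetext, and timestamp are invented.
stdout_line = (
    '{"responses": ['
    '{"id": "approve-phase-2", "selectedOptionId": "looks-good"}, '
    '{"id": "phase-3-scope", "freetext": "Defer the retry work"}'
    '], "completedAt": "2025-01-01T00:00:00Z"}'
)

by_id = index_responses(stdout_line)
summary = {interaction_id: describe(r) for interaction_id, r in by_id.items()}
```

Indexing by `id` rather than by list position keeps the branching stable if the deck's interactions are reordered.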

package/templates/agent-plugin/skills/operator-memory/SKILL.md DELETED

@@ -1,64 +0,0 @@
----
-name: operator-memory
-description: Use right before the operator agent submits its final report. Provides guidance for updating the project-local operator memory at .sisyphus/agent-plugin/skills/operator/ — what to capture, where to put it (SKILL.md vs a new reference file), naming conventions, and what to skip. Defers to /authoring:skills for generic skill conventions (frontmatter, length budgets, structure).
-user-invocable: false
----
-# Updating operator memory
-You're about to submit. Spend a minute capturing what the next operator should not have to rediscover.
-The memory lives at `.sisyphus/agent-plugin/skills/operator/`:
-- `SKILL.md` — the high-level map of this app's surfaces and operations
-- per-task-family reference files alongside it (`auth.md`, `db-reset.md`, `checkout-flow.md`, etc.)
-## When to update (and when NOT to)
-The bar is **"will future operators benefit from this?"** Specifics:
-UPDATE when you discovered:
-- A repeatable operational procedure (login flow, db reset, seed step, environment toggle)
-- A surface that wasn't obvious (admin route, debug overlay, hidden flag, internal port)
-- A footgun you hit and worked around (race condition, ordering requirement, stale-cache trap)
-- A convention this app uses that differs from defaults (custom auth headers, non-standard ports, weird redirect chains)
-DON'T update when:
-- It's session-specific state (this user's email, this session's seeded data)
-- It's a one-off observation that won't reproduce
-- It's already covered (read existing files first — duplication is worse than nothing)
-- It's about the codebase, not about operating the app — that's the orchestrator's domain, not yours
-## SKILL.md vs a reference file
-**SKILL.md** is the high-level map. It answers "what surfaces does this app have, what are the most common operations, where do I find deep dives?" Keep it dense — under ~80 lines. Each entry is a line or two with a pointer.
-**A reference file** is the deep dive for one task family. It answers "exactly how do I do X step by step in this project". Each file has scope: `auth.md`, `db-reset.md`, `checkout-flow.md`, `feature-flags.md`.
-Decision rule:
-- New task family the operator might face → new reference file (and add a one-line entry to SKILL.md's Reference Files section).
-- Refinement to existing knowledge → update the existing reference file or SKILL.md.
-- A surface name you keep referencing → add it to SKILL.md's App Surfaces section once.
-## Naming conventions
-- Reference files: kebab-case, task-family scope, no `operator-` prefix (the directory already implies it), `.md` extension.
-- Good: `auth.md`, `admin-panel.md`, `db-reset.md`, `feature-flags.md`.
-- Bad: `operator-auth.md`, `flows.md`, `notes.md`, `stuff.md`.
-- One file per task family. If `auth.md` exists, append to it; don't create `auth-new.md` or `auth-2.md`.
-## How to update
-1. **Read first.** Open the current `SKILL.md` and any reference file you'll touch — orient before writing. Avoid duplicating what's already there.
-2. **Write/edit with the Write or Edit tool.** The directory already exists at `.sisyphus/agent-plugin/skills/operator/` (the hook scaffolds it on first run).
-3. **Keep prose dense.** The next operator pays in tokens for everything you write. If a step is obvious, omit it.
-4. **Register new reference files** by adding a one-line entry to `SKILL.md`'s "Reference files" section so they're discoverable.
-For frontmatter, length budgets, and general skill structure rules, invoke `/authoring:skills`. Don't reinvent those rules here — this skill only covers operator-specific guidance.
-## Examples
-**Discovered magic-link auth flow:** Create `auth.md` with the steps (email submit → check inbox → click link → cookie set). Add a one-liner to `SKILL.md` App Surfaces (`/login` — magic-link, see `auth.md`). Add to Common Operational Patterns (`Log in: see auth.md`).
-**Hit a stale-cache footgun:** The `/dashboard` route serves stale data for ~30s after a write because of an SWR cache. Add a single bullet to `SKILL.md` Known Footguns: `Dashboard SWR cache holds stale data ~30s after writes — hard refresh or wait`. No new reference file needed — it's a one-liner.
-**Found admin overlay:** `?admin=1` query param toggles an admin panel with seed/reset buttons. Add to `SKILL.md` App Surfaces: `Admin overlay: append ?admin=1 to any page; has seed/reset/feature-flag buttons`. If the overlay is rich enough to need step-by-step coverage, create `admin-panel.md` and link from there.
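
Reading aid for the deleted skill above (not part of the package): the naming conventions for reference files (kebab-case, `.md` extension, no `operator-` prefix) are mechanical enough to check in code. The Python sketch below is one possible check; the function name and the problem messages are assumptions. It applies to reference files only, not to `SKILL.md` itself, and it cannot catch the judgment-based cases the skill lists as bad (vague names such as `notes.md`, or duplicates such as `auth-2.md`).

```python
import re

# Lowercase alphanumeric words joined by single hyphens, ending in .md.
KEBAB_MD = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def naming_problems(filename):
    """Return a list of mechanical naming problems for a reference file name."""
    problems = []
    if not filename.endswith(".md"):
        problems.append("missing .md extension")
    elif not KEBAB_MD.match(filename):
        problems.append("not kebab-case")
    if filename.startswith("operator-"):
        problems.append("redundant operator- prefix")
    return problems


# File names taken from the skill's own good and bad lists.
checked = {name: naming_problems(name) for name in ("auth.md", "db-reset.md", "operator-auth.md")}
```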